About the UPData Project

The UPData project ran from February to October 2011 and its team included researchers from the Faculty of Engineering (FEUP) and from the Rectorate of the University of Porto:

  • Maria Eugénia Matos Fernandes (Rectorate)
  • João Rocha e Silva (Rectorate/FEUP)
  • Cristina Ribeiro (FEUP)
  • João Correia Lopes (FEUP)

The project studied the application of data curation practices to the research data at U.Porto.

Motivation

The university has already implemented some preservation practices for scientific publications: two repositories are currently in operation at the Rectorate, the U.Porto Open Repository and the U.Porto Thematic Repository. Publications are usually the most visible part of the research conducted at the University. However, they are the culmination of a research workflow, which frequently involves considerable data gathering and analysis effort.

Data curation of scientific datasets therefore plays an important role in the preservation and reuse of research data.

We assumed that the experience of the researchers had to be taken into account when dealing with data creation, preservation and reuse at U.Porto. By analysing their concerns, we expected to identify opportunities for improving the data management lifecycle.

Stages of the project

The project was developed in four stages. The first consisted of a brief preliminary study of similar projects such as the Data Asset Framework and the Edinburgh DataShare. From this study, an interview guide was drafted for use in interviews with researchers from several of the university's research institutions. The interviews served two main purposes: determining the data safeguarding, organisation and sharing practices currently followed by the researchers, and listing the features that they would find most useful in a data repository. The interviews yielded a series of reports and datasets (whenever available) and gave the team a general view of the data management reality at U.Porto, a prerequisite for the next phase.

The second stage of the project consisted of selecting and documenting the set of functionalities to be implemented in the third (development) stage. Since U.Porto's repositories run on the DSpace repository platform, we decided that the data repository should use the same system. DSpace is an open-source project, so it can be extended. The interviews conducted in the first stage of the project provided a use case report and the data model for the DSpace data curation extension.

Scope

The gathered datasets differ considerably from each other, which poses challenges for indexing, storing and searching them. For this relatively short project, we decided to build the extension around the datasets that could be interpreted most easily, both because their original authors were available and because of their simpler structure (namely data tables).

After the requirements analysis phase, the system was implemented and made available at the Rectorate. It is based on DSpace 1.7.2 and was extended to include some data curation functionality. It offers the tools that potential curators need to follow the specified curation workflow.

The extension allows collection administrators to index files submitted via the DSpace self-deposit workflow. This is done by manually building an Excel spreadsheet for each file, in a specific format, and uploading it to the repository. Each spreadsheet contains the series of logical data tables stored in the original file, along with the relevant metadata for each table. The repository then translates the uploaded spreadsheet, storing the data in an XML-based format to ensure its preservation. Once the file has been curated, an area becomes available where the file's data can be explored. This explorer area includes data filters which can be used to find the most relevant parts of the data. The data explorer view also allows users to download sections of the data, including the data shown after any set of filters has been applied. Another implemented feature is a search panel which can be used to find tables by the columns that they must include or by the metadata values associated with them.
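The translation, filtering and search steps described above can be illustrated with a minimal sketch. It is not the code of the DSpace extension: the in-memory table layout, the XML element names (table, metadata, field, columns, rows, cell) and the function names are all hypothetical, chosen only to show the idea. Reading the Excel file itself is left out, and the table is assumed to have already been loaded into a dictionary.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory form of one logical data table taken from a
# curation spreadsheet; the real spreadsheet format is not shown here.
table = {
    "metadata": {"title": "Daily temperatures", "unit": "Celsius"},
    "columns": ["date", "station", "temperature"],
    "rows": [
        ["2011-03-01", "Porto", "14.2"],
        ["2011-03-02", "Porto", "15.1"],
        ["2011-03-01", "Braga", "12.8"],
    ],
}


def table_to_xml(table):
    """Serialise one table and its metadata to an XML string (assumed layout)."""
    root = ET.Element("table")
    meta = ET.SubElement(root, "metadata")
    for name, value in table["metadata"].items():
        ET.SubElement(meta, "field", name=name).text = value
    cols = ET.SubElement(root, "columns")
    for name in table["columns"]:
        ET.SubElement(cols, "column", name=name)
    rows = ET.SubElement(root, "rows")
    for values in table["rows"]:
        row = ET.SubElement(rows, "row")
        for name, value in zip(table["columns"], values):
            ET.SubElement(row, "cell", column=name).text = value
    return ET.tostring(root, encoding="unicode")


def filter_rows(table, column, value):
    """Return the rows whose cell in `column` equals `value`."""
    idx = table["columns"].index(column)
    return [row for row in table["rows"] if row[idx] == value]


def find_tables(tables, required_columns=(), metadata=None):
    """Return the tables that contain every required column and
    match every given metadata value."""
    metadata = metadata or {}
    return [
        t for t in tables
        if set(required_columns) <= set(t["columns"])
        and all(t["metadata"].get(k) == v for k, v in metadata.items())
    ]


xml_text = table_to_xml(table)
porto_rows = filter_rows(table, "station", "Porto")
matches = find_tables([table], required_columns=["date", "station"])
```

Here filter_rows stands in for an explorer filter, and find_tables for the search panel's lookup by required columns or metadata values.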

To allow for the parametrisation of the metadata values and columns which can be included in these spreadsheets, two administration panels were added to the original DSpace Administration area.
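One way to picture this parametrisation is as a pair of configured vocabularies against which an uploaded spreadsheet could be checked. The sketch below is an assumption made for illustration only: the source does not state how the extension uses the configured values, and the names ALLOWED_COLUMNS, ALLOWED_METADATA and validate_table are hypothetical.

```python
# Hypothetical vocabularies, standing in for what an administrator
# might configure in the two administration panels.
ALLOWED_COLUMNS = {"date", "station", "temperature"}
ALLOWED_METADATA = {"title", "unit", "author"}


def validate_table(columns, metadata,
                   allowed_columns=ALLOWED_COLUMNS,
                   allowed_metadata=ALLOWED_METADATA):
    """Return a list of problems found in one table description.

    An empty list means every column and metadata field is among the
    configured ones.
    """
    problems = []
    for column in columns:
        if column not in allowed_columns:
            problems.append(f"unknown column: {column}")
    for field in metadata:
        if field not in allowed_metadata:
            problems.append(f"unknown metadata field: {field}")
    return problems


problems = validate_table(
    ["date", "depth"],
    {"title": "Soil samples", "licence": "CC-BY"},
)
```

Keeping the vocabularies as plain sets separate from the check mirrors the idea that administrators can change them without touching the rest of the system.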

The last stage of the project included the deposit and curation of one of the datasets gathered during the interviews with the researchers, which provided the testing scenario for the developed extension.

Several support guides were also written throughout the project, notably the user's manual and several installation guides and tips.

Conclusions

From this experiment we were able to determine that there is a real need for data curation practices at U.Porto. Researchers are open to using a repository platform, but only if it offers a real advantage over simply keeping the data on their own storage media. Providing better ways to access and retrieve data is the best way to encourage proper care for data, since it increases the visibility of the researcher's work by allowing others to take advantage of the original research effort.
