Differences

This shows you the differences between two versions of the page.

executive_summary [2011/10/18 18:33]
mcr [Stages of the project]
executive_summary [2011/10/18 18:57] (current)
mcr [Conclusions]
Line 24:
 The second stage of the project consisted in selecting and documenting the set of functionalities to be implemented in the third (development) stage. Since U.Porto's repositories are running on the DSpace repository platform, we have decided that the data repository should use the same system. Since DSpace is an open-source project, it can be extended. The interviews conducted in the first phase of the project provided a [[deliverables:use_case_report|use case report]] and the [[deliverables:class_diagram|data model]] for the DSpace data curation extension.
  
-==== Scope ====
+The last stage of the project consisted of the deposit and curation of one of the datasets provided by the researchers, which was the testing scenario for the developed extension.
  
-The gathered datasets are undoubtedly different from each other, which poses challenges in the way they can be indexed, stored and searched. For this relatively short project, we have decided to build the extension around those datasets which could be more easily interpreted, both due to the current availability of the original authors and also due to their simpler structure (namely data tables).
+===== Scope =====
  
-After the requirements analysis phase, the system was implemented and [[http://sciencedata.up.pt|made available at the Rectorate]]. It is based on DSpace 1.7.2 and was extended to include some data curation functionality. It offers the tools that potential curators need to follow the specified [[support_guide:users_manual:the_base_curation_workflow|curation workflow]].
+The gathered datasets are undoubtedly different from each other, which constitutes a challenge for the way they can be described, stored and searched. For this short-term project, we have decided to build the extension around those datasets which could be more easily interpreted, both due to the ease of interaction with their creators and to their simpler structure (namely data tables).
  
-The extension allows collection administrators to index files submitted via the DSpace self-deposit workflow. This is done by manually building an [[deliverables:use_case_report:pacotes:pacote_curadoria_e_parametrizacao:indexar_ficheiro|Excel spreadsheet]] for each file, in a specific format, and uploading it to the repository. These spreadsheets contain the series of logical data tables stored in the original file along with relevant metadata for each of these tables. The repository will then translate the uploaded sheet, storing the data into an XML-based format to ensure the preservation of the data. After the file is curated, it becomes possible to access an area where the file's data can be explored. This explorer area includes [[support_guide:users_manual:the_data_explorer_view#filtering_data|data filters]] which can be used to find the most relevant parts of the data. The data explorer view also allows users to download sections of the data, including the data shown after they apply any set of filters.
+After the requirements analysis phase, the system was implemented and [[http://sciencedata.up.pt|made available at the University]]. It is based on DSpace 1.7.2, which was extended to include some data curation functionality. It offers the tools that data curators need to follow the specified [[support_guide:users_manual:the_base_curation_workflow|curation workflow]].
-Another implemented feature is a [[support_guide:users_manual:searching_for_datasets|search panel]] which can be used to find tables by the columns that they must include or by the metadata values associated to them.
+
  
-To allow for the parametrisation of the metadata values and columns which can be included in these spreadsheets, [[support_guide:users_manual:administration_and_parametrization|two administration panels]] were added to the original DSpace Administration area.
+The extension allows collection administrators to index files submitted via the DSpace self-deposit workflow. This is done by manually building an [[deliverables:use_case_report:pacotes:pacote_curadoria_e_parametrizacao:indexar_ficheiro|Excel spreadsheet]] for each file, in a prescribed format, and uploading it to the repository. These spreadsheets contain the series of logical data tables stored in the original file along with relevant metadata for each of these tables. The repository will then translate the uploaded sheet, storing the data into an XML format to ease the preservation process. After the file is curated, it is possible to explore its data. The explorer includes [[support_guide:users_manual:the_data_explorer_view#filtering_data|data filters]] which can be used to focus on parts of the data. The data explorer view also allows users to download sections of the data, including the data shown after they apply any set of filters.
+Another implemented feature is a [[support_guide:users_manual:searching_for_datasets|search panel]] which can be used to find tables by column or by the metadata values associated to them.
  
-The last stage of the project included the deposit and curation of one of the datasets gathered during the interviews with the researchers, which provided the testing scenario for the developed extension.
+To allow for the parameterization of the metadata values and columns which can be included in these spreadsheets, [[support_guide:users_manual:administration_and_parametrization|two administration panels]] were added to the original DSpace Administration area.
  
-Several [[support_guide|support guides]] were also written throughout the project, from which we highlight the [[support_guide:users_manual|user's manual]] and several installation guides and tips.
+Several [[support_guide|support guides]] were also written throughout the project, from which we highlight the [[support_guide:users_manual|user's manual]], several installation guides and a collection of tips.
  
 ===== Conclusions =====
  
-From this experiment we were able to determine that there are real needs for data curation practices at U.Porto. Researchers are open to using a repository platform, but only if it offers a real advantage over simply storing their data in their own storage media. Better ways to access and retrieve data are the best way to encourage proper care for data, since they increase the visibility of the researcher's work by allowing others to take advantage of the original research effort.
+The project has shown that data curation practices are needed at U.Porto and that researchers are willing to use a repository platform for their data. The project has proposed a data curation workflow and developed a DSpace extension that fits into the workflow and provides researchers with data browsing capabilities. Future work includes testing the platform with the researchers, and expects to engage them more systematically into managing their research data by increasing the visibility of their work and allowing others to build on their research effort.
  