Repository Platforms (Software Tools)

The most commonly used data repository software platforms are DSpace, ePrints and Fedora. A brief introduction to each platform is given in the sections below.

Greenstone deserves a special note: despite its lack of full workflow models, it offers an interesting solution for data submission and archival in a ready-to-install standalone package.

DSpace

DSpace is arguably the most widely used repository solution for the preservation of digital assets. It is currently used primarily for preserving and sharing scientific publications. It is an open-source software platform with an active community behind it.

DSpace enables organizations to (manual):

  • Capture and describe digital material using a submission workflow module, or a variety of programmatic ingest options
  • Distribute an organisation's digital assets over the web through a search and retrieval system
  • Preserve digital assets over the long term

DSpace provides support for plugins and different types of metadata. These metadata can be Descriptive Metadata, Administrative Metadata or Structural Metadata.

Descriptive Metadata is a flat metadata schema for describing an item, e.g. Dublin Core. DSpace supports a set of qualifiers by default, but allows their customization.

Administrative metadata includes preservation metadata, provenance and authorization policy data. This is held in DSpace's relational DBMS, but some metadata is also mirrored in the Dublin Core records that DSpace exports - namely provenance metadata, bitstream size and MIME type.

Structural Metadata is information on how to present an item to a user and on the parts which constitute an item. For example, if a thesis has several pages and each page is an image, the metadata could state that the thesis is made up of those pages and record their ordering.
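The thesis example can be sketched in a few lines of Python. This is only an illustration of the idea of structural metadata, not DSpace's internal model: the class names, field names and file names are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Page:
    """One page image of a scanned item (hypothetical type)."""
    filename: str
    sequence: int  # position of the page within the item


@dataclass
class StructuralMetadata:
    """Records which files make up an item and in what order."""
    item_title: str
    pages: list[Page] = field(default_factory=list)

    def reading_order(self) -> list[str]:
        # Sort by the recorded sequence number, not by upload order.
        ordered = sorted(self.pages, key=lambda page: page.sequence)
        return [page.filename for page in ordered]


# Pages are deliberately listed out of order to show that the
# structural metadata, not the storage order, drives presentation.
thesis = StructuralMetadata(
    item_title="Example thesis",
    pages=[Page("scan_c.png", 3), Page("scan_a.png", 1), Page("scan_b.png", 2)],
)
print(thesis.reading_order())
```

The point of the sketch is that the ordering lives in the metadata, so the files themselves can be stored and named in any way.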

DSpace supports data depositing and cataloguing through workflow models which include up to three metadata validation steps (manual):

[Figure: DSpace workflow]
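As a rough illustration of a deposit workflow with up to three validation steps, the following Python sketch passes an item through a sequence of reviewers. It does not use or reproduce the DSpace API; the state names, the `run_workflow` function and the reviewer callables are assumptions made for this example.

```python
from enum import Enum


class State(Enum):
    SUBMITTED = "submitted"
    STEP_1 = "step 1"
    STEP_2 = "step 2"
    STEP_3 = "step 3"
    ARCHIVED = "archived"
    REJECTED = "rejected"


def run_workflow(item, reviewers):
    """Pass an item through each configured validation step in order.

    `reviewers` is a list of up to three callables; each returns True to
    accept the item's metadata or False to reject it. The function
    returns the list of states the item went through.
    """
    if len(reviewers) > 3:
        raise ValueError("this sketch models at most three validation steps")
    steps = [State.STEP_1, State.STEP_2, State.STEP_3]
    history = [State.SUBMITTED]
    for step, review in zip(steps, reviewers):
        history.append(step)
        if not review(item):
            # A rejection at any step ends the workflow immediately.
            history.append(State.REJECTED)
            return history
    history.append(State.ARCHIVED)
    return history


def has_title(item):
    return bool(item.get("title"))


def has_author(item):
    return bool(item.get("author"))


accepted = run_workflow({"title": "T", "author": "A"}, [has_title, has_author])
rejected = run_workflow({"title": "T"}, [has_title, has_author])
print([state.value for state in accepted])
print([state.value for state in rejected])
```

Only the steps that have a reviewer configured are visited, which mirrors the "up to three" wording: a collection with no reviewers would archive a submission directly.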

Finally, DSpace offers a query interface based on the stored metadata. However, it does not offer dataset domain-specific query capabilities, instead relying on the global parametrization of the metadata qualifiers.
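The limitation can be illustrated with a small Python sketch of a query over flat, qualified Dublin Core values. The records and the `find_items` helper are hypothetical and stand in for whatever search backend is actually used; only the element and qualifier names (`date.issued`, `contributor.author`) follow common Dublin Core usage.

```python
from typing import NamedTuple, Optional


class MetadataValue(NamedTuple):
    """One flat metadata entry: item, element, optional qualifier, value."""
    item_id: int
    element: str
    qualifier: Optional[str]
    value: str


# Hypothetical sample records.
RECORDS = [
    MetadataValue(1, "title", None, "Ocean temperature survey"),
    MetadataValue(1, "date", "issued", "2010"),
    MetadataValue(2, "title", None, "Soil samples"),
    MetadataValue(2, "date", "issued", "2011"),
    MetadataValue(2, "contributor", "author", "Doe, J."),
]


def find_items(records, element, qualifier, value):
    """Return the sorted ids of items with an exact match on one field."""
    return sorted(
        {
            record.item_id
            for record in records
            if record.element == element
            and record.qualifier == qualifier
            and record.value == value
        }
    )


print(find_items(RECORDS, "date", "issued", "2011"))
```

A query like this can only match the globally defined metadata fields. It cannot, for instance, select datasets by a value range inside the deposited data files, which is the kind of domain-specific query the text refers to.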

DSpace will be used in this project as a basis for the creation of the proposed scientific data repository.

ePrints

ePrints is, much like DSpace, an open-source repository platform designed for the preservation of digital publications. It supports submission workflows, composite querying and RSS feeds, both for the whole repository and for specific query results.

The abstract window contains previews of included documents and allows for the exporting of references in the most popular formats (BibTeX, Dublin Core, etc).

A brief description of the ePrints repository platform is available at the EPrints website.

Fedora

Fedora is another repository platform provided by Duraspace.org (like DSpace), designed with flexibility in mind. According to the Fedora Tutorial #1 (fedora_features), Fedora includes several features which make it suited for multi-disciplinary repositories. A summary of these traits is given in this section.

Fedora offers an abstract base data model, treating data, behavior definitions and behavior mechanisms all in the same way. The types of data are also treated the same way. The model is also flexible, since Fedora implementers can design content models to represent their data and the presentation requirements of their specific use cases.

The platform makes aggregation of digital content easier, since every object holds pointers to its parent elements. This way, figuring out the parts which make up a given data asset is more straightforward. It also becomes possible to access content stored on remote web-accessible servers. Extensibility was also taken into consideration, since Fedora's behavior interfaces are directly associated with the data within a Fedora object. As services are added, removed or changed, the data objects are automatically updated.
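A minimal Python sketch shows how parent pointers are enough to recover the parts of an asset. It is not Fedora code: the object identifiers and both helper functions are hypothetical, and each object is assumed to have at most one parent.

```python
from collections import defaultdict

# Each object stores only the identifier of its parent (None for a root).
PARENT_OF = {
    "dataset:1": None,
    "table:1": "dataset:1",
    "table:2": "dataset:1",
    "image:1": "table:2",
}


def build_children_index(parent_of):
    """Invert the parent pointers into a parent -> children mapping."""
    children = defaultdict(list)
    for obj, parent in parent_of.items():
        if parent is not None:
            children[parent].append(obj)
    return children


def parts_of(obj, children):
    """Return every object that is directly or indirectly part of `obj`."""
    found = []
    stack = list(children.get(obj, []))
    while stack:
        current = stack.pop()
        found.append(current)
        stack.extend(children.get(current, []))
    return sorted(found)


index = build_children_index(PARENT_OF)
print(parts_of("dataset:1", index))
```

Because the relationship is stored on the child, adding a new part never requires editing the parent object. The aggregation is derived by inverting the pointers.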

Distributed Repositories

The Fedora Architecture was designed to allow for the implementation of distributed repositories, introducing the concept of federation. This means that the system supports seamless access to multiple data sources, scales better and makes it possible to balance client load across multiple repository instances.

Preservation & Archiving

Preservation is ensured through the usage of the following main concepts / technologies:

  • XML - Fedora's objects are stored in XML, with an associated schema for validation, ingestion and exporting.
  • Content Versioning - Ensured by the Fedora platform. When an object is changed, the audit trail is updated with a record that includes the author of the change, and users can retrieve older versions of a data object by searching by date/time.
  • Relationships - Fedora has its own set of metadata (built on RDF), allowing for the representation of relationships between objects in the repository.
  • Indexing - Using the specified relationships and RDF query languages such as SPARQL, Fedora allows all the data assets in the repository to be indexed as one large queryable graph.
  • Event History - Every object in the Fedora repository contains an audit trail, which tracks all the changes made to an object.
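Content versioning and the audit trail can be sketched together in Python. This is a simplified model rather than Fedora's implementation: the `VersionedObject` class, its method names and the sample authors are hypothetical, and changes are assumed to arrive in chronological order.

```python
from bisect import bisect_right
from datetime import datetime


class VersionedObject:
    """Keeps every version of some content plus a record of each change."""

    def __init__(self):
        self._timestamps = []  # kept in ascending order
        self._versions = []
        self.audit_trail = []

    def update(self, content, author, when):
        if self._timestamps and when <= self._timestamps[-1]:
            raise ValueError("changes must be recorded in chronological order")
        self._timestamps.append(when)
        self._versions.append(content)
        self.audit_trail.append({"author": author, "when": when})

    def as_of(self, when):
        """Return the content that was current at `when`, or None."""
        index = bisect_right(self._timestamps, when)
        return self._versions[index - 1] if index else None


obj = VersionedObject()
obj.update("first draft", "alice", datetime(2011, 1, 10))
obj.update("second draft", "bob", datetime(2011, 3, 1))

print(obj.as_of(datetime(2011, 2, 1)))
print([entry["author"] for entry in obj.audit_trail])
```

Old versions are never overwritten, so a lookup by date/time reduces to finding the latest change that is not newer than the requested moment.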

Content Repurposing

Fedora makes it easier to reuse data for different purposes. Attaching different stylesheets to the data and metadata of a Fedora Object allows multiple views of the object's data and metadata. This can be used to filter the amount of displayed information, depending on the user who is accessing a data asset. As users' demands change, one can adjust these stylesheets to provide new functionality, without having to change the data's structure.
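The idea of several views over one unchanged record can be sketched as follows. Plain Python field lists stand in for stylesheets here, which is a deliberate simplification; the record, the view names and the `render` function are all hypothetical.

```python
# A single stored record; the views below never modify it.
RECORD = {
    "title": "Soil samples",
    "creator": "Doe, J.",
    "storage_path": "/data/0042",
    "checksum": "abc123",
}

# Each view lists the fields it exposes (a stand-in for a stylesheet).
VIEWS = {
    "public": ["title", "creator"],
    "curator": ["title", "creator", "storage_path", "checksum"],
}


def render(record, view_name):
    """Return only the fields that the named view exposes."""
    fields = VIEWS[view_name]
    return {name: record[name] for name in fields if name in record}


print(render(RECORD, "public"))
```

Supporting a new kind of user then means adding an entry to `VIEWS`, while `RECORD` keeps its structure.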

Web Service Exposure - SOA

Fedora offers a set of web services which allow it to connect to many different types of clients and data consumers, as well as to other web services. The content is provided in XML.

Greenstone

Greenstone deserves a special mention in this section, despite being a solution geared towards data archiving. It differs from the other platforms presented here because it is designed as a simpler and more lightweight solution, without the complete workflow models required by full-fledged digital repositories. However, its flexibility allows it not to rely on a specific metadata standard (unlike DSpace with Dublin Core), and it has been successfully customised to include deposit workflow procedures (ext_gstone).

LOCKSS

LOCKSS (Lots of Copies Keep Stuff Safe), based at Stanford University Libraries, is an international community initiative that provides libraries with digital preservation tools and support so that they can easily and inexpensively collect and preserve their own copies of authorized e-content.

LOCKSS places its emphasis on creating many copies of a digital asset, spread across different machines and geographical locations, relying on a highly distributed system to ensure the safety of the preserved data.

A library uses LOCKSS software to turn a mid-range PC into a digital preservation appliance called a LOCKSS Box that performs the following four functions:

  • It collects content from the target web sites using a web crawler similar to those used by search engines.
  • It continually compares the content it has collected with the same content collected by other LOCKSS Boxes, and repairs any differences.
  • It acts as a web proxy or cache, providing browsers in the library's community with access to the publisher's content or the preserved content as appropriate. It can also serve content by Metadata (Open URLs) via resolvers.
  • It provides a web-based administrative interface that allows the library staff to target new journals for preservation, monitor the state of the journals being preserved, and control access to the preserved journals.
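The compare-and-repair function can be illustrated with a simple majority vote over content hashes. This is a sketch of the general idea only and is much simpler than the polling LOCKSS actually performs between boxes; the box names and the `repair` function are hypothetical.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


def repair(copies):
    """Replace copies that disagree with the majority.

    `copies` maps a box name to the bytes it holds and is updated in
    place. Returns the names of the boxes that were repaired, and raises
    ValueError if no strict majority exists.
    """
    votes = Counter(digest(content) for content in copies.values())
    winner, count = votes.most_common(1)[0]
    if count * 2 <= len(copies):
        raise ValueError("no majority; cannot decide which copy is correct")
    good = next(c for c in copies.values() if digest(c) == winner)
    repaired = []
    for box, content in copies.items():
        if digest(content) != winner:
            copies[box] = good  # overwrite the damaged copy
            repaired.append(box)
    return repaired


copies = {"box_a": b"article", "box_b": b"article", "box_c": b"artXcle"}
print(repair(copies))
```

The more independent copies exist, the less likely it is that damaged copies form a majority, which is the reasoning behind the project's name.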

More information can be found in the project's About Us page.


manual) Jeffrey Trimble, December 2010 (Consulted on March 2011). DSpace System Documentation.

fedora_features) Fedora Commons, July 23, 2008 (Consulted on March 2011). Fedora Tutorial #1 Introduction to Fedora.

ext_gstone) David Bainbridge, Wendy Osborn, Ian H. Witten and David M. Nichols, 2006 (Consulted on March 2011). Extending Greenstone for Institutional Repositories. Springer Berlin.