See Features for background on the evaluation.
DSpace is an open source digital repository
developed jointly by MIT Libraries and Hewlett-Packard Labs.
DSpace is used as an institutional repository, as a learning
object repository, and for managing electronic theses and publishing.
This evaluation is based on DSpace 1.3.2 and focuses not on the DSpace application, but on the repository API which DSpace uses internally.
A list of sites running DSpace can be found at
DSpace is written in Java and provides a Java API, a
web application which runs in Tomcat, and command line tools.
The storage layer uses an RDBMS and a file system. Metadata is stored in
the database and content (bitstreams) is stored in the file system. See
http://www.dspace.org/technology/system-docs/architecture.html for a diagram.
Using the DSpace Java API requires running a process as a user with
appropriate privileges on the host system. There is no remote access to the API.
A DSpace site is divided into communities.
A community might be a department at a university.
Communities contain collections and can have subcommunities.
Collections group related content.
A collection may be in more than one community.
Collections contain items. An item may be in more than one collection.
Items manage named bundles. A bundle is made up of one or
more bitstreams. A bitstream is a piece of content such as an image.
DSpace maintains a registry of bitstream formats.
Each community, collection, and item has a persistent identifier called a
handle. Bitstreams do not have handles.
DSpace uses the CNRI Handle System.
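To make the containment rules concrete, here is a minimal in-memory sketch in Java. The class names mirror the concepts above, but HandleRegistry, the constructors, and the handle prefix 123456789 are hypothetical; this is not the DSpace Java API. Communities, collections, and items receive a handle when created and can be resolved by it, while bundles and bitstreams have no handle and are reached through an item.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the DSpace object hierarchy; NOT the DSpace API.
abstract class DSpaceObject {
    final String name;
    DSpaceObject(String name) { this.name = name; }
}

// Hypothetical registry that mints handles and resolves them back to objects.
class HandleRegistry {
    private final String prefix;
    private int next = 1;
    private final Map<String, DSpaceObject> byHandle = new HashMap<>();

    HandleRegistry(String prefix) { this.prefix = prefix; }

    String mint(DSpaceObject o) {
        String handle = prefix + "/" + next++;
        byHandle.put(handle, o);
        return handle;
    }

    DSpaceObject resolve(String handle) { return byHandle.get(handle); }
}

// Communities, collections, and items carry a persistent handle.
abstract class HandledObject extends DSpaceObject {
    final String handle;
    HandledObject(String name, HandleRegistry registry) {
        super(name);
        this.handle = registry.mint(this);
    }
}

class Community extends HandledObject {
    final List<Community> subcommunities = new ArrayList<>();
    final List<Collection> collections = new ArrayList<>(); // a collection may sit under several communities
    Community(String name, HandleRegistry r) { super(name, r); }
}

class Collection extends HandledObject {
    final List<Item> items = new ArrayList<>(); // an item may sit in several collections
    Collection(String name, HandleRegistry r) { super(name, r); }
}

class Item extends HandledObject {
    final Map<String, Bundle> bundles = new LinkedHashMap<>(); // named bundles
    Item(String name, HandleRegistry r) { super(name, r); }
}

// Bundles and bitstreams have no handles; they are reached through an item.
class Bundle extends DSpaceObject {
    final List<Bitstream> bitstreams = new ArrayList<>();
    Bundle(String name) { super(name); }
}

class Bitstream extends DSpaceObject {
    final String format; // stand-in for an entry in the bitstream format registry
    Bitstream(String name, String format) { super(name); this.format = format; }
}

class ModelDemo {
    public static void main(String[] args) {
        HandleRegistry registry = new HandleRegistry("123456789");
        Community physics = new Community("Physics", registry);
        Collection theses = new Collection("Theses", registry);
        physics.collections.add(theses);
        Item thesis = new Item("A thesis", registry);
        theses.items.add(thesis);
        Bundle original = new Bundle("ORIGINAL");
        original.bitstreams.add(new Bitstream("thesis.pdf", "application/pdf"));
        thesis.bundles.put(original.name, original);
        // Direct lookup by handle works for communities, collections, and items.
        System.out.println(thesis.handle + " -> " + registry.resolve(thesis.handle).name);
    }
}
```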
Top-level communities can be created as needed. Subcommunities and
collections are created using a reference to the parent community.
Bundles are created from item references, and bitstreams
from bundle references.
Each collection has a workflow. Newly created items are submitted to a
collection. A workflow might directly install an item in the
collection or make installation conditional on review.
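The choice between direct installation and review can be sketched as a small state model. This is hypothetical: the single optional review step and every name below are assumptions for illustration, not the DSpace workflow API, and real workflows may involve more steps.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-collection workflow with one optional review step; NOT the DSpace API.
class WorkflowCollection {
    final boolean requiresReview;
    final List<String> installed = new ArrayList<>();
    final List<String> pendingReview = new ArrayList<>();

    WorkflowCollection(boolean requiresReview) { this.requiresReview = requiresReview; }

    // Submitting either installs the item immediately or queues it for a reviewer.
    void submit(String item) {
        if (requiresReview) pendingReview.add(item);
        else installed.add(item);
    }

    // A reviewer accepts (installs) or rejects (drops) a queued item.
    void review(String item, boolean accept) {
        if (!pendingReview.remove(item)) throw new IllegalArgumentException("not pending: " + item);
        if (accept) installed.add(item);
    }

    public static void main(String[] args) {
        WorkflowCollection open = new WorkflowCollection(false);
        open.submit("preprint-1");
        WorkflowCollection reviewed = new WorkflowCollection(true);
        reviewed.submit("thesis-1");
        reviewed.submit("thesis-2");
        reviewed.review("thesis-1", true);
        reviewed.review("thesis-2", false);
        System.out.println(open.installed + " " + reviewed.installed);
    }
}
```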
Bundles can be added to items and bitstreams can be added to bundles.
Communities, collections, and items can be looked up directly with a handle.
Bundles and bitstreams are accessed through an item.
Items can be withdrawn or expunged. Withdrawn items are kept but marked as withdrawn.
Expunged items are deleted entirely.
Bundles can be removed from items and bitstreams from bundles.
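The difference between withdrawing and expunging can be shown with a hypothetical item store. The class and methods are illustrative only, not DSpace API calls: withdrawal keeps the record and sets a flag, while expunging deletes the record outright.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical store contrasting withdrawal with expunging; NOT the DSpace API.
class ItemStore {
    static final class StoredItem {
        final String title;
        boolean withdrawn;
        StoredItem(String title) { this.title = title; }
    }

    private final Map<String, StoredItem> items = new HashMap<>();

    void add(String handle, String title) { items.put(handle, new StoredItem(title)); }

    // Withdrawal keeps the item but marks it withdrawn.
    void withdraw(String handle) { items.get(handle).withdrawn = true; }

    // Expunging deletes the item entirely.
    void expunge(String handle) { items.remove(handle); }

    boolean exists(String handle) { return items.containsKey(handle); }

    boolean isWithdrawn(String handle) { return exists(handle) && items.get(handle).withdrawn; }

    public static void main(String[] args) {
        ItemStore store = new ItemStore();
        store.add("123456789/1", "Old report");
        store.add("123456789/2", "Duplicate upload");
        store.withdraw("123456789/1");
        store.expunge("123456789/2");
        System.out.println(store.isWithdrawn("123456789/1") + " " + store.exists("123456789/2"));
    }
}
```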
DSpace tracks parent/child relationships between objects. Referential
integrity is not enforced for other relationships that a user might
create by storing them in an item.
Each item has Dublin Core elements, which can be added and removed.
Bitstream and bundle metadata can be modified after the bundle or bitstream
is retrieved from an item.
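Adding and removing Dublin Core elements amounts to managing repeatable fields keyed by element and optional qualifier. The sketch below is hypothetical: the "dc.element.qualifier" key format and the method names are assumptions for illustration, not the DSpace Item API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical repeatable Dublin Core fields on one item; NOT the DSpace API.
class DublinCoreRecord {
    private final Map<String, List<String>> fields = new LinkedHashMap<>();

    // Assumed key format: dc.element or dc.element.qualifier
    private static String key(String element, String qualifier) {
        return qualifier == null ? "dc." + element : "dc." + element + "." + qualifier;
    }

    // Fields are repeatable, so adding appends another value.
    void add(String element, String qualifier, String value) {
        fields.computeIfAbsent(key(element, qualifier), k -> new ArrayList<>()).add(value);
    }

    // Removes every value of one field.
    void clear(String element, String qualifier) { fields.remove(key(element, qualifier)); }

    List<String> get(String element, String qualifier) {
        return fields.getOrDefault(key(element, qualifier), List.of());
    }

    public static void main(String[] args) {
        DublinCoreRecord dc = new DublinCoreRecord();
        dc.add("title", null, "On Repositories");
        dc.add("contributor", "author", "Smith, J.");
        dc.add("contributor", "author", "Doe, A.");
        dc.clear("title", null); // replace the title by clearing, then adding
        dc.add("title", null, "On Digital Repositories");
        System.out.println(dc.get("contributor", "author") + " " + dc.get("title", null));
    }
}
```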
An aggregation is a DSpace collection.
Collections are created as children of existing communities.
Communities can have subcommunities.
An item may be present in more than one collection.
Collections can be deleted. Deleting a collection removes all items that
would otherwise be orphaned.
Change aggregation membership
Items can be added to and removed from collections. Orphaned items are deleted.
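Many-to-many membership with orphan cleanup can be sketched as follows, assuming, as with collection deletion, that an item left in no collection is deleted. The class and its methods are hypothetical, not DSpace API calls.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical item/collection membership with orphan deletion; NOT the DSpace API.
class Membership {
    private final Map<String, Set<String>> collections = new HashMap<>(); // collection -> item ids
    final Set<String> storedItems = new HashSet<>();                      // items still in the repository

    void createCollection(String c) { collections.put(c, new HashSet<>()); }

    void addItem(String c, String item) {
        collections.get(c).add(item);
        storedItems.add(item);
    }

    private boolean isOrphan(String item) {
        for (Set<String> members : collections.values())
            if (members.contains(item)) return false;
        return true;
    }

    // Removing an item from its last collection deletes it.
    void removeItem(String c, String item) {
        collections.get(c).remove(item);
        if (isOrphan(item)) storedItems.remove(item);
    }

    // Deleting a collection deletes only the items that would otherwise be orphaned.
    void deleteCollection(String c) {
        Set<String> members = collections.remove(c);
        for (String item : members)
            if (isOrphan(item)) storedItems.remove(item);
    }

    public static void main(String[] args) {
        Membership m = new Membership();
        m.createCollection("Theses");
        m.createCollection("Physics");
        m.addItem("Theses", "item-1");
        m.addItem("Physics", "item-1"); // same item in two collections
        m.addItem("Theses", "item-2");
        m.deleteCollection("Theses");   // item-2 is orphaned and deleted; item-1 survives
        System.out.println(m.storedItems);
    }
}
```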
Find aggregation members
The items of a collection can be iterated over by retrieving an iterator from
the collection object.
The browsing support can iterate over the items in a collection or community that match certain criteria, such as title or author.
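Iterating a collection and keeping only matching items can be sketched with a plain Iterator and a predicate. This is a hypothetical, naive scan for illustration; a real browse implementation would presumably use an index rather than walking every item, and none of these names are DSpace API calls.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical iterate-and-filter browse; NOT the DSpace browse API.
class BrowseSketch {
    static final class Entry {
        final String title;
        final String author;
        Entry(String title, String author) { this.title = title; this.author = author; }
    }

    static final class ItemCollection {
        private final List<Entry> items = new ArrayList<>();
        void add(Entry e) { items.add(e); }
        // Analogous to retrieving an item iterator from a collection object.
        Iterator<Entry> items() { return items.iterator(); }
    }

    // Walk the iterator and collect titles of items matching the criterion.
    static List<String> titlesMatching(ItemCollection c, Predicate<Entry> criterion) {
        List<String> titles = new ArrayList<>();
        for (Iterator<Entry> it = c.items(); it.hasNext(); ) {
            Entry e = it.next();
            if (criterion.test(e)) titles.add(e.title);
        }
        return titles;
    }

    public static void main(String[] args) {
        ItemCollection c = new ItemCollection();
        c.add(new Entry("Quantum Dots", "Smith, J."));
        c.add(new Entry("Dark Matter", "Doe, A."));
        c.add(new Entry("Spin Glasses", "Smith, J."));
        System.out.println(titlesMatching(c, e -> e.author.equals("Smith, J.")));
    }
}
```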
There is a command line tool, dspace-ingest, which uses the Java API.
The ingest tool takes as input a specially formatted file hierarchy
representing a set of items and adds those items to a collection.
The ingest tool saves a mapping of directories to handles.
The mapping can be used to resume an interrupted ingest.
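The resume mechanism can be sketched as: load the directory-to-handle map, skip directories already listed, and append a mapping after each successful ingest so an interruption loses at most the item in progress. This is hypothetical; the one "directory handle" pair per line map format, the handle allocation, and all names are assumptions, not the actual tool's behavior.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical resumable ingest driven by a directory-to-handle map file.
class ResumableIngest {
    // Assumed map format: one "directory handle" pair per line.
    static Map<String, String> readMap(Path mapFile) throws IOException {
        Map<String, String> done = new LinkedHashMap<>();
        if (!Files.exists(mapFile)) return done;
        for (String line : Files.readAllLines(mapFile)) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length == 2) done.put(parts[0], parts[1]);
        }
        return done;
    }

    // Ingest directories missing from the map, appending each mapping as soon as it succeeds.
    static List<String> ingest(List<String> dirs, Path mapFile, String handlePrefix) throws IOException {
        Map<String, String> done = readMap(mapFile);
        List<String> ingested = new ArrayList<>();
        int next = done.size() + 1; // hypothetical handle allocation
        for (String dir : dirs) {
            if (done.containsKey(dir)) continue;         // already ingested before the interruption
            String handle = handlePrefix + "/" + next++; // stand-in for adding the item to a collection
            Files.writeString(mapFile, dir + " " + handle + System.lineSeparator(),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            done.put(dir, handle);
            ingested.add(dir);
        }
        return ingested;
    }

    public static void main(String[] args) throws IOException {
        Path map = Files.createTempFile("ingest", ".map");
        ingest(List.of("item_000", "item_001"), map, "123456789"); // first run
        List<String> resumed = ingest(List.of("item_000", "item_001", "item_002"), map, "123456789");
        System.out.println("resumed run ingested " + resumed);
    }
}
```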
There is a command line tool, dspace-export, which uses the Java API.
The export tool writes a collection to the file system in a METS-based format or the dspace-ingest format.
Outside of the Java API, DSpace can expose collections for harvesting through OAI-PMH.
Direct use of the Java API requires that the process run as a user with appropriate
database and file system privileges on the machine hosting DSpace.
DSpace authorization is based on resource policies.
A resource policy gives a group of users the right to perform an action on
a DSpace object.
Every user has an email address and a name. Users can be grouped. Users
and groups can be managed through the Java API.
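A resource policy check reduces to one question: does some policy grant this action on this object to a group the user belongs to? The sketch below is hypothetical; the Action values are an assumed subset, objects are identified by an arbitrary string id, and none of the names are DSpace API calls.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical policy-based authorization check; NOT the DSpace API.
class PolicyCheck {
    enum Action { READ, WRITE, ADD, REMOVE } // assumed subset of actions

    static final class Policy {
        final String group;
        final Action action;
        final String objectId;
        Policy(String group, Action action, String objectId) {
            this.group = group; this.action = action; this.objectId = objectId;
        }
    }

    final Map<String, Set<String>> groupMembers = new HashMap<>(); // group -> user email addresses
    final List<Policy> policies = new ArrayList<>();

    void addToGroup(String group, String email) {
        groupMembers.computeIfAbsent(group, g -> new HashSet<>()).add(email);
    }

    // Authorized if any policy grants the action on the object to a group containing the user.
    boolean isAuthorized(String email, Action action, String objectId) {
        for (Policy p : policies) {
            if (p.action == action && p.objectId.equals(objectId)
                    && groupMembers.getOrDefault(p.group, Set.of()).contains(email))
                return true;
        }
        return false;
    }

    public static void main(String[] args) {
        PolicyCheck auth = new PolicyCheck();
        auth.addToGroup("Reviewers", "reviewer@example.org");
        auth.policies.add(new Policy("Reviewers", Action.WRITE, "item-42"));
        System.out.println(auth.isAuthorized("reviewer@example.org", Action.WRITE, "item-42"));
        System.out.println(auth.isAuthorized("someone@example.org", Action.WRITE, "item-42"));
    }
}
```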
There is no support for locking.
Virtual object representation
There is no support for virtual representations.
Transactions are supported. Changes to a group of objects are not committed
to storage until Context.complete is called. Calling Context.abort discards
the changes.
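The commit/discard semantics described for Context.complete and Context.abort can be illustrated with a toy unit of work that buffers changes. This is a hypothetical sketch of the semantics only, not org.dspace.core.Context and not how it is implemented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical unit of work illustrating complete/abort semantics; NOT org.dspace.core.Context.
class UnitOfWork {
    private final Map<String, String> storage;              // committed state
    private final List<Runnable> pending = new ArrayList<>(); // buffered changes
    private boolean open = true;

    UnitOfWork(Map<String, String> storage) { this.storage = storage; }

    void put(String key, String value) {
        ensureOpen();
        pending.add(() -> storage.put(key, value));
    }

    void remove(String key) {
        ensureOpen();
        pending.add(() -> storage.remove(key));
    }

    // Apply every buffered change, then close.
    void complete() {
        ensureOpen();
        pending.forEach(Runnable::run);
        pending.clear();
        open = false;
    }

    // Discard every buffered change, then close.
    void abort() {
        pending.clear();
        open = false;
    }

    private void ensureOpen() {
        if (!open) throw new IllegalStateException("unit of work already closed");
    }

    public static void main(String[] args) {
        Map<String, String> storage = new HashMap<>();
        UnitOfWork ok = new UnitOfWork(storage);
        ok.put("item-1", "title A");
        ok.complete();
        UnitOfWork cancelled = new UnitOfWork(storage);
        cancelled.put("item-2", "title B");
        cancelled.remove("item-1");
        cancelled.abort();
        System.out.println(storage);
    }
}
```

A real implementation would also need the apply step itself to be atomic, so that a failure halfway through complete does not leave storage partially updated.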
The API does not directly support versioning. See http://simile.mit.edu/dspace-mit-docs/versioning.pdf for a description of how an application could use the API to support versioning.
DSpace has an API for keyword searching of the Dublin Core metadata of
objects and, optionally, the full text of documents stored in objects.
DSpace uses Lucene for searching.
Lucene indexes the Dublin Core metadata of
DSpace objects. The set of indexed fields is configurable.
As objects are added and removed, the indexes are updated.
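The indexing behavior above (a configurable set of Dublin Core fields, kept current as objects come and go) can be illustrated with a toy inverted index. This is a swapped-in technique for illustration only: it is not Lucene and does not use the Lucene API, and the class and field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy inverted index over configurable Dublin Core fields; a stand-in for Lucene, NOT its API.
class DcIndex {
    private final Set<String> indexedFields;                              // configurable field set
    private final Map<String, Set<String>> postings = new HashMap<>();    // term -> item handles
    private final Map<String, Set<String>> termsByItem = new HashMap<>(); // handle -> terms, for removal

    DcIndex(Set<String> indexedFields) { this.indexedFields = indexedFields; }

    private static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+"))
            if (!t.isEmpty()) out.add(t);
        return out;
    }

    // Index an added item; fields outside the configured set are ignored.
    void add(String handle, Map<String, String> metadata) {
        Set<String> terms = termsByItem.computeIfAbsent(handle, h -> new HashSet<>());
        metadata.forEach((field, value) -> {
            if (!indexedFields.contains(field)) return;
            for (String term : tokenize(value)) {
                postings.computeIfAbsent(term, t -> new HashSet<>()).add(handle);
                terms.add(term);
            }
        });
    }

    // Drop a removed item from every posting list it appears in.
    void remove(String handle) {
        Set<String> terms = termsByItem.remove(handle);
        if (terms == null) return;
        for (String term : terms) {
            Set<String> handles = postings.get(term);
            handles.remove(handle);
            if (handles.isEmpty()) postings.remove(term);
        }
    }

    // Handles of items whose indexed fields contain the keyword.
    Set<String> search(String keyword) {
        return Set.copyOf(postings.getOrDefault(keyword.toLowerCase(Locale.ROOT), Set.of()));
    }

    public static void main(String[] args) {
        DcIndex index = new DcIndex(Set.of("dc.title", "dc.contributor.author"));
        index.add("123456789/1", Map.of("dc.title", "Digital Repositories",
                "dc.description.abstract", "About Lucene"));
        System.out.println(index.search("repositories") + " " + index.search("lucene"));
        index.remove("123456789/1");
        System.out.println(index.search("repositories"));
    }
}
```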
The filter-media command line tool can be configured to extract text from
documents and index the text with Lucene. The tool needs to be run regularly
to keep the full text index up to date.