
See Features for background on the evaluation.

Introduction

DSpace is an open source digital repository
developed jointly by MIT Libraries and Hewlett-Packard Labs.

DSpace is used as an institutional repository, as a learning object
repository, for managing electronic theses, and for publishing.

This evaluation is based on DSpace 1.3.2 and focuses not on the DSpace application, but on the repository API which DSpace uses internally.

A list of sites running DSpace can be found at
http://wiki.dspace.org/DspaceInstances.

Technology

DSpace is written in Java and provides a Java API, a
web application which runs in Tomcat, and command line tools.
The storage layer uses an RDBMS and a file system. Metadata is stored in
the database and data is stored in the file system. See
http://www.dspace.org/technology/system-docs/architecture.html for a diagram.

Using the DSpace Java API requires a process to be run as a user with
appropriate privileges on the host system. There is no remote access.

OCLC has developed an SRW interface to DSpace called SRW/U.

Data model

A DSpace site is divided into communities.
A community might be a department at a university.
Communities contain collections and can have subcommunities.
Collections group related content.
A collection may be in more than one community.

Collections contain items. An item may be in more than one collection.
Items manage named bundles. A bundle is made up of one or
more bitstreams. A bitstream is a piece of content such as an image.
DSpace maintains a registry of bitstream formats.
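
The containment hierarchy described above can be sketched as a handful of plain Java classes. This is a minimal illustration only: the class names mirror the concepts in the text, but every class, field, and method here (including `ModelSketch` and `countBitstreams`) is hypothetical and is not taken from the DSpace API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of the hierarchy; these are NOT the DSpace classes.
final class ModelSketch {
    static final class Bitstream {
        final String name;
        final String format; // stands in for an entry in the bitstream format registry
        Bitstream(String name, String format) { this.name = name; this.format = format; }
    }

    static final class Bundle {
        final String name;
        final List<Bitstream> bitstreams = new ArrayList<>();
        Bundle(String name) { this.name = name; }
    }

    static final class Item {
        final String title;
        final List<Bundle> bundles = new ArrayList<>();
        Item(String title) { this.title = title; }
    }

    static final class Collection {
        final String name;
        final List<Item> items = new ArrayList<>(); // the same Item may sit in several collections
        Collection(String name) { this.name = name; }
    }

    static final class Community {
        final String name;
        final List<Community> subcommunities = new ArrayList<>();
        final List<Collection> collections = new ArrayList<>();
        Community(String name) { this.name = name; }
    }

    // Walks community -> collection -> item -> bundle and counts bitstreams.
    // An item shared by two collections is counted once per collection.
    static int countBitstreams(Community community) {
        int count = 0;
        for (Community sub : community.subcommunities) {
            count += countBitstreams(sub);
        }
        for (Collection collection : community.collections) {
            for (Item item : collection.items) {
                for (Bundle bundle : item.bundles) {
                    count += bundle.bitstreams.size();
                }
            }
        }
        return count;
    }
}
```

Because collections hold references to items rather than owning copies, adding the same `Item` instance to two `Collection` objects models an item that is in more than one collection.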

Features

Storage

Each community, collection, and item has a persistent identifier called a
handle. Bitstreams do not have handles.
DSpace uses the CNRI Handle System.

Add data

Top-level communities can be created as needed. Subcommunities and
collections are created using a reference to the parent community.
Bundles are created from item references, and bitstreams
from bundle references.

Each collection has a workflow. Created items are submitted to a
collection. A workflow might directly install an item in the
collection or make installation conditional on review.
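
The two workflow outcomes mentioned above (direct installation versus installation after review) can be sketched as follows. This is an assumption-laden illustration, not DSpace's workflow implementation: `WorkflowSketch`, its `Collection` class, and the `submit` and `review` methods are hypothetical names, and items are represented by plain strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a per-collection workflow; not the DSpace workflow API.
final class WorkflowSketch {
    static final class Collection {
        final boolean requiresReview;
        final List<String> installed = new ArrayList<>(); // items visible in the collection
        final List<String> pending = new ArrayList<>();   // items waiting for a reviewer

        Collection(boolean requiresReview) { this.requiresReview = requiresReview; }

        // Submitting either installs the item at once or queues it for review.
        void submit(String item) {
            if (requiresReview) {
                pending.add(item);
            } else {
                installed.add(item);
            }
        }

        // A reviewer's decision: approved items are installed, rejected items are dropped.
        // Returns true only if the item was pending and has now been installed.
        boolean review(String item, boolean approved) {
            if (!pending.remove(item)) {
                return false; // the item was not awaiting review
            }
            if (approved) {
                installed.add(item);
            }
            return approved;
        }
    }
}
```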

Bundles can be added to items and bitstreams can be added to bundles.

Access data

Communities, collections, and items can be looked up directly with a handle.
Bundles and bitstreams are accessed through an item.

Remove data

Items can be withdrawn or expunged. A withdrawn item stays in the repository but is marked as withdrawn.
Expunged items are deleted entirely.

Bundles can be removed from items and bitstreams from bundles.

DSpace tracks parent/child relationships between objects. It does not
enforce referential integrity for any other relationships that a user
records inside an item.

Manage metadata

Each item has Dublin Core elements which can be added and removed.

Bitstream and bundle metadata can be modified after the bundle or bitstream
is retrieved from an item.

Aggregation

An aggregation is a DSpace collection.

Create aggregation

Collections are created as children of existing communities.
Communities can have subcommunities.
An item may be present in more than one collection.

Remove aggregation

Collections can be deleted. Deleting a collection removes all items that
would otherwise be orphaned.

Change aggregation membership

Items can be added to and removed from collections. Orphaned items are
automatically deleted.
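
The orphan rule used when a collection is deleted or an item is removed can be illustrated with a small membership table. This is a sketch under stated assumptions: `MembershipSketch` and its methods are hypothetical, collections and items are identified by plain strings, and "deleting" an orphan is modelled simply by reporting it to the caller.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical membership table: collection name -> items it contains.
final class MembershipSketch {
    private final Map<String, Set<String>> members = new HashMap<>();

    void add(String collection, String item) {
        members.computeIfAbsent(collection, k -> new HashSet<>()).add(item);
    }

    // True if the item still belongs to at least one collection.
    boolean isMember(String item) {
        for (Set<String> items : members.values()) {
            if (items.contains(item)) {
                return true;
            }
        }
        return false;
    }

    // Removes the item from one collection.
    // Returns true if that removal left the item orphaned (so it would be deleted).
    boolean remove(String collection, String item) {
        Set<String> items = members.get(collection);
        if (items == null || !items.remove(item)) {
            return false;
        }
        return !isMember(item);
    }

    // Deletes a collection and returns the items that no other collection holds.
    Set<String> deleteCollection(String collection) {
        Set<String> orphans = new TreeSet<>();
        Set<String> removed = members.remove(collection);
        if (removed == null) {
            return orphans;
        }
        for (String item : removed) {
            if (!isMember(item)) {
                orphans.add(item);
            }
        }
        return orphans;
    }
}
```

An item that is also present in a second collection survives the deletion of the first, which matches the behaviour described in the text.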

Find aggregation members

The items of a collection can be iterated over by retrieving an iterator from
the collection object.
The browsing support lets a caller iterate over the items in a collection or community that match certain criteria, such as title or author.

Management

Bulk ingest

There is a command line tool, dspace-ingest, which uses the Java API.

The ingest tool takes as input a specially formatted file hierarchy
representing a set of items and adds those items to a collection.

The ingest tool saves a mapping of directories to handles.
The mapping can be used to resume an interrupted ingest.
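
How a saved mapping allows an interrupted ingest to resume can be sketched as below. The map line format used here (a directory name, whitespace, then a handle) is an assumption made for illustration, and `IngestResumeSketch`, `parseMap`, and `remaining` are hypothetical names that do not come from the ingest tool.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical resume logic; the "<directory> <handle>" line format is assumed.
final class IngestResumeSketch {
    // Parses mapping lines into directory -> handle, skipping blank or malformed lines.
    static Map<String, String> parseMap(List<String> lines) {
        Map<String, String> done = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+", 2);
            if (parts.length == 2) {
                done.put(parts[0], parts[1]);
            }
        }
        return done;
    }

    // Returns the item directories that have no handle yet, in their original order.
    static List<String> remaining(List<String> allDirectories, Map<String, String> done) {
        List<String> todo = new ArrayList<>();
        for (String directory : allDirectories) {
            if (!done.containsKey(directory)) {
                todo.add(directory);
            }
        }
        return todo;
    }
}
```

A resumed run would then process only the directories returned by `remaining` and append a new mapping line for each item it adds.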

Bulk export

There is a command line tool, dspace-export, which uses the Java API.
The export tool writes a collection to the file system in a METS-based format or the dspace-ingest format.

Outside of the Java API, DSpace can expose collections for harvesting through OAI-PMH.

Security

Authentication

Direct use of the Java API requires that the process run as a user with appropriate
database and file system privileges on the machine hosting DSpace.

Access control

DSpace authorization is based on resource policies.
A resource policy gives a group of users the right to perform an action on
a DSpace object.
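
A resource policy check of the kind described above can be sketched as a lookup over (group, action, object) triples. Everything here is hypothetical: `PolicySketch`, the `Action` values, and the deny-by-default behaviour are assumptions for illustration and are not taken from the DSpace authorization code. Users are identified by email address, as in the User management section.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical policy store; not the DSpace authorization API.
final class PolicySketch {
    enum Action { READ, WRITE, REMOVE } // illustrative action names only

    static final class ResourcePolicy {
        final String group;
        final Action action;
        final String objectId;
        ResourcePolicy(String group, Action action, String objectId) {
            this.group = group;
            this.action = action;
            this.objectId = objectId;
        }
    }

    private final List<ResourcePolicy> policies = new ArrayList<>();
    private final Map<String, Set<String>> groupsOfUser = new HashMap<>();

    void addToGroup(String userEmail, String group) {
        groupsOfUser.computeIfAbsent(userEmail, k -> new HashSet<>()).add(group);
    }

    // Grants a group the right to perform one action on one object.
    void grant(String group, Action action, String objectId) {
        policies.add(new ResourcePolicy(group, action, objectId));
    }

    // A user is authorized if any of their groups holds a matching policy.
    // With no matching policy the answer is false (deny by default, an assumption).
    boolean isAuthorized(String userEmail, Action action, String objectId) {
        Set<String> groups = groupsOfUser.getOrDefault(userEmail, Collections.<String>emptySet());
        for (ResourcePolicy policy : policies) {
            if (policy.action == action
                    && policy.objectId.equals(objectId)
                    && groups.contains(policy.group)) {
                return true;
            }
        }
        return false;
    }
}
```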

User management

Every user has an email address and name. Users can be grouped. Users
and groups can be managed through the Java API.

Other

Locking

There is no support for locking.

Virtual object representation

There is no support for virtual representations.

Transactions

Transactions are supported. Changes to a group of objects are not committed
to storage until Context.complete is called. Calling Context.abort will
throw out the changes.
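
The complete/abort behaviour can be illustrated with a small stand-in that buffers changes until they are committed. This is not the DSpace `Context` class: `SketchContext` is a hypothetical model, the backing `Map` stands in for the database and file system, and only the method names `complete` and `abort` echo the text above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a transactional context; not the DSpace Context class.
final class SketchContext {
    private final Map<String, String> store;                           // committed state
    private final Map<String, String> pending = new LinkedHashMap<>(); // uncommitted changes

    SketchContext(Map<String, String> store) {
        this.store = store;
    }

    // Records a change without touching committed storage.
    void update(String objectId, String value) {
        pending.put(objectId, value);
    }

    // Writes all buffered changes to storage in one step.
    void complete() {
        store.putAll(pending);
        pending.clear();
    }

    // Discards all buffered changes; storage is left untouched.
    void abort() {
        pending.clear();
    }
}
```

In this sketch, a caller that updates several objects and then calls `abort` leaves the store exactly as it was, while `complete` makes all of the updates visible together.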

Versioning

The API does not directly support versioning. See http://simile.mit.edu/dspace-mit-docs/versioning.pdf for a description of how an application could use the API to support versioning.

Searching

DSpace has an API for keyword searching on the Dublin Core metadata of
objects and, when full-text indexing is set up, on the text of documents stored in objects.

DSpace uses Lucene for searching.
Lucene indexes the Dublin Core metadata of
DSpace objects. The set of fields indexed can be configured.
As objects are added and removed, the indexes are updated.
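
The idea of a keyword index that is kept in step with additions and removals can be sketched with a tiny inverted index. This does not use Lucene and is not how DSpace's search is implemented: `IndexSketch` and its methods are hypothetical, metadata is reduced to a single string per object, and tokenization is a simple split on non-word characters.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory inverted index; a stand-in for a real search library.
final class IndexSketch {
    private final Map<String, Set<String>> postings = new HashMap<>(); // term -> object handles

    // Indexes an object's metadata text under its handle.
    void add(String handle, String text) {
        for (String term : terms(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(handle);
        }
    }

    // Drops an object from every posting list when it is removed from the repository.
    void remove(String handle) {
        for (Set<String> handles : postings.values()) {
            handles.remove(handle);
        }
    }

    // Case-insensitive single-keyword lookup; returns a copy of the matching handles.
    Set<String> search(String keyword) {
        Set<String> hits = postings.get(keyword.toLowerCase(Locale.ROOT));
        return hits == null ? new TreeSet<String>() : new TreeSet<String>(hits);
    }

    private static List<String> terms(String text) {
        List<String> out = new ArrayList<>();
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                out.add(term);
            }
        }
        return out;
    }
}
```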

The filter-media command line tool can be configured to extract text from
documents and index the text with Lucene. The tool needs to be run regularly
to keep the full text index up to date.
