A Survey and Evaluation of Open Source Electronic Publishing Systems

Institutional affiliation and other indicators of the viability of the open-source project

Name of system: Connexions/Rhaptos  DiVA (Digitala Vetenskapliga Arkivet)  DPubS (Digital Publishing System)  GNU EPrints  Hyperjournal  Open Journal Systems  Topaz 
Current version of system: 1.5.1    2.0  3.0.1 beta  0.5b (beta)  2.1.1  0.6 
Tested version of system:     2.0  3.0  0.5b (beta)  2.1.1   
URL of project homepage: http://rhaptos.org/  http://www.diva-portal.org/about.xsql  http://dpubs.org/  http://www.eprints.org/  http://www.hjournal.org/  http://pkp.sfu.ca/?q=ojs  http://topazproject.org/ 
Institutional affiliation: Rice University  Electronic Publishing Centre, Uppsala University  Cornell and Penn State  School of Electronics and Computer Science, University of Southampton  Net7 and the University of Pisa  Public Knowledge Project: University of British Columbia and Simon Fraser University  Independent non-profit. 
Age of project: Connexions project started by Rice University in 1999.  Project founded in 2000  Started as Project Euclid in 2000. Morphed into DPubS, and Cornell and Penn State joined forces on this project in 2004.  Project founded in 2000  Project started in 2004.  Version 1.0 released in November 2002.  Nov 2005 
Notes on long-term viability of project: Because this project was started in 1999 and has every appearance of going strong to this day, and because it has the sponsorship of a major North American university, with additional funding from the National Science Foundation, National Instruments, the Hewlett-Packard Corporation, the George R. Brown Endowment for Undergraduate Education, and the CLASS Foundation, this project is clearly viable for now and into the future.  Project founded in 2000 by the Electronic Publishing Centre at Uppsala University, Sweden. The DiVA consortium was founded in 2002 and, as of 2006, 15 Scandinavian universities have joined. These institutions now collaborate on development and future direction of the DiVA application. Two user-group meetings are sponsored every year.  DPubS has two strong institutions backing it. Development is active and ongoing. The next major version of DPubS is currently (spring 2007) under development.  Formal community programme. Fee-based EPrints Services unit. EPrints seems to be widely deployed and well supported.  The long-term viability of this project is uncertain. It was initially supported by the University of Pisa Political Science department as well as a small Italian software firm, Net7. Something called the "Hyperjournal Association" was then formed in an effort to create an organization suitable for long-term planning and growth of the project. The Hyperjournal Association appears to be an organization that accepts fee-based memberships for funding. It is uncertain if this funding strategy will work in the long term.  OJS is a subproject of the federally (Canadian) funded Public Knowledge Project, a partnership between the University of British Columbia Faculty of Education, the Simon Fraser University Library, and the Simon Fraser University Canadian Centre for Studies in Publishing. 
The various projects of the PKP have been funded by: British Columbia Teachers Federation; International Network for the Availability of Scientific Publications; Canadian Association of Research Libraries; Social Sciences and Humanities Research Council of Canada; International Development Research Council; John D. and Catherine T. MacArthur Foundation; Open Society Institute, Soros Foundation; Max Bell Foundation; Government of Canada, Office of Learning Technologies.   
Degree of deployment: There are thousands of modules already in the Connexions system.    It is unclear how widely deployed DPubS is. Their wiki lists five major projects using it.  EPrints is perhaps the most widely deployed of the open-source ePublishing systems under consideration by this study. As of this writing, the application's wiki page lists 223 separate, known archives actively using the software in production.      Public Library of Science as only client right now. Topaz was commissioned by PLOS. "Not quite ready yet." 
Type of open-source license: Creative Commons Attribution License.    Educational Community License  GNU General Public License (GPL), Version 2 or later  GPL2  GNU General Public License 2+  Apache 2.0 

Technical requirements, maintenance, scalability, and documented APIs

Operating system requirements: Debian or Ubuntu  Any operating system capable of running a servlet container, e.g., Tomcat  Solaris or UNIX variant. We installed it under Ubuntu.  GNU/Linux. Also Solaris and MacOS. Version 3 now runs under Apache on Windows.  Linux, or other UNIX variant. We installed it under Ubuntu.  Windows, Unix, or Linux. Unix-like OS recommended.  Any OS supporting Java. Only tested on Linux. RPM packages. 
Application server requirements: Zope (2.7.6 or 2.7.7)/Plone  Tomcat  Apache with mod_perl.  Apache, with mod_perl  Tomcat is required to run the Sesame RDF repository.  An auxiliary application server is not required.  Tomcat 5.5 (anything Fedora runs on). 
Web server requirements:   Apache  Apache with mod_perl, mod_rewrite  Apache, with mod_perl  Apache, with mod_rewrite.  Apache 1.3.2+ or 2.0.4+ or IIS 6+ (We were, however, able to install it for testing purposes under IIS 5.1 on the WinXP platform.)  Apache or IIS (Apache preferred). 
Primary programming language: Python.  Java  Perl 5.8+  Perl 5.6.1  PHP  PHP 4.2+ (IIS requires PHP 5.0+)  Java 
Database server requirements: PostgreSQL (version 8.2+) and psycopg, Python bindings for PostgreSQL. libxml, libxslt, cnxml, mathml2, bibtexml, qml -- xml libraries packaged and provided by Connexions. LaTeX, including cjk-latex, tetex-extra, latex-ucs, latex-ucs-contrib, hbf-kanji48. Ghostscript. gif2png. Java Runtime Engine (JRE). OpenOffice 1.1X. HTML tidy library and its Python bindings.  Oracle  The application uses SQLite for persistent storage. Alternatively, the application can be configured to interact with external data stores such as Fedora and DSpace.  MySQL  MySQL  MySQL 2.23+ or PostgreSQL 7.1+  Mulgara (for metadata); Fedora (for articles/content) 
Required skills: Significant skills as a system administrator are required to install and configure this software. At a minimum, one should feel comfortable installing such things as Zope/Plone, PostgreSQL, and configuring a database connection between the two.    DPubS requires significant skills as a UNIX system administrator to install. If installing on a shared server, among other Web sites and applications, one must be able to configure multiple Virtual Hosts under Apache. One must be able to troubleshoot problems related to Apache configuration and startup. One must be able to install and troubleshoot mod_perl under Apache. One must be able to troubleshoot problems with the Berkeley Database libraries.  While version 3 of this application can run under Windows, all instances of it must run under Apache, so experience setting up Apache is required as is experience setting up the many Perl modules that are required. Unfortunately, the installation documents seem to assume that EPrints will be the only application running in Apache, i.e., that it will not be running on a shared server. This is a bad assumption, which led to problems during the installation phase. Moreover, as I painfully found out, the order of operations in which the installation occurs must be followed to the letter, even if there are one or two steps that seem, on the surface, like the order in which they execute would not be relevant. Still, the documentation for installing EPrints on Ubuntu provides a step-by-step installation procedure that, if followed and not deviated from in the slightest, results in a successful install. A GUI-based installer would be nice. As it stands, installation is handled via a somewhat primitive Perl script.  In contrast to the claims made on the Hyperjournal Website, significant skills are required to install this application. There are many steps along the installation path where things can, and will, go wrong. One must make educated guesses along the way. 
Must be able to install Apache, with mod_rewrite enabled. (The documentation does not mention this.) Must be able to install and configure, on a UNIX host, a mail transport agent such as Sendmail or Postfix. Must be knowledgeable about UNIX permissions issues. Must know how to install TrueType fonts on a UNIX host. Must be able to install and configure Tomcat on a UNIX host and the Sesame RDF repository on Tomcat. Must be able to install and configure MySQL. Must be able to install PHP under Apache on a UNIX host and to use the PEAR utility to install various required libraries. Must be able to troubleshoot connectivity to MySQL server via JDBC driver. Must be able to troubleshoot connectivity to Sesame repository. Must be able to troubleshoot sourcecode configure scripts and make files. Installing and configuring Hyperjournal is not a trivial task. In the end, despite hints in the documentation to the contrary, the only way I was able to get Hyperjournal installed and configured properly with MySQL and a local, not remote, Sesame repository was to let the Hyperjournal GUI installation utility actually create MySQL users and databases for use by Hyperjournal and Sesame, and to let the GUI utility create the Sesame repository under Tomcat as well. Even after doing this, though, the configuration required significant tweaking in order to get such things as the automated email messages, the "captcha", and JDBC connectivity from Sesame to MySQL to work.  Required skills for setup and administration include the following: Ability to set up, configure, administer, and secure a Web server, either Apache or IIS; ability to set up and configure PHP with either the MySQL or PostgreSQL connector; ability to set up and administer either the MySQL or PostgreSQL database server.  "Very good developer required." Could install from RPM. 
API: Code extensibility:     It appears that the application codebase is significantly extensible. One must create a directory outside the root of the application codebase in which to hold one's new code. This is so that local, custom code is not overwritten upon future updates to the application proper. Then, there is a special config file within the application's directory tree that can be modified such that local code is initialized and incorporated into the application upon startup.  The application provides a defined API for the creation of plugins. It also provides support for packaged "extensions", basically entire sets of plugins all installed as a single package.  There does not appear to be an API for extending the capabilities of this application.  The application provides a robust plugin API. Examples of community-produced plugins include: An RSS/Atom feed plugin; a WYSIWYG editor plugin; an LDAP authentication plugin; a PubMed XML export plugin; a Google Scholar Gateway plugin; etc. Plugins are written in object-oriented PHP and typically extend one of the four provided base classes: Generic; importexport; auth; and gateways. The Open Journals Systems Technical Reference provides ample instruction and examples on how to write plugins for the application.   
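
The plugin pattern described for Open Journal Systems in the row above (plugins that extend one of four base categories: generic, importexport, auth, and gateways) can be sketched roughly as follows. This is an illustrative Python sketch only; OJS plugins are written in PHP, and every class and method name here (`Plugin`, `AuthPlugin`, `PluginRegistry`, `StaticAuthPlugin`) is hypothetical and not taken from the OJS codebase.

```python
from abc import ABC, abstractmethod

# The four plugin categories named in the survey; everything else is hypothetical.
CATEGORIES = ("generic", "importexport", "auth", "gateways")


class Plugin(ABC):
    """Hypothetical base class; each plugin declares the category it belongs to."""

    category = "generic"

    @abstractmethod
    def name(self) -> str:
        """Return a unique name used to register the plugin."""


class AuthPlugin(Plugin):
    """Hypothetical base class for authentication plugins."""

    category = "auth"

    @abstractmethod
    def authenticate(self, username: str, password: str) -> bool:
        """Return True when the credentials are accepted."""


class PluginRegistry:
    """Keeps registered plugins grouped by category."""

    def __init__(self):
        self._plugins = {category: {} for category in CATEGORIES}

    def register(self, plugin: Plugin) -> None:
        if plugin.category not in self._plugins:
            raise ValueError(f"unknown plugin category: {plugin.category}")
        self._plugins[plugin.category][plugin.name()] = plugin

    def names(self, category: str) -> list:
        """Return the sorted names of the plugins registered in one category."""
        return sorted(self._plugins[category])

    def get(self, category: str, name: str) -> Plugin:
        return self._plugins[category][name]


class StaticAuthPlugin(AuthPlugin):
    """Toy auth plugin that checks credentials against an in-memory table."""

    def __init__(self, users):
        self._users = dict(users)

    def name(self) -> str:
        return "static-auth"

    def authenticate(self, username: str, password: str) -> bool:
        return self._users.get(username) == password


registry = PluginRegistry()
registry.register(StaticAuthPlugin({"editor": "s3cret"}))
```

Grouping plugins by category lets the host application look up, for example, every registered authentication plugin without knowing the concrete classes in advance.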

Submission, peer review management, and administrative functions

Support for multiple, discrete publications: Yes.    Yes. One must decide first on the metadata schema to be used, create a new internal "authority" (unique identifier) for this schema, and at the command-line, edit XML configuration files accordingly. Apache must be restarted for these configuration files to take effect. At this point, a new publication with the specified metadata schema has been created, content can be loaded, and the UI can be customized.  Multiple archives or repositories are supported, each housing multiple documents, files, etc.  Publication of multiple journal titles is not supported. Only a single journal title per application instance is provided.  Yes, multiple publications (journals, and issues of those journals) are supported.  Not right now. Adding in future release. "The same article can belong to multiple journals." 
Multiple administrative roles: At a minimum, it appears that the application ships with five hardcoded roles: Authors; Maintainers; Copyright Holders; Editors; and Translators.    It appears that there are only two roles modeled by this application: Editor and User.  The application provides four distinct roles: The main Administrator; the Repository Administrator; the Editor within a given repository; and the individual User.  The application provides the following roles by default: Authors; Administrators; Reviewer; Editors.  Yes, the application provides multiple administrative roles. These roles include: Author; OJS Superuser; Journal Manager; Editor; Section Editor; Copy Editor; Layout Editor; Proofreader.  Admin and regular User. 
Administrative roles configurable: It does not appear that the administrative roles are configurable, i.e., it looks like they are hardcoded into the application and that additional administrative roles cannot be added.    The Editor and User roles of this application do not appear to be configurable, i.e., the Editor in one publication appears to have the same privileges as in another.  The roles provided by the application appear to be hard-coded, i.e., you cannot add to the number of these roles.  Administrative roles can be added, with custom sets of permissions for each.  The provided administrative roles do not appear to be configurable, i.e., there does not appear to be any way to add an administrative role to the system. That said, the administrative roles provided appear to be comprehensive and very well thought out.  No. 
Submission into system initiated by authors: Yes.    It is unclear how author-initiated submissions are handled. The wiki indicates how entire issues of properly-formatted content can be imported into the application, but no mention is made of direct author submissions.  A self-signup is provided for new authors. Once an account is generated, authors may submit to a particular repository. Their submission enters the idiosyncratic workflow for that repository where it may be reviewed and approved by, e.g., a repository editor before being put on public display.  Yes, all submissions are initiated by the author.  Submissions are author-initiated, and file uploading is done via the application. Metadata is supplied by the author at the time of submission. Resubmissions can occur at the editor's request.  Author can submit directly into ingestion directory via FTP. 
Metadata fields configurable: There is a short list of metadata fields available to all content items ("title"; "created"; "revised"; "abstract"; "keywords"; "license"; "authors"; "maintainers"; "licensors"), and the Website indicates that this list may be expanded on a content type by content type basis.  Documents are natively stored in the "DiVA Document Format", an XML-based document format consisting of 99 elements. However, the metadata structures can be configured to support other metadata standards, e.g., Dublin Core, METS, etc.  One of the great strengths of this application is that it supports multiple custom metadata schemas, i.e., each individual publication can have its own idiosyncratic metadata schema.  The metadata is alterable on a per-archive basis. This is accomplished via editing of two configuration files on the command-line. Moreover, if a field is added within these configuration files, it must likewise be manually added/configured in the database. The wiki indicates that work is underway to create a "tool" which will make this whole process much easier.  The metadata, which appears to be based on Dublin Core, does not appear to be configurable, i.e., it does not look like additional fields can be added to what is already present by default.  Metadata fields in this application do not appear to be configurable, e.g., they cannot be added to.  Multiple ingestion applications, each with its own idiosyncratic metadata schema, can be configured. 
Editorial workflow configurable per publication: Connexions was designed from the start so that authors could self-publish their works. However, in a July 19th, 2007 paper (http://rhaptos.org/docs/architecture/design/lenses/CNX%20Lens%20Functional%20Design%20Draft.pdf), Katherine Fletcher of Connexions proposes the introduction of something she calls "lenses", essentially a peer-review and workflow process for the Connexions/Rhaptos software. Interestingly, "Each lens may have a different focus; examples include lenses controlled by traditional editorial boards, professional societies, or informal groups of colleagues as well as automated lenses based on popularity, the amount of (re)use, the number of incoming links, or other metrics." In this manner, a single piece of content could be viewed through several different "lenses".    It is unclear how workflow is handled. On the one hand, it appears that new User Interface "pages" can be created and incorporated into the application. On the other hand, it is unclear just how much of the logic of the application can be manipulated via these pages. It may be that a programmer could create new User Interface pages which then call the various underlying services of the application and thereby alter or create a new workflow procedure for use within the application framework. But this is something a developer/programmer would be doing, not an administrator of this application.  Separate workflows can be created on a per-repository basis using XML configuration files contained in the directory tree for that particular repository instance. It appears that segments ("stages") of the custom workflow can be restricted per user type, i.e., to Repository Administrators, to Editors, to regular Users.  Workflow is fully customizable.  While the workflow is not configurable, a lot of thought was put into the workflow and editorial processes hardcoded into this application. 
The main OJS documentation ("OJS in Ten Minutes") provides a very nice chart of the workflow modeled by the OJS application. This chart nicely illustrates the movement of a submission through the workflow process and the various interactions between authors, editors, reviewers, and other editorial staff along the way.  Not really. Article ingested, Admin approves. One step workflow. 
Stylesheets, customizable look and feel per publication: Rhaptos can be skinned to more closely match a local institution's look and feel. Skinning is accomplished at the Plone or Zope layers of this application.    The look and feel of individual publications is customizable via XSL stylesheets. At the command-line, the default directory structure containing the default stylesheets must be copied to a new directory, one that maps to the publication under consideration. The default stylesheets are then edited until the desired look and feel is attained. In addition to creating custom skins, the application makes provision for creating entirely new UI pages as well.  The look and feel is configurable on a per-repository basis. Again, this is controlled via command-line manipulation of configuration files and contents of repository directories. With each change, such things as static pages must be regenerated, the default configuration for the archive must be reloaded, and ideally the Web server must be restarted.  The logo that appears throughout the UI is configurable from within the Administrative screens. Custom "Interface Themes" can be created with Cascading Stylesheets (CSS) and can be registered with the application by placing them in a specified directory on the underlying filesystem.  The Journal Manager controls the look and feel of each individual journal via stylesheets and custom HTML header and footer files.  Yes, via the Watermark templating engine. Only works across entire application. Next version will have "skin types". 
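
The EPrints entry in the editorial workflow row above describes per-repository workflows defined in XML configuration files, with individual stages restricted by user type. A minimal sketch of that idea in Python follows. The XML layout, the element and attribute names (`workflow`, `stage`, `roles`), and the role labels are all hypothetical, chosen for illustration; they are not the actual EPrints configuration format.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-repository workflow definition (NOT the real EPrints schema).
WORKFLOW_XML = """
<workflow repository="demo">
  <stage name="deposit" roles="user editor admin"/>
  <stage name="review" roles="editor admin"/>
  <stage name="publish" roles="admin"/>
</workflow>
"""


def load_stages(xml_text):
    """Parse the workflow and return (stage name, allowed roles) pairs in document order."""
    root = ET.fromstring(xml_text.strip())
    return [
        (stage.get("name"), frozenset(stage.get("roles", "").split()))
        for stage in root.findall("stage")
    ]


def stages_for(role, stages):
    """Return the names of the stages that the given role may act on."""
    return [name for name, roles in stages if role in roles]


stages = load_stages(WORKFLOW_XML)
```

Because the stages live in a configuration file rather than in code, each repository instance can carry its own workflow, which matches the per-repository customization the survey describes.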

Access, formats, and electronic commerce functions

Internationalization support: For the most part, internationalization in Rhaptos is provided by the underlying Plone application.      The application and database fully support Unicode encoding (utf-8). Locale files are installed and configured at the command-line.  A configurable list of acceptable languages is presented to the author as part of the submission process.  The backend database must support UTF-8 (Unicode) encoding in order for special characters to be stored, retrieved, and then displayed properly. The en_US locale is installed by default. Locale files for the following also ship with the default configuration and can be activated after initial installation: es_ES; fr_CA; it_IT; pt_BR; ru_RU; tr_TR. In addition to these, locale files for the following languages are currently under development: Arabic; Catalan; Chinese; Croatian; Farsi; Hindi; Norwegian; Thai; Vietnamese.  Unicode compliant, yet no language packs as of yet. 
Output in multiple document formats:     It does not appear that the application itself generates output formats. Rather, the application can accept multiple input formats and so the format in which a document is initially submitted remains the format in which it is ultimately provided.  Insofar as the application accepts multiple document formats for input, it likewise provides those documents to the user in the same format in which they were submitted.      No. 
Document formats supported:   PDF (via Apache FOP)    The default document formats supported include: Plain text; HTML; PDF; Postscript; MS Powerpoint; MS Word; JPEG; PNG; GIF; TIFF; BMP; MPEG; Quicktime; AVI.      All submissions must be in NLM DTD 2.0+ format. 
Full-text search and retrieval: Full text searching is supported.  Full text search and retrieval is supported via the Apache Lucene engine.  Yes, via the Lucene engine.  Yes. The following are required: xpdf (for PDF indexing); wvware (for MS Word indexing); lynx (for HTML indexing)  There does not appear to be a fulltext index of the entire article. Titles are keyword searchable. Author names are searchable. There is a controlled-vocabulary subject search as well.  Full-text indexing is supported for the following file formats: Text; RTF; Microsoft Word; PDF; Postscript.  Lucene 
Authentication mechanisms: Presumably, Connexions/Rhaptos has at its disposal all the functionalities of its underlying Plone foundation. Such functionality would include the use of, e.g., the PloneLDAP library for authenticating against an external LDAP or Active Directory service.    Authentication appears to be entirely internal, i.e., there is no provision for authentication against an external service.  The application can be configured to support authentication against an external LDAP server. By default, it authenticates against its internal authentication store. Interestingly, the application can be configured to be a Login-Only repository (where all interactions with it must first be authenticated) or as a repository in which user registration is not even required.  Authentication is provided internally, by the application itself. There appears to be no provision for authentication against an external store or service.  Authentication can occur against either the backend database or against an external LDAP server. The plugin for LDAP authentication is provided with the default OJS software package.  Single signon capability, against CAS. 
Subscription services: It does not appear that any sort of subscription service has been implemented, although there is a document on the developer's Wiki making a proposal for the inclusion of primitive subscription services in a future release.    The application provides for subscription services and provides access control functions on a per IP, per domain, or per user basis.  Insofar as this application is not an electronic publishing system in the same sense as the other systems under consideration by this study are, it does not provide subscription services. In another sense, though, it supports RSS and Atom feeds, so at least in that sense the notion of "subscription" is provided.  No subscription services are provided.  There is an entire administrative module to manage subscription services per individual journal that can be activated by the Journal Manager. Such subscription-related attributes as Subscription Type (e.g., individual or institutional); Subscription Policies; Subscription Expiry Reminders; and Delayed Open Access for Subscription Journals are included. Journal Managers are provided an administrative interface for creating subscriptions. This interface includes such things as Subscription Type; start and end dates; Membership requirements of the subscribing party; Domain, if access to subscribed publications is to be restricted by domain; and IP ranges, if access to subscribed publications is to be restricted by IP range.  Email subscription. 
Electronic commerce functions:     This was the only application under consideration by this study whose documentation even mentioned eCommerce functions. Presumably, one could use this application in an eCommerce setting by controlling access via the subscription services it models. Most notably, the subscription services can control access by domain.  The application does not appear to provide any sort of ecommerce functions. Its main bent, in fact, is to provide fully open access to materials.  No ecommerce services are provided.     
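
The subscription rows above describe access control by start and end dates, by domain, and by IP range (as in the DPubS and OJS entries). A minimal sketch of such a check, using only the Python standard library, might look like the following. The `Subscription` class, its field names, and the example addresses and domain are assumptions made for illustration and do not come from either application.

```python
from dataclasses import dataclass
from datetime import date
from ipaddress import ip_address, ip_network


@dataclass
class Subscription:
    """Hypothetical subscription record: a validity window plus allowed domains and IP ranges."""

    start: date
    end: date
    domains: tuple = ()    # lowercase domain names, e.g. ("example.edu",)
    ip_ranges: tuple = ()  # CIDR strings, e.g. ("192.0.2.0/24",)

    def is_active(self, today: date) -> bool:
        """A subscription is active from its start date through its end date, inclusive."""
        return self.start <= today <= self.end

    def grants_access(self, client_ip: str, client_host: str, today: date) -> bool:
        """Grant access if the subscription is active and the client matches a domain or IP range."""
        if not self.is_active(today):
            return False
        host = client_host.lower()
        # Match the domain itself or any of its subdomains, but not look-alike suffixes.
        if any(host == d or host.endswith("." + d) for d in self.domains):
            return True
        address = ip_address(client_ip)
        return any(address in ip_network(cidr) for cidr in self.ip_ranges)


subscription = Subscription(
    start=date(2007, 1, 1),
    end=date(2007, 12, 31),
    domains=("example.edu",),
    ip_ranges=("192.0.2.0/24",),
)
```

Checking the date window first means an expired subscription is rejected regardless of where the request comes from, which is the behavior one would expect from the start and end dates described above.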

Summary data

Strengths:     Impressive, well-thought-out service-based application architecture. Provision for subscription services. Highly customizable metadata schema.  The application nicely provides facility for controlled-vocabulary indexing of documents using the Library of Congress Subject Headings and/or the organizational structure of one's local institution. The default application is simple, yet powerful. The administrative roles and default, streamlined workflow are well-thought-out and useful. The workflow, branding, and import/export are all configurable, though all at the command-line by a system administrator and not particularly easy or straightforward.  The user interface (UI) of the application is well-laid-out and easily understood. It was appealing and a pleasure to work with. The default administrative roles and workflow were well-thought-out. Hyperjournal is the first example of its kind: A Semantic-Web-Aware electronic publishing system. All of its data is exposed as RDF for harvesting and use within the Semantic Web rubric. Its "contextualization" features provide powerful and useful bibliometric tools and allow users to quickly enter a stream of relevant, linked, bibliographic data.  Easy installation. Platform independent. Excellent and comprehensive documentation. Well-implemented plugin support. Well-thought-out workflow and administrative roles. Solid support for internationalization via UTF-8 encoding and locale files.   
Weaknesses:     Platform dependent. Web server dependent. Extraordinarily difficult to install. Primitive initialization script. Installation documents assume that DPubS will be the only application running on the server, i.e., it's not intended to run in tandem with any other application. Installation assumes one is installing Apache from source. Installation assumes one is installing mod_perl from source. Documentation significantly incomplete/incorrect. It does not appear that this application is intended to model/facilitate the entire peer review process. Rather, it looks like the application is intended to provide a repository for already completed publications and to then provide a Web-based interface to them.  Installation procedures assume that EPrints is being installed on its own server. Web server dependent (Apache). Primitive installation script. The configuration of the application as a whole, as well as of each individual archive, is performed at the command-line by a system administrator, using text-based configuration files. The EPrints wiki indicates that administrative tools, presumably GUI in nature, are currently under development.  A challenge to install. Installation documentation slightly, yet significantly, out of date. Platform dependent. Data import/export is missing. No defined APIs for code extensibility/development of extensions or plugins.  It would be nice if an authentication plugin were provided to allow Shibboleth authentication as well as authentication against other single sign-on utilities, e.g., CAS; WebAuth; SiteMinder.