Last week I had an email from Archives. It seems they were unhappy with their current CMS approach (enter data into an Access database, export it to Excel, run code to generate HTML pages, upload the pages) and wanted something a bit smoother, simpler and more immediate. Could I help?
As it happens, I had already been thinking about building something for our Special Collections digitisation projects, so I was happy to take a look.
We have already been trying to use DSpace for this purpose, but DSpace was designed as a repository for publications and is not really good as a presentation service. For Special Collections, and for Archives too, we’re talking about interesting heritage material that needs to be nicely presented.
So I sat around thinking about the design for a while. I also looked at the EAD (Encoded Archival Description) XML spec to see how “real” archivists approach the problem, and of course checked that there was nothing open source already out there. Eventually I arrived at what I am calling the Lightweight Archive System, or LARK.
The basic design objectives were: simple to use; simple to build; simple to maintain; configurable.
That called for a simple architecture, so instead of using a database to store items, I decided to use the Unix file system. Content and other files would simply be stored as Unix files in a directory tree that mirrors the structure of the archive.
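To make the idea concrete, here is a minimal sketch of how an archive location might map onto directories. It is written in Python rather than LARK's Perl, and the function names (`object_path`, `add_object`, `children`) and the room/cabinet/drawer names are hypothetical; it assumes only that each object is a directory whose position in the tree matches its position in the archive.

```python
from pathlib import Path
import tempfile

# Hypothetical layout (not LARK's actual code): every archive object is a
# directory, and its place in the tree mirrors its place in the archive.
def object_path(root: Path, *parts: str) -> Path:
    """Map an object's location in the archive to a directory under root."""
    return root.joinpath(*parts)

def add_object(root: Path, *parts: str) -> Path:
    """Create the directory for a new object.

    Missing parent containers are created as needed; raises FileExistsError
    if the object itself already exists.
    """
    path = object_path(root, *parts)
    path.mkdir(parents=True, exist_ok=False)
    return path

def children(root: Path, *parts: str) -> list:
    """List the objects directly inside a given object, sorted by name."""
    return sorted(p.name for p in object_path(root, *parts).iterdir() if p.is_dir())

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        add_object(root, "room1", "cabinet2", "drawer3")
        add_object(root, "room1", "cabinet2", "drawer4")
        print(children(root, "room1", "cabinet2"))  # ['drawer3', 'drawer4']
```

Because the hierarchy lives in the file system itself, listing a container's contents is just a directory listing; no database query or schema is involved.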
First, we have to be clear about what kinds of object we’re talking about. To keep the design simple and tight, I defined “object” as follows: an object is any “thing” in the archive which can be considered as a whole or as a collection of parts. An object may be a simple document (a photograph, letter or other digital copy of a real thing); or an object may be some kind of container (room/cabinet/drawer/folder, or box/envelope). Containers are not defined or constrained as to type, because in LARK all containers are basically the same: a container is an object which may contain containers or documents.
This nested structure keeps the code really tight, because there are only two kinds of “thing” to deal with, and we don’t much care what those things actually are.
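The two-kinds-of-thing model is essentially the composite pattern. The sketch below is a hypothetical Python rendering, not LARK's actual Perl data structures: `Document`, `Container` and `count_documents` are illustrative names, and `fields` stands in for whatever metadata a collection's config defines.

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Hypothetical in-memory model of the two kinds of object.
@dataclass
class Document:
    name: str
    fields: dict = field(default_factory=dict)  # metadata, e.g. {"title": ...}

@dataclass
class Container:
    name: str
    fields: dict = field(default_factory=dict)
    # A container may hold other containers or documents, to any depth.
    contents: list[Container | Document] = field(default_factory=list)

def count_documents(obj: Container | Document) -> int:
    """Count documents in an object, recursing through nested containers."""
    if isinstance(obj, Document):
        return 1
    return sum(count_documents(child) for child in obj.contents)

# Example: a cabinet holding a drawer (two documents) and a loose ledger.
drawer = Container("drawer3", contents=[Document("letter1"), Document("photo7")])
cabinet = Container("cabinet2", contents=[drawer, Document("ledger")])
print(count_documents(cabinet))  # 3
```

Any operation that walks the archive needs only two cases, one for documents and one for containers, whether a given container is a room, a drawer or an envelope.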
With all that sorted out, the actual coding fell into place quite easily. An XML config file configures each collection, defining the fields that apply to its objects as well as the HTML header and footer. Creating or updating an object generates a new HTML page for that object, with links to its contents and so on. There are also simple options for adding and deleting objects.
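The post does not describe LARK's config schema, so the element names here (`collection`, `fields`, `field`, `header`, `footer`) are assumptions for illustration, as are the `load_config` and `render_page` helpers and the `<child>/index.html` link layout. This minimal Python sketch shows the general flow: read the field definitions and the HTML header and footer from XML, then wrap an object's field values and links to its contents in them.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical collection config; the real LARK schema may differ.
CONFIG_XML = """<collection>
  <fields>
    <field name="title" label="Title"/>
    <field name="date" label="Date"/>
  </fields>
  <header><![CDATA[<html><body><h1>Archives</h1>]]></header>
  <footer><![CDATA[</body></html>]]></footer>
</collection>"""

def load_config(xml_text: str) -> dict:
    """Read the field list and the HTML header/footer from the config."""
    root = ET.fromstring(xml_text)
    fields = [(f.get("name"), f.get("label")) for f in root.iter("field")]
    return {
        "fields": fields,
        "header": root.findtext("header", default=""),
        "footer": root.findtext("footer", default=""),
    }

def render_page(config: dict, values: dict, children: list) -> str:
    """Build one object's page: configured fields, then links to its contents."""
    parts = [config["header"], "<dl>"]
    for name, label in config["fields"]:
        value = values.get(name, "")
        parts.append(f"<dt>{html.escape(label)}</dt><dd>{html.escape(value)}</dd>")
    parts.append("</dl><ul>")
    for child in children:
        # Assumes each child's page lives at <child>/index.html (hypothetical).
        href = html.escape(f"{child}/index.html", quote=True)
        parts.append(f'<li><a href="{href}">{html.escape(child)}</a></li>')
    parts.append("</ul>")
    parts.append(config["footer"])
    return "\n".join(parts)

config = load_config(CONFIG_XML)
page = render_page(config, {"title": "Drawer 3", "date": "1923"}, ["letter1", "photo7"])
print(page)
```

Field values pass through `html.escape`, so text typed into a field cannot break the generated page.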
So there we are: a lightweight archive system, written in Perl as a 500-line CGI script. Nice and simple.
At least I hope it’s simple. I am now at that dangerous stage in development where I have to show it to the “clients” and get their feedback and (hopefully) acceptance. Will let you know how *that* goes!


