
Open Source Digital Preservation in Queensland

A quiet little note appeared on the Queensland State Archives (QSA) social media feeds a couple of months ago…

This Digital Archive is the culmination of a lot of work from QSA, our team here at Gaia Resources, and our partner, Hudson Molonglo. This milestone marks the launch of QSA’s open source Digital Preservation System, based on Archivematica, alongside the previously launched (and now integrated) open source Archival Management System, based on ArchivesSpace.

A Digital Preservation System (DPS) isn’t a Document Management System, a network drive, or a hard drive stored away somewhere. Conceptually it’s fairly simple, but a lot happens under the hood. In essence, a DPS takes a digital file that is provided to it, checks that it’s not corrupt or infected, works out the best long term format to store it in so that it can still be opened in the future, and then converts it and stores the converted copy, along with the original, somewhere safe.

At QSA, this happens in a specific way. As agencies deliver their digital files through ArchivesGateway (a web portal specifically designed for agencies to access and deliver data to QSA), the files go into the Quarantine Zone. They sit in quarantine for a while so that the virus definitions used to scan incoming files have a chance to be updated and recognise any potential infection. If you scanned files the moment they arrived, you might miss newly released viruses, because virus scanners and their definitions may not yet have caught up with them.
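To make the quarantine rule concrete, here is a minimal Python sketch of a release check, assuming a file should only leave quarantine once it has sat there for a fixed period and the virus definitions have been refreshed since it arrived. The 30-day `QUARANTINE_PERIOD` and the names `QuarantinedFile` and `ready_for_processing` are hypothetical; the post doesn’t say how long QSA actually holds files or how their release is triggered.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical quarantine length: the post doesn't say how long QSA holds files.
QUARANTINE_PERIOD = timedelta(days=30)


@dataclass
class QuarantinedFile:
    path: str
    received_at: datetime  # timezone-aware time the agency delivered the file


def ready_for_processing(
    item: QuarantinedFile,
    definitions_updated_at: datetime,
    now: Optional[datetime] = None,
    period: timedelta = QUARANTINE_PERIOD,
) -> bool:
    """Release a file only when it has (a) sat in quarantine for the full period
    and (b) the virus definitions have been refreshed since it arrived, so the
    scan can use signatures that didn't exist at delivery time."""
    if now is None:
        now = datetime.now(timezone.utc)
    served_full_period = now - item.received_at >= period
    definitions_are_newer = definitions_updated_at > item.received_at
    return served_full_period and definitions_are_newer
```

Requiring both conditions reflects the reasoning above: waiting only helps if the scanner has actually received newer signatures to check against.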

Once the file has served its time in quarantine, it gets processed through the Archivematica DPS. Multiple things run here. Fixity checks make sure that the file is not corrupt (by calculating a checksum, a kind of digital fingerprint, for the file and comparing it against a previously recorded value to make sure there are no discrepancies), and the virus checks are run using those updated definitions. There’s a lot to do here to make sure we don’t store something that is corrupted in any way.
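A fixity check boils down to recalculating a checksum and comparing it with a previously recorded value. The minimal sketch below uses SHA-256 from Python’s standard `hashlib`; where the expected value comes from (for example, a checksum supplied alongside the delivered file) is an assumption, and `sha256_of` and `fixity_ok` are hypothetical helper names, not Archivematica functions.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large video files don't fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path: Path, expected_hex: str) -> bool:
    """True if the file's current checksum matches the recorded one.
    Any single changed bit produces a completely different digest."""
    return sha256_of(path) == expected_hex.lower()
```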

Then the fun of working out what sort of file to store it as begins. Archivematica identifies the format of the digital file using the PRONOM registry (which you can manually search here https://www.nationalarchives.gov.uk/PRONOM/), a big technical registry of the file formats that might be out there. Archivematica’s Format Policy Registry then sets out the recommended long term preservation format for each of them (e.g. you might want to turn all JPG images into PNG images). Finally, a lot of functions and software libraries within Archivematica (a real Swiss army knife of digital preservation tools) make that conversion happen.
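Format identification tools such as DROID, Siegfried and Fido work by matching byte sequences at the start (and sometimes the end) of a file against signatures published in PRONOM. The minimal sketch below imitates that with a handful of well-known “magic number” prefixes, then looks up a normalisation policy that mirrors the JPG-to-PNG example. The signature table and the `PRESERVATION_TARGET` policy are illustrative assumptions, not PRONOM’s data or Archivematica’s actual rules.

```python
from __future__ import annotations

# Hypothetical, heavily simplified signature table. Real tools match against the
# full PRONOM signature file, which covers far more formats and versions.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"%PDF-", "PDF"),
    (b"II*\x00", "TIFF"),  # little-endian TIFF
    (b"MM\x00*", "TIFF"),  # big-endian TIFF
]

# Hypothetical normalisation policy following the JPG -> PNG example.
# Formats not listed here are kept as they are.
PRESERVATION_TARGET = {
    "JPEG": "PNG",
    "GIF": "PNG",
}


def identify(header: bytes) -> str:
    """Identify a format from the file's leading bytes, not its extension."""
    for magic, fmt in SIGNATURES:
        if header.startswith(magic):
            return fmt
    return "UNKNOWN"


def preservation_action(header: bytes) -> tuple[str, str]:
    fmt = identify(header)
    if fmt == "UNKNOWN":
        return fmt, "flag for manual review"
    target = PRESERVATION_TARGET.get(fmt)
    if target is None:
        return fmt, "keep as is"
    return fmt, f"convert to {target}"
```

Matching on content rather than the file extension matters because a `.jpg` extension can be wrong or missing; a file that no signature recognises is flagged for a person to look at rather than guessed at.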

The important part of all this is the long term storage of the file: the Archival Information Package (AIP - pronounced “ape”). The AIP bundles the original file, the converted file, and all the metadata about the processing that happened along the way. This is really important stuff for your future self when you are trying to work out how to open this file.
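As a rough illustration of what goes into an AIP, here’s a minimal Python sketch that packages an original file, its converted copy and a processing event log into an in-memory, BagIt-style layout, with a SHA-256 manifest so every file’s fixity can be re-checked later. Archivematica’s real AIPs are BagIt packages whose metadata lives in a METS file with PREMIS events; the JSON `events.json` file, the `data/objects` and `data/metadata` layout, and the `build_aip` name are simplifications assumed for this example.

```python
import hashlib
import json


def build_aip(original_name, original_bytes, converted_name, converted_bytes, events):
    """Return a dict mapping package paths to file contents for a minimal,
    BagIt-style AIP (hypothetical layout, simplified for illustration)."""
    # Payload: the original file, its preservation copy, and the processing log.
    payload = {
        f"data/objects/{original_name}": original_bytes,
        f"data/objects/{converted_name}": converted_bytes,
        # Stand-in for the METS/PREMIS metadata a real AIP would carry.
        "data/metadata/events.json": json.dumps(
            {"events": events}, indent=2, sort_keys=True
        ).encode("utf-8"),
    }
    # BagIt payload manifest: one "<checksum>  <relative path>" line per file.
    manifest = "".join(
        f"{hashlib.sha256(content).hexdigest()}  {path}\n"
        for path, content in sorted(payload.items())
    )
    files = dict(payload)
    # Bag declaration required by the BagIt spec (RFC 8493).
    files["bagit.txt"] = b"BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n"
    files["manifest-sha256.txt"] = manifest.encode("utf-8")
    return files
```

Because the manifest records a checksum for every payload file, the same fixity check used on the way in can be re-run against the stored package years later.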

For QSA, this all gets stored away in robust, redundant long term storage in two separate places. One is Amazon Web Services cloud storage, where it’s replicated across multiple data centers on hardware that’s all located in Australia. A second copy is stored in a government data center in Queensland. While this might feel like a lot, you don’t want to risk losing your stuff to a hardware failure.
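Keeping copies in two places can be sketched as “copy, then verify every copy against the original checksum”, plus a periodic audit that flags any copy that has gone missing or changed. In this minimal Python sketch the storage locations are just local directories standing in for the cloud and Queensland data center copies; `replicate` and `audit` are hypothetical names, and real cloud storage would be reached through the provider’s SDK rather than `shutil`.

```python
from __future__ import annotations

import hashlib
import shutil
from pathlib import Path


def sha256_file(path: Path) -> str:
    # Reads the whole file for brevity; stream in chunks for large AIPs.
    return hashlib.sha256(path.read_bytes()).hexdigest()


def replicate(aip: Path, locations: list[Path]) -> str:
    """Copy the AIP to every storage location and verify each copy,
    returning the checksum that future audits should expect."""
    expected = sha256_file(aip)
    for location in locations:
        location.mkdir(parents=True, exist_ok=True)
        copy = location / aip.name
        shutil.copy2(aip, copy)
        if sha256_file(copy) != expected:
            raise OSError(f"copy at {copy} failed fixity verification")
    return expected


def audit(name: str, expected: str, locations: list[Path]) -> dict[str, bool]:
    """Report, per location, whether an intact copy is still present."""
    return {
        str(location): (location / name).exists()
        and sha256_file(location / name) == expected
        for location in locations
    }
```

If an audit shows one copy has failed, the intact copy in the other location is what makes repair possible, which is the point of having two.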

The DPS also integrates tightly with ArchivesSpace: the processing metadata is passed back to ArchivesSpace and displayed there, so archivists can see exactly what the DPS did. This gives a single source of truth for all of the processing metadata, a key part of good digital archiving practice. Indeed, this whole approach is compliant with the Open Archival Information System (OAIS) reference model, pictured below.

Image Credit: Mathieualexhache (original work); Mess (SVG conversion & English translation), CC BY-SA 4.0 <https://creativecommons.org/licenses/by-sa/4.0>, via Wikimedia Commons

This all feels a bit clinical and all about bits and bytes, but think about the things that are moving through this pipeline: digital video files, digital photos, digital documents, digitised audio-visual and other records that were at risk of being lost if they hadn’t been digitised, and all sorts of amazing aspects of the history of Queensland. The sheer volume of information that QSA holds is remarkable (and you can even see some of their archival photos proudly on display when you walk through the domestic airport terminal in Brisbane!).

The DPS at QSA is the result of a lot of hard work by a lot of people, as I mentioned at the start. For me, it’s been something of a labour of love: pointing my company at a really worthwhile endeavour, being part of preserving our history for the future, and helping to deliver this enterprise grade system using open source software.

We love talking about Digital Preservation, Archiving in general and Collections, so if you’re working in this space, start a conversation with us on our social media platforms (Facebook, LinkedIn or Instagram) or drop me a line at piers@gaiaresources.com.au.

Piers