The case for a revision of OAIS
The official title of ISO 14721 is Reference Model for an Open Archival Information System (OAIS). The role of a reference model is to provide abstract concepts and terminology by means of which concrete systems can be described and analysed. A reference model is not of itself a standard against which concrete systems can be assessed for conformance; that is the role of criteria based on these concepts and terminology. In the case of ISO 14721 this role is performed by ISO 16363 and its predecessor TRAC. The effectiveness of ISO 14721 must be judged by the effectiveness of its concepts and terminology in describing concrete archival systems, and audits under TRAC and ISO 16363 provide a valuable opportunity to do so.
In July 2014 the CLOCKSS Archive was certified by CRL after a rigorous audit against the TRAC criteria, the process for certification under ISO 16363 not then being available. CLOCKSS gained an overall score that equalled the previous best, and the first ever perfect score in the "Technologies, Technical Infrastructure, Security" category. All non-confidential documents submitted to the auditors are available here. Four blog posts describe the certification, the audit process, the lessons learned, and how to run the demonstrations we showed the auditors.
In general, basing the description of the CLOCKSS Archive on the ISO 16363 criteria, and thus on the concepts and terminology of ISO 14721, worked well. Documents describing in detail the way significant OAIS concepts apply to the CLOCKSS Archive are available here. But the "lessons learned" blog post includes a section OAIS vs. CLOCKSS, reproduced here:
Writing the OAIS Conformance Documents made the mis-match between the theory of the OAIS reference model and the practice of digital preservation in the Web era, and in particular that of the CLOCKSS Archive, evident. The conceptual mis-matches between the OAIS Reference Architecture, upon which ISO 16363 is firmly based, and the CLOCKSS Archive's architecture fall into four broad areas:
- CLOCKSS is a dark archive. Eventual readers of the archive's content are unknown, and have no influence over when, whether and how content is released from the archive. The OAIS concept of Designated Community is thus difficult to apply.
- CLOCKSS ingests streams of content. Content ingested by crawling the Web, as much of the CLOCKSS Archive's content is, is not pushed from the content submitter to the archive but pulled by the archive from the publisher. The publishers of academic journals emit a continual stream of content; any division into units is imposed by the archive, not by the publisher. The OAIS concept of Submission Information Package (SIP), and the relationship it envisages between the submitter and the archive, is difficult to apply. The concept of Archival Information Package (AIP) also has some detailed mis-matches, since to collect a stream an AIP must be created before it contains any content, and subsequently accumulate content over time instead of, as OAIS envisages, being wrapped around a pre-existing collection of content at creation time.
- CLOCKSS has a centralized organization but a distributed implementation. Efforts are under way to reconcile the completely centralized OAIS model with the reality of distributed digital preservation, as for example in collaborations such as the MetaArchive and between the Royal and University Library in Copenhagen and the library of the University of Aarhus. Although the organization of the CLOCKSS Archive is centralized, serious digital archives like CLOCKSS require a distributed implementation, if only to achieve geographic redundancy. The OAIS model fails to deal with distribution even at the implementation level, let alone at the organizational level.
- The CLOCKSS Archive contracts out its operations. The CLOCKSS Archive not-for-profit achieves its low cost of operations by contracting them all out under two contracts with Stanford University. This enables many costs to be shared with the other users of the LOCKSS technology, to the benefit of both. The OAIS model fails to deal with organizational divisions such as this.
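The AIP mis-match described in the second item above can be made concrete with a minimal sketch. The class names, fields, and example URLs below are hypothetical illustrations, not part of OAIS or of the CLOCKSS implementation; the sketch only contrasts a package wrapped around pre-existing content with one that is created empty and accumulates content over time.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class WrappedAIP:
    """Hypothetical AIP in the OAIS style: wrapped around content that already exists."""
    identifier: str
    created: datetime
    urls: tuple  # fixed at creation time


@dataclass
class StreamAIP:
    """Hypothetical AIP for a crawling archive: created empty, then filled over time."""
    identifier: str
    created: datetime
    urls: list = field(default_factory=list)

    def add(self, url: str) -> None:
        # Content pulled by a later crawl is appended to the existing AIP.
        self.urls.append(url)


now = datetime.now(timezone.utc)

# The whole collection must be known before the package can be created.
wrapped = WrappedAIP("journal-2014-batch", now,
                     ("http://example.org/a", "http://example.org/b"))

# The package exists before it holds any content.
stream = StreamAIP("journal-2014", now)
stream.add("http://example.org/a")  # first crawl
stream.add("http://example.org/b")  # a later crawl
```

Both objects end up holding the same URLs; what differs is when the package comes into existence relative to its content, which is the point at which the OAIS description becomes awkward.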
Another mis-match between OAIS and web archiving would have been a problem had CLOCKSS not been a dark archive. Access to archived Web content, via Memento (RFC 7089), direct link or text search, occurs at the level of an individual URL. The OAIS concept of Dissemination Information Package is difficult to apply to access of this kind; it says:
In response to a request, the OAIS provides all or a part of an AIP to a Consumer in the form of a Dissemination Information Package (DIP). The DIP may also include collections of AIPs, and it may or may not have complete PDI. The Packaging Information will necessarily be present in some form so that the Consumer can clearly distinguish the information that was requested. Depending on the dissemination media and Consumer requirements, the Packaging Information may take various forms.
Although there is obviously a lot of room for interpretation here, it does not appear to cover the case where the Consumer requests, and the archive delivers, a digital object (the headers and body of a URL) in exactly the form it was ingested with no Packaging Information. This is what Consumers of archived Web content want. It is true that, for example, Memento adds header information to its response, but that information serves to point to other archived digital objects, potentially in other archives, so it can't be considered Packaging Information for the requested DIP. Fortunately for us, the trigger process of the CLOCKSS Archive does deliver a package containing many URLs, so it more closely matches the OAIS DIP concept.
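To illustrate the point about Memento headers, here is a small sketch that inspects the kind of response headers RFC 7089 describes. The header values and the archive.example.net host are invented for illustration, and the parser is deliberately simplified: it assumes `rel` is the first parameter of each link, so it is not a general Link header parser.

```python
import re

# Hypothetical response headers for one archived URL, shaped after RFC 7089.
headers = {
    "Memento-Datetime": "Tue, 11 Sep 2001 20:35:00 GMT",
    "Link": (
        '<http://a.example.org/>; rel="original", '
        '<http://archive.example.net/timemap/http://a.example.org/>; rel="timemap", '
        '<http://archive.example.net/timegate/http://a.example.org/>; rel="timegate"'
    ),
}


def link_relations(link_header):
    """Map each rel value to its target URI (simplified: rel must come first)."""
    pairs = re.findall(r'<([^>]+)>\s*;\s*rel="([^"]+)"', link_header)
    return {rel: uri for uri, rel in pairs}


relations = link_relations(headers["Link"])
for rel, uri in relations.items():
    print(rel, "->", uri)
```

Every relation in this example points away from the delivered object, to the original resource, a TimeMap, or a TimeGate. None of it delimits or describes the body that was returned, which is why it is hard to read these headers as Packaging Information.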
Our experience in the TRAC audit of the CLOCKSS Archive reveals a number of areas in which the concepts and terminology of ISO 14721 are inadequate to describe a real, functioning system. There are two ways to react to this. If you believe that ISO 14721 is not a reference model, but a definition of an archival system, your response is to say that CLOCKSS, and any other system that cannot be described using only ISO 14721 concepts and terminology, is not an archival system. Whatever it is doing is not archiving. Over time, as technology and the requirements of the marketplace evolve, the terminology of ISO 14721 will describe fewer and fewer systems, so the field of archiving will shrink to encompass only legacy systems.
If, on the other hand, you believe that ISO 14721 is a reference model, your response is to say that it needs updating with additional concepts and terminology adequate to describe the systems that are doing archiving in the sense in which that word is generally used. Our experience has identified a number of areas in which updating is needed, and I hope to address them in detail in subsequent posts. I'm sure others have found other such areas, and I hope they will address them in posts to this Wiki. Let's get to work to ensure that a revised ISO 14721 matches the reality of current archival systems. Once that is done, we will need to revise the standards based upon it, ISO 16363 and ISO 16919.
These wiki pages are licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License. Attribute as "Community forum for digital preservation and curation standards http://wiki.dpconline.org/". The content on this wiki represents the opinions of the author and not the Digital Preservation Coalition. This wiki is not associated with ISO, the OAIS Standard or the CCSDS.