Comments on David Rosenthal's “The case for a revision of OAIS”
COMMENTS by David Giaretta on behalf of the working group responsible for OAIS revision
The following contains comments on David Rosenthal’s posting “The case for a revision of OAIS” at http://wiki.dpconline.org/index.php?title=The_case_for_a_revision_of_OAIS.
The normal process for ISO standards involves a review after 5 years, which means that OAIS is due for revision in 2017. However, it is important to understand OAIS before proposing revisions. As the comments below indicate, the case laid out in the post rests on some fundamental misunderstandings of the standard. In particular, it fails to recognise that OAIS is a reference model, as the standard states very clearly (see page 1-2): “This reference model does not specify a design or an implementation. Actual implementations may group or break out functionality differently”.
The comments below, indented and in bold within the text of the original post, seek to correct the statements made in that post.
The official title of ISO 14721 is Reference Model for an Open Archival Information System (OAIS). The role of a reference model is to provide abstract concepts and terminology by means of which concrete systems can be described and analysed. A reference model is not of itself a standard against which concrete systems can be assessed for conformance; that is the role of criteria based on these concepts and terminology. In the case of ISO 14721 this role is performed by ISO 16363 and its predecessor TRAC. The effectiveness of ISO 14721 must be judged by the effectiveness of its concepts and terminology in describing concrete archival systems, and audits under TRAC and ISO 16363 provide a valuable opportunity to do so.
- COMMENT: The effectiveness of ISO 14721 is best judged not by how precisely it can describe any particular archival implementation, but by how widely it has been adopted to facilitate comparisons of archival implementations and issues. A reference model able to describe every implementation in detail would be huge, extremely complex, and effectively useless.
In July 2014 the CLOCKSS Archive was certified by CRL after a rigorous audit against the TRAC criteria, the process for certification under ISO 16363 not then being available. CLOCKSS gained an overall score that equalled the previous best, and the first ever perfect score in the "Technologies, Technical Infrastructure, Security" category. All non-confidential documents submitted to the auditors are available here. Four blog posts describe the certification, the audit process, the lessons learned, and how to run the demonstrations we showed the auditors.
In general, basing the description of the CLOCKSS Archive on the ISO 16363 criteria, and thus on the concepts and terminology of ISO 14721, worked well. Documents describing in detail the way significant OAIS concepts apply to the CLOCKSS Archive are available here. But the "lessons learned" blog post includes a section OAIS vs. CLOCKSS, reproduced here:
Writing the OAIS Conformance Documents made the mis-match between the theory of the OAIS reference model and the practice of digital preservation in the Web era, and in particular that of the CLOCKSS Archive, evident. The conceptual mis-matches between the OAIS Reference Architecture, upon which ISO 16363 is firmly based, and the CLOCKSS Archive's architecture fall into four broad areas:
- CLOCKSS is a dark archive. Eventual readers of the archive's content are unknown, and have no influence over when, whether and how content is released from the archive. The OAIS concept of Designated Community is thus difficult to apply.
- COMMENT: This is a misunderstanding of the definition of Designated Community. The Designated Community is defined by the archive (see page 1-11). The archive does not have to see into the future; it just has to make clear what it is doing. For example, are the CLOCKSS holdings meant to be directly understandable to those who only understand Japanese? Some criteria must be in use, if only implicitly, and these should be documented as the Designated Community, however narrow or broad it may be.
The “eventual users” may or may not be part of that Designated Community, and are not required to have any influence on when, whether and how content is released. The archive will have some process for making these decisions, but OAIS does not cover that process.
- CLOCKSS ingests streams of content. Content ingested by crawling the Web, as much of the CLOCKSS Archive's content is, is not pushed from the content submitter to the archive but pulled by the archive from the publisher. The publishers of academic journals emit a continual stream of content; any division into units is imposed by the archive, not by the publisher. The OAIS concept of Submission Information Package (SIP), and the relationship it envisages between the submitter and the archive, is difficult to apply. The concept of Archival Information Package (AIP) also has some detailed mis-matches, since to collect a stream an AIP must be created before it contains any content, and subsequently accumulate content over time instead of, as OAIS envisages, being wrapped around a pre-existing collection of content at creation time.
- COMMENT: The AIP is certainly defined by the archive. The SIP is a general concept and the Producer is a role rather than a specific person or organisation (see page 1-14). Someone or something is collecting the content and submitting it to the archive. That person or system is playing the role of the Producer. An individual actor can play multiple roles.
The AIP is not assumed to be created before there is any content. One could talk about an AIP container or structure that is prepared before any streaming is started. Until it has all the required components it is not a valid AIP. The archive decides how to create the AIP. OAIS specifies the kinds of information which must be logically contained in it.
- CLOCKSS has a centralized organization but a distributed implementation. Efforts are under way to reconcile the completely centralized OAIS model with the reality of distributed digital preservation, as for example in collaborations such as the MetaArchive and between the Royal and University Library in Copenhagen and the library of the University of Aarhus. Although the organization of the CLOCKSS Archive is centralized, serious digital archives like CLOCKSS require a distributed implementation, if only to achieve geographic redundancy. The OAIS model fails to deal with distribution even at the implementation level, let alone at the organizational level.
- COMMENT: OAIS is a Reference model – not an implementation model (see page 1-2). There is nothing in the OAIS Reference model that would preclude a distributed implementation of an OAIS (see pages 2-2, 4-3, 6-1 and 6-3).
The Functional Model is a logical representation, not a design for a centralised archive. OAIS does not specify how the various Functional Entities are implemented or distributed. Standards for various aspects of implementations would be better placed in a separate standard which follows the OAIS Reference Model concepts and terminology. Note for example that NASA’s Planetary Data System (PDS) has been in existence for many years and is a large distributed archive. PDS staff had no difficulty applying OAIS to the PDS.
- The CLOCKSS Archive contracts out its operations. The CLOCKSS Archive, a not-for-profit, achieves its low cost of operations by contracting them all out under two contracts with Stanford University. This enables many costs to be shared with the other users of the LOCKSS technology, to the benefit of both. The OAIS model fails to deal with organizational divisions such as this.
- COMMENT: Again the Functional Model does not specify how the Functional Entities are implemented (see page 4-3).
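Returning to the comment on streamed content: the distinction between an AIP container prepared before crawling starts and a valid AIP can be made concrete with a minimal sketch. Everything here is hypothetical and illustrative: the `AIPContainer` class, its methods, and the validity rule are assumptions, not an API defined by OAIS or used by CLOCKSS. The rule treats a package as a valid AIP only once it holds Content Information (at least one content data object plus Representation Information) and all five Preservation Description Information (PDI) categories named in the 2012 edition of OAIS: reference, provenance, context, fixity and access rights.

```python
import hashlib
from dataclasses import dataclass, field

# PDI categories named in the 2012 edition of OAIS.
PDI_CATEGORIES = ("reference", "provenance", "context", "fixity", "access_rights")


@dataclass
class AIPContainer:
    """Hypothetical container that exists before any streamed content arrives."""
    identifier: str
    content_objects: dict = field(default_factory=dict)  # URL -> bytes
    representation_info: str = ""
    pdi: dict = field(default_factory=dict)

    def add_content(self, url: str, body: bytes, source: str) -> None:
        """Accumulate one streamed object, recording provenance and fixity as it arrives."""
        self.content_objects[url] = body
        self.pdi.setdefault("provenance", []).append(f"collected {url} from {source}")
        self.pdi.setdefault("fixity", {})[url] = hashlib.sha256(body).hexdigest()

    def missing_components(self) -> list:
        """List the logically required components not yet present (assumed rule)."""
        missing = []
        if not self.content_objects:
            missing.append("content data object")
        if not self.representation_info:
            missing.append("representation information")
        missing.extend(c for c in PDI_CATEGORIES if not self.pdi.get(c))
        return missing

    def is_valid_aip(self) -> bool:
        return not self.missing_components()


# Usage: the container is created first, then filled as the stream is collected.
aip = AIPContainer("journal-x-2014")
print(aip.is_valid_aip())  # False: the container exists but holds nothing yet
aip.add_content("http://example.org/article1", b"<html>...</html>", source="publisher web site")
aip.representation_info = "HTML, UTF-8"
aip.pdi["reference"] = "journal-x-2014"
aip.pdi["context"] = "volume 12 of Journal X"
aip.pdi["access_rights"] = "dark; released only on trigger"
print(aip.is_valid_aip())  # True
```

The design choice mirrors the comment: OAIS specifies what an AIP must logically contain, while when and how the archive assembles those components is left to the implementation.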
Another mis-match between OAIS and web archiving would have been a problem had CLOCKSS not been a dark archive. Access to archived Web content, via Memento (RFC7089), direct link or text search, occurs at the level of an individual URL. The OAIS concept of Dissemination Information Package is difficult to apply to access of this kind. OAIS says:
In response to a request, the OAIS provides all or a part of an AIP to a Consumer in the form of a Dissemination Information Package (DIP). The DIP may also include collections of AIPs, and it may or may not have complete PDI. The Packaging Information will necessarily be present in some form so that the Consumer can clearly distinguish the information that was requested. Depending on the dissemination media and Consumer requirements, the Packaging Information may take various forms.
Although there is obviously a lot of room for interpretation here, it does not appear to cover the case where the Consumer requests, and the archive delivers, a digital object (the headers and body of a URL) in exactly the form it was ingested with no Packaging Information. This is what Consumers of archived Web content want. It is true that, for example, Memento adds header information to its response, but that information serves to point to other archived digital objects, potentially in other archives, so it can't be considered Packaging Information for the requested DIP. Fortunately for us, the trigger process of the CLOCKSS Archive does deliver a package containing many URLs, so it more closely matches the OAIS DIP concept.
- COMMENT: The DIP is a general concept and OAIS does not say how any particular DIP is constructed or what it will contain. If/when required, an archive must be able to provide the details of how the information in the DIP links back to the original information which the archive ingested. Not all DIPs need to contain that provenance. Packaging Information is defined as: The information that is used to bind and identify the components of an Information Package. If the response (the DIP) is sent using HTTP then the fact that it is HTTP is part of the Packaging Information – normally taken care of by the browser without the knowledge or intervention of the human user.
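To illustrate the point that the HTTP framing can itself serve as Packaging Information, here is a minimal sketch. It assumes a hypothetical `ArchivedRecord` holding one URL's headers and body as ingested, and hypothetical `disseminate` and `unpack` functions; it is not how CLOCKSS, LOCKSS or any Memento implementation actually stores or serves content. The delivered body is byte-for-byte what was ingested, while the status line and headers bind and identify it, which is the role OAIS assigns to Packaging Information.

```python
from dataclasses import dataclass, field


@dataclass
class ArchivedRecord:
    """Hypothetical stored form of one archived URL: headers and body as ingested."""
    url: str
    headers: dict = field(default_factory=dict)
    body: bytes = b""


def disseminate(record: ArchivedRecord) -> bytes:
    """Wrap the record in an HTTP/1.1 response; the framing plays the Packaging Information role."""
    # Drop any stored length header and recompute it so the framing matches the body.
    headers = {k: v for k, v in record.headers.items() if k.lower() != "content-length"}
    headers["Content-Length"] = str(len(record.body))
    head = "HTTP/1.1 200 OK\r\n" + "".join(f"{k}: {v}\r\n" for k, v in headers.items()) + "\r\n"
    return head.encode("latin-1") + record.body


def unpack(response: bytes) -> tuple:
    """Separate the packaging (status line and headers) from the delivered object."""
    head, _, body = response.partition(b"\r\n\r\n")
    return head.decode("latin-1").split("\r\n"), body


rec = ArchivedRecord("http://example.org/a.html", {"Content-Type": "text/html"}, b"<p>hi</p>")
packaging, body = unpack(disseminate(rec))
print(packaging)          # ['HTTP/1.1 200 OK', 'Content-Type: text/html', 'Content-Length: 9']
print(body == rec.body)   # True: the object is delivered exactly as ingested
```

In a real deployment the browser or HTTP client strips this framing without the user noticing, which is the sense in which the packaging is "taken care of" for the human Consumer.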
Our experience in the TRAC audit of the CLOCKSS Archive reveals a number of areas in which the concepts and terminology of ISO 14721 are inadequate to describe a real, functioning system. There are two ways to react to this. If you believe that ISO 14721 is not a reference model, but a definition of an archival system, your response is to say that CLOCKSS, and any other system that cannot be described using only ISO 14721 concepts and terminology, is not an archival system. Whatever it is doing is not archiving. Over time, as technology and the requirements of the marketplace evolve, the terminology of ISO 14721 will describe fewer and fewer systems, so the field of archiving will shrink to encompass only legacy systems.
If, on the other hand, you believe that ISO 14721 is a reference model, your response is to say that it needs updating with additional concepts and terminology adequate to describe the systems that are doing archiving in the sense in which that word is generally used. Our experience has identified a number of areas in which updating is needed, and I hope to address them in detail in subsequent posts. I'm sure others have found other such areas, and I hope they will address them in posts to this Wiki. Let's get to work to ensure that a revised ISO 14721 matches the reality of current archival systems. Once that is done, we will need to revise the standards based upon it, ISO 16363 and ISO 16919.
- COMMENT: OAIS does not claim to be a manual for designing archives. Instead, it states that it:
- provides a framework for the understanding and increased awareness of archival concepts needed for Long Term digital information preservation and access;
- provides the concepts needed by non-archival organizations to be effective participants in the preservation process;
- provides a framework, including terminology and concepts, for describing and comparing architectures and operations of existing and future Archives;
- provides a framework for describing and comparing different Long Term Preservation strategies and techniques;
- provides a basis for comparing the data models of digital information preserved by Archives and for discussing how data models and the underlying information may change over time;
- provides a framework that may be expanded by other efforts to cover Long Term Preservation of information that is NOT in digital form (e.g., physical media and physical samples);
- expands consensus on the elements and processes for Long Term digital information preservation and access, and promotes a larger market which vendors can support;
- guides the identification and production of OAIS-related standards.
- The last point is particularly relevant here. No one standard can cover everything. If it attempted to do so, then it would be too large to read and would be out of date very quickly.
- OAIS is an abstract standard which identifies additional standards that need to be developed. ISO 16363 is one such standard, and others have been created or are under development. Other examples include the XFDU standard (ISO 13527:2010), which describes one specific implementation of OAIS packages, and PAIS (ISO 20104:2015), which describes one possible implementation of the Producer-Archive Interface.
- Surely the fundamental question when proposing revisions to OAIS is whether the core, abstract, concepts need to be updated/corrected, or whether additional standards are needed – or perhaps both. The OAIS terminology and core, abstract, concepts are logically consistent and widely applicable.
- Take distributed archives, which the original post describes as beyond the scope of OAIS, as an example. We noted above that mapping PDS to OAIS shows that this is not true: the core concepts of OAIS do apply. It may be sensible to create new standards for the implementation of distributed archives, for example to define new ways to implement federations or special storage systems. This would not in itself imply changes to OAIS, ISO 16363, or ISO 16919.
- As noted at the start, OAIS is scheduled for review/revision in 2017. It will be important to collect ideas/comments/corrections, but it is essential to distinguish between changes to OAIS itself and suggestions for new, separate standards. Our comments indicate that the points made in the original post fall into the latter category.
These wiki pages are licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License. Attribute as "Community forum for digital preservation and curation standards http://wiki.dpconline.org/". The content on this wiki represents the opinions of the author and not the Digital Preservation Coalition. This wiki is not associated with ISO, the OAIS Standard or the CCSDS.