Reply from Hervé L'Hours, UKDA
The original OAIS release included recommendations for future development, among them a more detailed approach to handling appraisal and custody transfer to the repository (now PAIMAS).
OAIS uses the term ‘Pre-Ingest’ once, in the discussion under “4.1.5 The repository shall have an ingest process which verifies each SIP for completeness and correctness.”: “If an inventory of files was provided by a producer as part of pre-ingest negotiations, one would expect checks to be carried out against that inventory.”
PAIMAS mentions Pre-Ingest only once, as an alternate term for the first of its four phases: “The Preliminary Phase, also known as a pre-ingest or pre-accessioning phase, includes the initial contacts between the Producer and the Archive and any resulting feasibility studies, preliminary definition of the scope of the project, a draft of the SIP definition and finally a draft Submission Agreement.”
But the more recent (2012) Interface Specification (651x1r1-ProducerArchiveInterfaceSpec-PAIS-RedBook-Feb2012.pdf) doesn’t mention it at all.
More colloquial use of the term has grown among curation professionals, where it refers to the repository processes from first contact with a potential ‘depositor’ to the arrival of an approved ‘deposit’ in the repository (carefully avoiding the term SIP here). But it is also used, or misused, at times to denote all pre-repository stages of the digital object lifecycle.
Does this use/misuse indicate that:
1. The term deserves formalisation as an important phase of the repository process?
2. The OAIS would benefit from more clearly placing itself within the full digital object lifecycle?
Topics important to pre-ingest (“processes from first contact with a potential ‘depositor’ to the arrival of an approved ‘deposit’”) include:
- Contact management
- Formal identification of depositor contacts
- Confirmation that the offered data collection:
  - meets the repository’s collections development criteria (appropriate subject matter, IPR and rights status etc.)
  - is of suitable technical quality for use and preservation (low-risk file formats, sufficient supporting metadata etc.)
  - has been appropriately validated in terms of risk (from virus detection to appropriate anonymisation of human subjects)
  - is suitably structured to support the needs of the designated community (from structural metadata, to file naming to including ‘pre-repository’
- Standard procedures for custody transfer
Barbara, I think the “want to distinguish the ‘raw material’ received from the producer from the material that is being processed to become a SIP” point is tricky. The OAIS authors would probably suggest that the raw material is the SIP, though of course we know that we’re collecting relevant metadata well before a deposit is received. As I read it, the OAIS specifically identifies the three states SIP/AIP/DIP as covering everything, with the SIP exactly as deposited, followed by a record of all actions taken to create the AIP and DIP. But we also know that some organisations may receive sample data, or multiple submissions, before the ‘deposit process’ is closed. Here at the UKDA we ‘declare’ the existence of a SIP once all relevant QA is complete; until that point we just have (potentially) multiple ‘deposits’ in the acquisition process.
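To make the deposit/SIP distinction concrete, here is a minimal sketch of the ‘declare a SIP only once QA is complete’ idea. It is purely illustrative and not a description of any UKDA system: the class names (`Deposit`, `Acquisition`), the `qa_complete` flag and the rule that every received deposit ends up in the SIP are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Deposit:
    """One submission received during the acquisition process (hypothetical model)."""
    label: str
    qa_complete: bool = False


@dataclass
class Acquisition:
    """Holds deposits until the SIP is declared (hypothetical model)."""
    deposits: List[Deposit] = field(default_factory=list)
    sip: Optional[List[str]] = None  # labels of the deposits that make up the SIP

    def receive(self, label: str) -> Deposit:
        # Once the SIP has been declared, the deposit process is closed.
        if self.sip is not None:
            raise RuntimeError("SIP already declared; deposit process is closed")
        deposit = Deposit(label)
        self.deposits.append(deposit)
        return deposit

    def declare_sip(self) -> List[str]:
        # Assumption: a SIP needs at least one deposit, and every deposit must have passed QA.
        if not self.deposits or not all(d.qa_complete for d in self.deposits):
            raise RuntimeError("QA is not complete for all deposits")
        self.sip = [d.label for d in self.deposits]
        return self.sip
```

Until `declare_sip()` succeeds there are only deposits; afterwards `receive()` refuses further submissions, which mirrors the point at which the ‘deposit process’ is closed.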
So I agree about Pre-Ingest, and I understand the idea of a PSP, but I’m not convinced that OAIS would ever consider an additional ‘object’ like that. I think they’d suggest that any metadata you collect during the Pre-Ingest process becomes part of the AIP, i.e. that you begin creating the AIP even before the SIP/deposit arrives.
I agree about the ‘raw’ data stuff though. I’d particularly like to be able to request deposit of ‘versions’ of the data that have been used to support ‘pre-repository’ publications. We know that people might use our DOIs for publications which are actually based on Pre-Repository (so pre-QA) versions of the data from the production process. We want to support this, but also to distinguish these versions from our own ‘better’ (or at least different) version. One solution might be to offer pre-repository DOI minting to researchers and then to design the SIP so that we can receive multiple data versions.
What do you think?