4.3 INFORMATION PACKAGE TRANSFORMATIONS: Difference between revisions

From wiki.dpconline.org
{{:OAISheader}}
The previous portions of this section have discussed the functional architecture of an OAIS and an information architecture to represent the Information Packages and associated Package Descriptions and Packaging Information. This subsection looks at the transformations, both logical and physical, of the Information Package and its associated objects as they follow a lifecycle from the Producer to the OAIS, and from the OAIS to the Consumer.


Figure 4-26 presents a high-level data flow diagram that depicts the principal data flows involved in OAIS operations. These flows do not include administrative flows such as accounting and billing.
[[File:Figure 4-26 High-Level Data Flows in an OAIS 650x0m2.jpg|600px]]


'''Figure 4-26: High-Level Data Flows in an OAIS'''




== 4.3.1 DATA TRANSFORMATIONS IN THE PRODUCER ENTITY ==
The data within the Producer entity are private and may be in any format the Producer desires. However, when the decision is made to store the data in an OAIS, the Producer who is responsible for the data meets with archivists to negotiate a Submission Agreement as discussed in 2.3.2 of this document. This agreement defines information such as the content, format, and scheduled arrival times of the Submission Information Package (SIP). The SIP is an Information Package that is provided to the OAIS by the Producer. The SIP consists of the Content Information plus the data that are necessary to ensure that the Content Information can be maintained by the OAIS and can be interpreted and used by Consumers who withdraw it from the OAIS in the future.
 
These SIPs are periodically transferred to the OAIS in a Data Submission Session. The number of Data Submission Sessions between an OAIS and a Producer can range from a single session for the transfer of a final data product to multiple sessions a day in the case of an active OAIS that stores data for experiments still in progress. The Data Submission Session can be logically viewed as sets of content Data Objects and description objects, although physically the description can be included in the digital objects (i.e., self-describing objects) or divided into many separate descriptive items. In addition to the logical view of data (the SIP), the specification of a data delivery session must also include the mapping of the objects to the media on which they are delivered. This mapping includes the encoding of the object and description and the allocation of logical objects to files.
 
 
== 4.3.2 DATA TRANSFORMATIONS IN THE INGEST FUNCTIONAL AREA ==
 
 
Once the SIP is within the OAIS, its form and content may change. An OAIS is not always required to retain the information submitted to it in precisely the same format as in the SIP. Indeed, preserving the original information exactly as submitted may not be desirable. For example, the computer medium on which submitted images are recorded may become obsolete, and the images may need to be copied to a more modern medium. In addition, some types of information such as the unique identifier used to locate the Information Package within the OAIS will not be available to the Producer and must be input during the Ingest process to the OAIS.
 
The mapping between SIPs and AIPs is not one-to-one. Here are some examples:
 
– One SIP to One AIP: A government agency is ready to archive its electronic records from the previous fiscal year. All of the year’s records are placed onto magnetic tapes that are submitted as one SIP. The Archive stores the tapes together as a single AIP.

– Many SIPs to One AIP: A satellite sensor makes observations of the Earth over a period of one year. Every week all of the latest sensor data are submitted to the Archive as a SIP. The Archive has a single AIP containing all of the sensor’s observations for the year. Ingest merges the Content Information from each weekly SIP into a specified file or files in Ingest persistent storage. The PDI for the AIP is sent after the last sensor data for the year have been received. After all of the weekly SIPs and the SIP containing the PDI have arrived, Ingest processes the AIP.

– One SIP to Many AIPs: A company submits financial records to an Archive as one SIP. The Archive chooses to store this information as two AIPs: one that contains public information and the other that contains sensitive information. This makes it easier for the Archive to manage access to the information.

– Many SIPs to Many AIPs: An oil and gas company collects information on its wells. Every year it submits SIPs, each containing all of the well status information for one well, to an Archive. The Archive maintains one AIP for each oil or gas field and breaks out the information on each well to the proper AIP based upon its geographic coordinates.
 
The Ingest process transforms the SIPs received in the Data Submission Session into a set of AIPs and Package Descriptors which can be stored and accepted by the Archival Storage and Data Management functional entities. The complexity of this Ingest process can vary greatly from OAIS to OAIS or from Producer to Producer within an OAIS. The simplest form of the process involves removing the Content Information, PDI and Package Descriptors from the Producer transfer media and queuing them for storage by the Archival Storage and Data Management functional entities. In more complex cases, the PDI and Package Descriptors may have to be extracted from the Content Information or input by OAIS personnel during the Ingest function; the encoding of the information objects or their allocation to files may have to be changed. In the most extreme case, the granularity of the Content Information may be changed, and the OAIS must generate new PDI and Package Descriptors reflecting the newly generated information objects. When many SIPs are required for the creation of one AIP, the Ingest functional area will provide temporary storage for the SIPs until all the SIPs required for the AIP arrive.
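
The temporary-storage behaviour for the many-SIPs-to-one-AIP case can be sketched as follows. This is only an illustration under stated assumptions: the `SIP` and `IngestStaging` classes, the `aip_key` and `part` fields, and the idea that the expected parts are known in advance (for example from the Submission Agreement) are hypothetical and are not defined by the OAIS model.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SIP:
    """Hypothetical minimal SIP: which AIP it feeds and what it carries."""
    aip_key: str     # the AIP this SIP contributes to (assumed known in advance)
    part: str        # label for this piece, e.g. "week-01" or "pdi"
    content: bytes


@dataclass
class IngestStaging:
    """Temporary Ingest storage that holds SIPs until an AIP is complete."""
    required_parts: dict[str, set[str]]                     # aip_key -> expected parts
    held: dict[str, dict[str, SIP]] = field(default_factory=dict)

    def receive(self, sip: SIP) -> Optional[dict[str, bytes]]:
        """Stage one SIP; return the assembled parts once all have arrived."""
        parts = self.held.setdefault(sip.aip_key, {})
        parts[sip.part] = sip
        if set(parts) != self.required_parts[sip.aip_key]:
            return None                       # still waiting for more SIPs
        del self.held[sip.aip_key]            # release the temporary storage
        return {name: s.content for name, s in sorted(parts.items())}


staging = IngestStaging(required_parts={"sensor-year": {"week-01", "week-02", "pdi"}})
first = staging.receive(SIP("sensor-year", "week-01", b"obs1"))
second = staging.receive(SIP("sensor-year", "pdi", b"provenance"))
aip_parts = staging.receive(SIP("sensor-year", "week-02", b"obs2"))
```

In this sketch the first two calls return `None` because the set of SIPs is incomplete; only the third call returns the assembled parts, at which point nothing remains in staging.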
 
In addition, the Ingest Functional Entity will classify incoming information objects, determine in what existing collection or collections each object belongs, and create messages to update the appropriate Collection Descriptions after the AIPs are stored in Archival Storage. The OAIS and external organizations may provide additional Associated Descriptions and finding aids that allow alternative access paths to the information objects of interest. Researchers will develop new and fundamentally different access patterns to information objects. It is important that an OAIS’s Ingest and internal data models are sufficiently flexible to incorporate these new descriptions so the general user community can benefit from the research efforts. A good example of this type of new associated description is a phenomenology database in Earth Observation, which allows users to obtain data for a desired event, such as a hurricane or volcanic eruption, from many instruments with a single query. It is important to note that such finding aids may become obsolete unless the data they require are preserved as parts of the AIPs they access.
 
It is expected that the Ingest Functional Entity will coordinate the updates between Data Management and Archival Storage and provide appropriate coordination and error recovery. The AIP should first be stored in Archival Storage. The confirmation of that operation will include a unique identification to retrieve that AIP from Storage. This identifier should be merged into the Package Description prior to the addition of the Collection Description to Data Management.
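
The ordering described above (store the AIP, receive its identifier, merge the identifier into the Package Description, then update Data Management) might be sketched like this. The `ArchivalStorage` and `DataManagement` classes, the `aip_id` key, the use of UUIDs, and the rollback on failure are all assumptions made for illustration; the OAIS model asks for coordination and error recovery but does not prescribe a mechanism.

```python
import uuid


class ArchivalStorage:
    """Hypothetical stand-in for Archival Storage: keeps AIPs by identifier."""

    def __init__(self):
        self.aips = {}

    def store(self, aip: bytes) -> str:
        aip_id = str(uuid.uuid4())    # the confirmation carries a unique identifier
        self.aips[aip_id] = aip
        return aip_id

    def remove(self, aip_id: str) -> None:
        self.aips.pop(aip_id, None)


class DataManagement:
    """Hypothetical stand-in for Data Management: keeps Package Descriptions."""

    def __init__(self):
        self.descriptions = []

    def add(self, description: dict) -> None:
        if "aip_id" not in description:
            raise ValueError("Package Description lacks an AIP identifier")
        self.descriptions.append(description)


def ingest(aip: bytes, description: dict, storage, data_management) -> str:
    """Store the AIP first, then register its description; undo on failure."""
    aip_id = storage.store(aip)                     # step 1: Archival Storage
    merged = {**description, "aip_id": aip_id}      # step 2: merge the identifier
    try:
        data_management.add(merged)                 # step 3: Data Management
    except Exception:
        storage.remove(aip_id)                      # assumed recovery: no orphaned AIP
        raise
    return aip_id


storage = ArchivalStorage()
data_management = DataManagement()
new_id = ingest(b"content", {"title": "weekly sensor data"}, storage, data_management)
```

Storing first means Data Management never holds a description that points at an AIP which does not exist; the price is that a failed description update has to be compensated, which the sketch does by removing the stored AIP.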
 
 
== 4.3.3 DATA TRANSFORMATIONS IN THE ARCHIVAL STORAGE AND DATA MANAGEMENT FUNCTIONAL AREAS ==
 
 
The Archival Storage Functional Entity takes the AIPs produced by the Ingest process and merges them into the permanent Archive holdings. The Data Management Functional Entity takes the Package Descriptions produced by Ingest and augments the existing Collection Descriptions to include their contents. The logical model of the ingested data should already be mapped into the logical model of the Archive’s holdings. Thus the major transformation that occurs in this step is the mapping of the acquisition session from the Ingest physical data model, which will tend to be on temporary storage, to the permanent storage of the OAIS, which could range from a Database Management System (DBMS) to a Hierarchical File Management System (HFMS), or any mixture of the two.
 
The internal view of the OAIS is the permanent representation of the archived data, so all encoding and mappings must be well documented and understood. The transferring of ingest objects is frequently done by a software process such as an HFMS driver or a DBMS. In this case, it is the responsibility of the OAIS to maintain an active copy of the software or careful documentation of the internal formats so the data can be transferred to other systems in the future without loss of information.
 
== 4.3.4 DATA FLOWS AND TRANSFORMATIONS IN THE ACCESS FUNCTIONAL AREA ==
 
 
When a Consumer wishes to use the data within the OAIS, a Finding Aid may be used to locate information of interest. Finding Aids present Consumers with the logical view of the OAIS holdings so the Consumers can decide which AIPs to acquire. At a minimum, the access view is the high-level logical view of the Collection Descriptions discussed in 4.2.2.8. The OAIS may have to spend significant time and effort developing Associated Descriptions and Finding Aids such as catalogs that will aid the Consumer in locating AIPs or AICs of interest. A Consumer will establish a Search Session with the Access entity. During this Search Session, the Consumer will use the OAIS Finding Aids to identify and investigate potential holdings of interest. This searching process tends to be iterative, first identifying broad criteria and then refining the criteria on the basis of previous search results. When candidate objects of interest are identified, more sophisticated Finding Aids such as browse image viewers or animation may be used to further refine a Result Set.


Once the Consumer identifies the OAIS holdings to acquire, the Consumer uses an OAIS-supplied Ordering Aid to develop an order request to acquire the data. The Consumer produces a logical view of the desired AIPs and associated Package Descriptions to be included in the Dissemination Information Package and specifies the physical details of the Data Dissemination Session such as media type and object format. This process may involve no visible interaction between the Consumer and the OAIS if adequate defaults exist. The order can also specify any transformations the Consumer wishes applied to the AIPs in creating the DIP.


The Access functional area then records the Order Agreement in the Data Management functional area. When the conditions required to satisfy a recorded Order Agreement are met, the Access functional area coordinates the response. For many Order Agreements these conditions are met immediately; if they are not, Administration notifies Access when they are met. Access contacts the Storage and Data Management functional areas and requests the AIPs and associated Package Descriptions necessary to populate the DIP requested by the Consumer. The Storage and Data Management functional areas create copies of the requested objects in temporary storage.


Access then transforms this set of AIPs and associated Package Descriptions into a set of DIPs and places those DIPs on distribution media (either physical or communications) to be delivered to the Consumer in a Data Dissemination Session. The complexity of this transformation process can differ greatly on the basis of the level of processing services offered by the OAIS and requested by the Consumer’s order. In the simplest case, the DIP contains duplicates of the AIPs and associated Package Descriptions of interest from the Storage and Data Management functional areas. In more complex cases, the desired Content Information may have to be extracted from the information objects or inserted into self-describing information objects, and the encoding of the information objects or their allocation to physical files may have to be changed. In the most extreme case, when the OAIS supports subsetting services, the granularity of the information objects may be changed, and the Dissemination process may generate DIPs and associated Package Descriptions reflecting the new granularity. The mapping between DIPs and AIPs is one-to-one if no transformations are requested; however, the use of subsetting services and other product processing options could create many DIPs from a single AIP, or a single DIP based on combining many AIPs.
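
The three AIP-to-DIP mappings named above (one-to-one, one-to-many through subsetting, many-to-one through combining) can be illustrated with a small sketch. The `AIP` and `DIP` classes, the representation of Content Information as a tuple of records, and the fixed-size `chunk` parameter are hypothetical simplifications chosen for the example, not part of the OAIS model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AIP:
    aip_id: str
    records: tuple[str, ...]      # hypothetical: Content Information as records


@dataclass(frozen=True)
class DIP:
    source_aip_ids: tuple[str, ...]
    records: tuple[str, ...]


def copy_dip(aip: AIP) -> DIP:
    """Simplest case: the DIP duplicates one AIP (one-to-one)."""
    return DIP((aip.aip_id,), aip.records)


def subset_dips(aip: AIP, chunk: int) -> list[DIP]:
    """Subsetting: one AIP yields many DIPs at a finer granularity."""
    return [DIP((aip.aip_id,), aip.records[i:i + chunk])
            for i in range(0, len(aip.records), chunk)]


def combine_dip(aips: list[AIP]) -> DIP:
    """Combining: many AIPs yield a single DIP."""
    return DIP(tuple(a.aip_id for a in aips),
               tuple(r for a in aips for r in a.records))


a = AIP("aip-1", ("r1", "r2", "r3"))
b = AIP("aip-2", ("r4",))
one = copy_dip(a)
many = subset_dips(a, 2)
merged = combine_dip([a, b])
```

Each DIP in the sketch records the identifiers of the AIPs it was derived from, which is one possible way to keep the new granularity traceable back to the Archive's holdings.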
{{:OAISfooter}}

Latest revision as of 13:19, 5 October 2015





--Please retain original text above for reference. Propose amendments or additions below this line or respond using the Discussion tab above--



These wiki pages are licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License. Attribute as "Community forum for digital preservation and curation standards http://wiki.dpconline.org/". The content on this wiki represents the opinions of the author and not the Digital Preservation Coalition. This wiki is not associated with ISO, the OAIS Standard or the CCSDS.