Jpylyzer business case template

From wiki.dpconline.org

Using Jpylyzer in a production environment: benefits, risks and a sample business case

Use this to help develop a convincing business case for the use of Jpylyzer in your organisation

About this case study

This case study presents generic business benefits and risks for applying Jpylyzer in a production environment, followed by an example of how these benefits might be tailored for, and presented in, a specific business case. This resource should be utilised in the context of the guidance materials provided elsewhere in the Digital Preservation Business Case Toolkit.

How to use this case study

The generic benefits and risks in this case study can be reused by organisations who would like to present a business case for the use of Jpylyzer, but they must be tailored to that organisation's particular needs, aims and contextual situation. The Jpylyzer case study shows how the generic business benefits and risks were adapted to meet the specific needs of an organisation (in this case a theoretical one!). Applying benefits and risks requires careful analysis, adaptation, use of language and prioritisation as described elsewhere within this toolkit. Elements from this case study are replicated in relevant parts of the Toolkit (for example the Benefits from this case study are replicated on the DPBCT benefits template page).

Jpylyzer benefits

This section describes the business benefits of using Jpylyzer and aligns with the DPBCT section on Benefits.

Jpylyzer benefit summary

This section provides a summary of generic business benefits for using Jpylyzer.

Direct benefits:

  • Mitigates key JP2 preservation risks
  • Enables automated JP2 validation that increases the quality of resulting digitised collections
  • Ensures JP2s conform to intended institutional profile

Indirect benefits:

  • Opens up application of JP2 for storing digitised masters (by mitigating preservation risks as described above)
    • Usage of JP2 lossy compression can reduce storage costs significantly
      • In turn this can enable more digitisation, a costly activity
    • Optimised JP2 images can enhance remote access and the delivery of images, providing a better experience for the user
  • Provides a method of validating aspects of the quality of digitisation performed by third parties. This is particularly important where organisations have outsourced digitisation but want an economical method of verifying quality themselves (note: quality assurance in this case applies only to JP2 construction, not the visual image quality of the digitisation).
  • Efficient and automated quality assurance processes will catch bad files early and allow action to be taken in a timely manner. Rectifying problem files later can become more costly.
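
As an illustration of how such automated checking might act as a gate in a workflow, the following minimal Python sketch sorts files into accepted and rejected lists based on Jpylyzer's XML reports. It is a sketch under stated assumptions, not a definitive implementation. It assumes each report is already available as XML text (for example, captured from the command-line tool's output), and that the overall verdict appears in an element named isValid or isValidJP2 containing True or False. Element names differ between Jpylyzer versions, so they should be checked against the version in use. The function names is_valid_report and triage are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the overall verdict is held in an element with one of these
# local names. Verify against the output of the Jpylyzer version in use.
VERDICT_NAMES = {"isValid", "isValidJP2"}


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def is_valid_report(xml_text):
    """Return True only if the report contains at least one verdict
    element and every verdict element reads 'True'."""
    root = ET.fromstring(xml_text)
    verdicts = [
        el.text.strip()
        for el in root.iter()
        if local_name(el.tag) in VERDICT_NAMES and el.text
    ]
    return bool(verdicts) and all(v == "True" for v in verdicts)


def triage(reports):
    """Split a {filename: report XML} mapping into (accepted, rejected)."""
    accepted, rejected = [], []
    for name, xml_text in sorted(reports.items()):
        (accepted if is_valid_report(xml_text) else rejected).append(name)
    return accepted, rejected
```

A report with no verdict element is treated as a rejection, so that a failed or incomplete run cannot silently pass a file into storage.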

Benefits by dimensions of scalability

This section describes generic business benefits for using Jpylyzer in the context of the four SCAPE Project dimensions of scalability.

  • Number of objects
    • Quality checking huge numbers of objects manually requires considerable resources. Even manual checks of a small percentage of objects can be costly. Manual checking has been shown to be ineffective at identifying some examples of badly formed JP2s. Jpylyzer can be applied to automatically check every object passing through a digitisation or ingest workflow. It can identify badly formed JP2s as well as ensuring conformance to an organisation's JP2 profile.
  • Size of objects
    • Processing large objects, and indeed large numbers of large objects, increases the potential and impact of everyday IT issues on the resulting files. Network dropouts, disk errors or capacity issues, and software bugs can all lead to the creation of damaged files. As noted above, automated checking with Jpylyzer has advantages in cost, accuracy and coverage of quality checking. It is far more effective at detecting broken or truncated files, resulting from the issues mentioned, than manual checking.
  • Complexity of objects
    • The JPEG2000 standard offers a complex range of options for construction of a JP2 file. A range of compression types and levels are possible, and images can be optimised for remote delivery in a number of different ways. Conformance checking against an institutional profile, using Jpylyzer, ensures that each file uses the compression and access optimisation options the organisation has chosen.

  • Heterogeneity of collections
    • JP2 was around for many years before its more recent adoption by memory institutions for storing digitised masters, and before a number of significant preservation issues were identified and fixed. Jpylyzer is vital for identifying JP2s that exhibit these risks in deposited collections (i.e. data acquired by or deposited with libraries, archives and other organisations).

Jpylyzer implementation risks

This section provides a list of implementation risks for Jpylyzer and aligns with the DPBCT section on Implementation Risks.

  • Jpylyzer is comprehensive but does not parse/validate every aspect of a JP2
  • Jpylyzer has been developed primarily by a single developer, but is now supported by OPF and has seen production use
    • Jpylyzer is used in a production environment by a number of organisations and is embedded in the Goobi digitisation workflow tool
    • Jpylyzer is actively supported at time of writing. See Ohloh automated code analysis
  • Application of Jpylyzer does not mitigate all identified JP2-related preservation risks. Choosing JP2 for storing digitised masters is a trade-off between cost, risk and quality
  • Over-reliance on an automated quality assurance approach could be risky. Jpylyzer should be used in conjunction with other checking mechanisms (such as sampled manual checks)
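
One way the sampled manual checks mentioned above might be organised is sketched below in Python: a fixed fraction of each batch is drawn at random for human inspection, using a recorded seed so that the selection can be reproduced later. The function name pick_manual_sample, the 2% fraction and the seed value in the usage note are illustrative assumptions, not recommendations.

```python
import random


def pick_manual_sample(filenames, fraction, seed):
    """Choose a reproducible random sample of files for manual checking.

    filenames: any iterable of file names in the batch
    fraction:  share of the batch to inspect, in the range (0, 1]
    seed:      value recorded with the batch so the draw can be repeated
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be greater than 0 and at most 1")
    ordered = sorted(filenames)  # fixed order, so the seed fully determines the draw
    count = max(1, round(len(ordered) * fraction)) if ordered else 0
    rng = random.Random(seed)
    return sorted(rng.sample(ordered, count))
```

For example, pick_manual_sample(batch, 0.02, seed=42) would select 20 files from a batch of 1,000, and would select the same 20 if run again with the same batch and seed.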

Jpylyzer costs and suitability

  • Jpylyzer is easy to use and simple to apply, and has already been adopted in production environments
  • Moving to JP2 requires considerable expertise with the JP2 format, and familiarity with JP2 tools
  • Delivering content in JP2 format requires server-side delivery capabilities

Jpylyzer case study

This case study applies the benefits and risks above to a specific (theoretical) organisational situation. It shows how the generic benefits in particular should be tailored to the needs of an organisation and the likely concerns and interests of stakeholders. It comprises key sections from the DPBCT Template for building a business case.

Executive Summary

Mass digitisation projects generate millions of master images that must be stored in multiple locations to ensure their longevity. At this scale, storage costs (even in the short term) are considerable. JPEG2000 technology offers the potential to significantly reduce the size of digitised masters for a negligible loss of quality. Using the JP2 file format to store digitised masters therefore provides an attractive alternative to the conventional choice of TIFF. There are however considerable digital preservation concerns about JP2, which could put longevity of digitised collections at risk. This business activity will put in place a quality assurance process that will validate JP2 masters, mitigate preservation risks associated with their usage, and as a result enable considerable storage cost savings. By reducing initial storage costs by around 80%, resources will be freed up for the digitisation of one million additional pages.

Explanation

The summary explains the context and current situation before describing the change to business processes that will be implemented. It focuses on the key elements for a business case of this kind: benefits, costs and risks. The text uses some technical language (mention of JP2 and TIFF) and this may be deemed too technical for the audience, in which case it may be better to refer to the technologies in general terms without specifically naming them.

Digital preservation risks

A number of digital preservation risks associated with the JP2 format were identified by early adopters of this technology. Without the application of suitable mitigation there is considerable concern for the longevity of material stored in this format. Software applications that generate JP2s have shown some degree of unreliability. Processing large volumes of data can push generating software (and other workflow processes) to the edge. The results can be invalid or badly formed JP2 files, or possibly even truncated files. The JP2 format presents a vast array of options for the design of a particular JP2 file. JP2s can be optimised for delivery and access in a variety of ways, and the type and level of lossy compression is crucial in ensuring appropriate file size and quality levels. These policy choices are defined in the organisational JP2 profile. If JP2s are generated that do not meet this profile, management of and access to the files may become problematic.

This image, courtesy of the British Library, provides an example of an arbitrarily truncated JP2 created by a faulty workflow process at the British Library. This example was one of many used to test developments in Jpylyzer.

Explanation

A strong business case results from the use of references to real examples of relevant preservation risks. The use of images can be a powerful communication tool.

Business Activity

The existing digitisation workflow for mass digitisation projects at this organisation migrates camera raw images from the digitisation studio to TIFF files for storage as master images in the long term digital repository. The proposed business activity will instead migrate camera raw images to JP2, and then validate the construction of these files using Jpylyzer.

Explanation

Moving from TIFF to JP2 for the storage of digitised masters will not suit every collection or organisation. Benefits can include reduced storage cost (if lossy compression is utilised) and improved user experience (if images are optimised and delivered to the browser by an appropriate system). However, these benefits introduce preservation risks (some of which are not mitigated by Jpylyzer). Lossy compression lowers the quality of the stored images. The trade-off between cost, risk and quality will be different in different situations. Making an appropriate choice for the collection and organisation in question is essential, and should be guided by collection needs, institutional policy, a careful evaluation of the alternatives, practical testing of software with real data, and a clear appreciation of the risks involved.

Benefits

Implementation of a quality assurance process for digitised masters, based on the application of Jpylyzer, will facilitate a move from TIFF to JP2 for the storage of digitised masters from mass digitisation (1 million+ pages) projects. It will generate the following benefits:

  • A reduction in storage costs of 80%, enabling a further 1 million pages to be digitised
    • Supporting key organisational objective 3: "Significantly increase access to the collections by implementing digitisation of at least 5 million pages of our collection"
  • Mitigation of key long term digital preservation risks
    • Supporting key organisational objective 2: "Preserve and provide access to our collections for future generations of researchers"
  • Improved quality of digitisation results and improved efficiency of their generation via the use of cutting edge quality assurance technology
    • Supporting key organisational objective 5: "Deliver organisational change, through the use of new techniques and technology enhancing our reputation"

Explanation

This example benefits section distils a range of possible benefits down to three really important issues for this organisation, as shown by close alignment with the organisational objectives: reduce costs/do more, ensure preservation, and improve quality/efficiency. The benefits in this case are pitched in the context of the wider operation of moving from the generation of TIFF to JP2 masters, rather than just the specifics of the preservation issues. Benefits tied primarily to long term preservation will often also be linked to quality and cost. Improved quality tends to result from better managed and validated processes that follow a sensible policy. Depending on the stakeholder(s) in question, this might be viewed as more significant and immediate than preservation benefits, and hence be considered more favourably.

A reduction in costs could simply save money from the storage and preservation budget, or it could expressly be used to enable more digitisation (more realistic in practice if digitisation, preservation and initial storage are all initially covered by the same capital investment). This could be crucial in making a strong case to the most important stakeholders. In this example, freeing up resources to enable extra digitisation aligns well with a crucial organisational objective.
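
The arithmetic behind a claim of this kind can be laid out explicitly so that stakeholders can test it with their own figures. The Python sketch below is illustrative only: every number in the usage note (page count, file sizes, number of copies, cost per terabyte and digitisation cost per page) is a hypothetical value chosen so that the sums come out round, and the function names are assumptions. Real figures must come from the organisation's own measurements and budgets.

```python
def storage_cost(pages, mb_per_page, copies, cost_per_tb):
    """Cost of storing every page at the given size in all copies.

    Uses decimal units: 1 TB = 1,000,000 MB.
    """
    terabytes = pages * mb_per_page * copies / 1_000_000
    return terabytes * cost_per_tb


def extra_pages_funded(saving, digitisation_cost_per_page):
    """Whole number of additional pages the saving would pay for."""
    return int(saving // digitisation_cost_per_page)
```

With hypothetical inputs of 4,000,000 pages, 2 stored copies, 25 MB per TIFF master, 5 MB per JP2 master and a storage cost of 3,125 per terabyte, storage_cost gives 625,000 for TIFF and 125,000 for JP2, a saving of 500,000 (80%). At a hypothetical all-in digitisation cost of 0.50 per page, extra_pages_funded(500_000, 0.5) returns 1,000,000 additional pages.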

Implementation Risks

Jpylyzer is in production use with a number of organisations including the British Library and the Wellcome Library, and has been incorporated in Goobi, the digitisation workflow management tool. It has been developed to tackle concrete preservation issues identified by organisations that include the early adopters listed above, and was refined over time to ensure files from these example cases were validated effectively. A freely available set of test files supports testing and benchmarking of Jpylyzer. Jpylyzer should not, however, be considered foolproof, and it should be used in conjunction with other quality checking mechanisms.