Difference between revisions of "ScapePlatformBusinessActivity"

From wiki.dpconline.org
 
''For more on developing and articulating your business activity see the DPBCT sections on [[Business activity]], [[How do I make the case for what I want to do?]] and [[Why are we writing a business case?]].''

Revision as of 15:49, 2 May 2014

SCAPE Platform Business Activity

Digital collections at this organisation have grown so large that assessing the data for preservation risks has become virtually impossible. Even a simple file format identification process takes several months to complete. The proposed business activity will expand the web archiving department's Hadoop cluster, connect the organisation's digital repository to the cluster and apply the SCAPE Platform so that all repository data can be characterised and assessed on a more frequent and practical basis.

The main activities are:

  • Purchase new hardware and expand existing cluster
  • Implement SCAPE Platform connector to enable movement of content from the digital repository to the cluster
  • Implement SCAPE Platform tools, specifically Nanite, to enable file format identification, metadata extraction and characterisation of collection data
  • Perform frequent re-assessment of data using the latest tools and file format signatures available

The web archiving department has agreed to let the digital preservation team use its existing Hadoop cluster in return for expanding the cluster with some additional hardware.
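
To make the identification activity above more concrete, the following is a minimal sketch of the map/reduce pattern that a cluster job of this kind might follow. It is not Nanite or any SCAPE Platform API: the signature table, the MIME labels and the function names (`identify`, `map_identify`, `reduce_counts`, `format_profile`) are all assumptions made for illustration, and a production tool would rely on a full, regularly updated signature registry rather than a handful of magic bytes.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Illustrative magic-byte signatures only (an assumption for this sketch);
# a real identification tool uses a far larger, regularly updated registry.
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"PK\x03\x04", "application/zip"),
]
UNKNOWN = "application/octet-stream"


def identify(header: bytes) -> str:
    """Return a format label for the leading bytes of a file."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return UNKNOWN


def map_identify(records: Iterable[Tuple[str, bytes]]) -> Iterator[Tuple[str, int]]:
    """Map step: emit (format, 1) for each (name, header) record."""
    for _name, header in records:
        yield identify(header), 1


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Counter:
    """Reduce step: sum the counts emitted for each format."""
    totals: Counter = Counter()
    for label, count in pairs:
        totals[label] += count
    return totals


def format_profile(records: Iterable[Tuple[str, bytes]]) -> Counter:
    """Run the map and reduce steps in-process over a batch of records."""
    return reduce_counts(map_identify(records))
```

On a cluster, the map step would run in parallel across the nodes holding the data and the reduce step would merge the partial counts. Re-running the same job after updating the signature table corresponds to the frequent re-assessment described in the last activity.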

Discussion

The organisation is struggling to assess its digital collections because of the time it takes to process the growing number of files. Without any file format analysis, the potential for unknown preservation risks is significant. The business case therefore suggests leveraging (and expanding) an existing Hadoop cluster for running preservation processes. Web archiving departments have been early adopters of Hadoop technology in libraries and archives, and so this scenario has been used by some SCAPE Project partners.