Data Vault

From wiki.dpconline.org
Revision as of 17:53, 5 January 2016

Summary

DataVault aims “to define and develop a Data Vault software system that will allow data creators to describe and store their data safely in one of the growing number of options for archival storage”.

From the project website: [1]


The project’s 'Problem Statement' reads: “As part of typical suites of Research Data Management services, researchers are provided with large allocations of ‘active data store’. This is often stored on expensive and fast disks to enable efficient transfer and working with large amounts of data. However, over time this active data store fills up, and researchers need a facility to move older but valuable data to cheaper storage for long term care. In addition, research funders are increasingly requiring data to be stored in forms that allow it to be described and retrieved in the future. The Data Vault concept will fulfil these requirements for the rest of the data that isn’t publicly shared via an open data repository.”

From the Project Plan (https://docs.google.com/document/d/1k2XHlNBGR7sM6XBfyICIeGguoJc5uP3JJwJhrtgRvhI)


"The project will allow researchers to safely archive their research data to predefined storage locations that include cloud and local storage (e.g. Arkivum, tape backup or AWS Glacier). It is designed to bridge the gap between the variety of storage options and the end user, while capturing metadata to allow the search and re-use of the data. The system has two components. The first is a Data Vault broker, which transfers the data from local storage to archive and includes policy, integrity and security. The second is the Data Vault user interface, which passes messages to the broker to start archival or retrieval tasks. Data is passed via a REST API.

Key project outputs:

  1. DataVault software available on GitHub as open source
  2. DataVault demonstrators
  3. Phase 1: Working system - single user to vault data
  4. Phase 2: Additional features, including users, administration dashboard, extra filestore connectors (SFTP, Amazon Glacier, Dropbox), and user and group management
  5. Storage, workflows, metadata and system requirements assessed and documented"

From Spotlight Data: Jisc RDS Software Projects (https://docs.google.com/document/d/1hE3IXKbR78bIO4fHeGh4ugC8fxO-23koov3wnLHpqBw)
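
To make the interface-to-broker split above more concrete, here is a minimal Python sketch of the kind of message a user interface might send to a broker to start an archival or retrieval task. It is an illustration only: the class name VaultTask, its field names, the action values "deposit" and "retrieve", and the JSON layout are all assumptions made for this example and are not taken from the DataVault software or its REST API.

```python
import json
from dataclasses import asdict, dataclass

# Hypothetical action names; the real DataVault API may use different terms.
ALLOWED_ACTIONS = ("deposit", "retrieve")


@dataclass(frozen=True)
class VaultTask:
    """A hypothetical task message passed from the user interface to the broker."""

    action: str         # "deposit" (archive) or "retrieve"
    source_path: str    # location of the data on the active data store
    archive_store: str  # label for the target archive, e.g. "tape-backup"
    description: str    # descriptive metadata captured to support search and re-use

    def __post_init__(self) -> None:
        # Reject unknown task types before anything is sent to the broker.
        if self.action not in ALLOWED_ACTIONS:
            raise ValueError(f"unsupported action: {self.action!r}")

    def to_json(self) -> str:
        """Serialise the task as a JSON body suitable for an HTTP request."""
        return json.dumps(asdict(self), sort_keys=True)


if __name__ == "__main__":
    task = VaultTask(
        action="deposit",
        source_path="/data/project-x",
        archive_store="tape-backup",
        description="Survey results, 2015",
    )
    print(task.to_json())
```

Keeping the message a small, validated data structure means the user interface needs to know nothing about how the broker talks to any particular storage back end, which is the separation the quoted description suggests.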


“The “DataVault” project at the Universities of Edinburgh and Manchester is primarily addressing the Archival Storage entity of the OAIS model. … The DataVault whilst primarily being a storage facility will also carry out other digital preservation functionality. Data will be packaged using the BagIt specification, an initial stab at file identification will be carried out using Apache Tika and fixity checks will be run periodically to monitor the file store and ensure files remain unchanged. The project team have highlighted the fact that file identification is problematic in the sphere of research data as you work with so many data types across disciplines. This is certainly a concern that the “Filling the Digital Preservation Gap” project has shared.”

From a synthesis of the projects in the context of the OAIS model (http://digital-archiving.blogspot.co.uk/2015/12/the-research-data-spring-projects.html), by Jen Mitcham of the Filling the Digital Preservation Gap project
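
The periodic fixity checks mentioned in the quotation above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not DataVault's own code: the function names, the choice of SHA-256, and the in-memory dictionary used as a manifest are assumptions for this example. The BagIt specification records checksums in manifest files in a similar spirit, but this sketch does not read or write the BagIt format.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root, keyed by relative path."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_fixity(root: Path, manifest: dict[str, str]) -> list[str]:
    """Compare stored checksums with the files on disk and list any problems."""
    problems = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file():
            problems.append(f"missing: {rel_path}")
        elif sha256_of(target) != expected:
            problems.append(f"changed: {rel_path}")
    return problems
```

In use, build_manifest would be run once when the data is deposited, and check_fixity would be run on a schedule against the stored manifest. An empty list means every recorded file is still present and unchanged.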