
RDM - Preserve & Store: Storage and Backup



By implementing effective backup and storage strategies, researchers can ensure the integrity, security, and accessibility of their data during the research project.

Data Repositories

A data repository is a centralized storage location where data from various sources is collected, managed, and maintained. It serves as a platform for storing and accessing data, ensuring consistency and ease of use.

Benefits of data repositories are summarized as follows:

Benefit | Explanation
Centralization | Consolidates data from multiple sources into a single location, simplifying access and management.
Data Integrity and Quality | Ensures data consistency and accuracy through standardized processes.
Enhanced Data Security | Provides robust security measures to protect sensitive information.
Data Version Control | Tracks changes and maintains historical versions of data.
Data Discovery and Reusability | Facilitates easy discovery and reuse of data for various purposes.
Regulatory Compliance | Helps meet legal and regulatory requirements for data management.
Reduction in Data Redundancy | Minimizes duplicate data, reducing storage costs and improving efficiency.
Scalability | Supports growth by accommodating increasing volumes of data.

Renowned Data Repositories

Some renowned data repositories across various disciplines are listed below:

Platform | General Information | Cost
General and Multi-disciplinary
Zenodo | Supported by CERN, Zenodo is a general-purpose open-access repository that allows researchers to share and preserve research outputs of any size, in any format, and from any field. | Free to use for both data deposit and access.
Figshare | A repository where users can make all of their research outputs available in a citable, shareable, and discoverable manner. | Free for individual researchers with limits on storage; institutional plans are available for a fee.
Dryad | An open-source research data curation and publication platform that makes data discoverable, freely reusable, and citable. | Free to access data; data submission may incur a fee, often covered by institutions or journals.
Life Sciences
NCBI GenBank | A comprehensive public database of nucleotide sequences and supporting bibliographic and biological annotation. | Free to access and submit data.
EMBL-EBI (European Bioinformatics Institute) | Offers a wide range of data resources for molecular biology, including genomics, proteomics, and metabolomics. | Free to access and submit data.
Protein Data Bank (PDB) | A repository for the 3D structural data of large biological molecules, such as proteins and nucleic acids. | Free to access and submit data.
Earth and Environmental Sciences
PANGAEA | Provides access to one of the most significant archives of atmospheric, coastal, oceanic, and geophysical data. | Free to access data; data submission is generally free but may require collaboration with a data curator.
NOAA National Centers for Environmental Information (NCEI) | Archives and provides access to environmental data, including atmospheric, coastal, oceanic, and geophysical records. | Free to access data; some specialized data services may incur a fee.

Backup Strategies

Strategy | Details
3-2-1 Rule
  • Keep at least 3 copies of your data
  • Store the copies on at least 2 different types of storage media
  • Keep 1 copy off-site
Regular Backups | A regular backup schedule ensures that data is consistently protected. For example, schedule daily, weekly, or monthly backups depending on how often the data changes.
Multiple Copies | Maintaining multiple copies of data in different locations (e.g., on-site and off-site) helps protect against data loss due to hardware failure, natural disasters, or other unforeseen events. For example, save data to both a local server and a cloud storage service, and run checksum validation regularly to verify data integrity before and after each update.
Automated Backups | Automating the backup process reduces the risk of human error and ensures that backups are performed consistently and reliably. Regularly testing the backup systems and processes ensures that data can be successfully restored when needed.
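
The checksum validation mentioned under Multiple Copies can be sketched in a few lines of Python. This is a minimal illustration rather than a recommended tool: the function names `sha256_of` and `back_up_file` are hypothetical, SHA-256 is an assumed choice of checksum, and the destination folders stand in for whatever local and off-site locations you actually use.

```python
from __future__ import annotations

import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up_file(source: Path, destinations: list[Path]) -> dict[Path, bool]:
    """Copy `source` into each destination folder and verify every copy.

    Returns a mapping from each copied file to True if its checksum
    matches the checksum of the original, and False otherwise.
    """
    expected = sha256_of(source)  # checksum before copying
    results: dict[Path, bool] = {}
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        copy_path = Path(shutil.copy2(source, folder))
        # checksum after copying; a mismatch signals a corrupted copy
        results[copy_path] = sha256_of(copy_path) == expected
    return results
```

Calling `back_up_file(Path("survey.csv"), [Path("local_backup"), Path("offsite_backup")])` would copy the file to both folders and report whether each copy is intact. Re-running `sha256_of` on stored copies at intervals, and comparing against the recorded digest, is one way to carry out the regular integrity checks described above.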

File Format Considerations

When backing up research data, it's important to use formats that ensure long-term accessibility and usability. Here are some recommended formats based on the type of data.

Data Type | Recommended Format | Other Acceptable Format
Textual Data
Recommended:
  • Plain text (.txt)
  • XML (.xml)
  • PDF/A (.pdf)
Other acceptable:
  • Markdown (.md)
  • HTML (.html)
  • Microsoft Word (.doc, .docx)
Tabular Data (including spreadsheets)
Recommended:
  • Comma-separated values (.csv)
Other acceptable:
  • OpenOffice (.ods)
  • Microsoft Excel (.xls, .xlsx)
Images
Recommended:
  • TIFF (.tiff) - uncompressed
Other acceptable:
  • PNG (.png)
  • JPEG (.jpg)
  • GIF (.gif)
  • PDF/A (.pdf)
  • Standard applicable RAW image format (.raw)
  • Photoshop files (.psd)
Audio
Recommended:
  • FLAC (.flac)
Other acceptable:
  • MP3 (.mp3)
  • RealAudio (.ra)
  • Windows Media Audio (.wma)
  • WAV (.wav)
Video
Recommended:
  • MPEG-4 (.mp4)
  • OGG video (.ogv, .ogg)
  • Motion JPEG 2000 (.mj2)
Other acceptable:
  • QuickTime (.mov)
  • AVI (.avi)
  • RealVideo (.rv)
  • Windows Media Video (WMV) (.wmv)
Databases
Recommended:
  • XML (.xml)
  • CSV (.csv)
Other acceptable:
  • MySQL database backup
  • Proprietary database formats, e.g., SPSS Portable format (.por) and MS Access (.mdb/.accdb)
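
To see how a project folder measures up against the formats listed above, a short script can group files by extension. The sketch below is only an illustration under stated assumptions: the names `classify` and `summarize` are hypothetical, the two extension sets are copied from the table, and `.pdf` is treated as merely acceptable because a file extension cannot confirm that a PDF is actually PDF/A.

```python
from __future__ import annotations

from pathlib import Path

# Extensions copied from the format table; grouping them into two flat
# sets (ignoring data type) is a simplification made for this sketch.
RECOMMENDED = {
    ".txt", ".xml", ".csv", ".tiff", ".flac", ".mp4", ".ogv", ".ogg", ".mj2",
}
ACCEPTABLE = {
    ".pdf",  # only PDF/A is recommended; the extension alone cannot tell
    ".md", ".html", ".doc", ".docx", ".ods", ".xls", ".xlsx",
    ".png", ".jpg", ".gif", ".raw", ".psd",
    ".mp3", ".ra", ".wma", ".wav",
    ".mov", ".avi", ".rv", ".wmv",
    ".por", ".mdb", ".accdb",
}


def classify(path: Path) -> str:
    """Label a file as 'recommended', 'acceptable', or 'unlisted'."""
    extension = path.suffix.lower()
    if extension in RECOMMENDED:
        return "recommended"
    if extension in ACCEPTABLE:
        return "acceptable"
    return "unlisted"


def summarize(folder: Path) -> dict[str, list[Path]]:
    """Group every file under `folder` (recursively) by its label."""
    groups: dict[str, list[Path]] = {
        "recommended": [], "acceptable": [], "unlisted": [],
    }
    for path in sorted(folder.rglob("*")):
        if path.is_file():
            groups[classify(path)].append(path)
    return groups
```

Files that land in the "acceptable" or "unlisted" groups are candidates for conversion to a recommended format before long-term storage, for example exporting a spreadsheet to `.csv`.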

Storage Media Options

Each storage medium has its own strengths and weaknesses, which make it suitable for different use cases. Consider your specific needs, such as capacity, speed, durability, and cost, when choosing a storage solution.

 
Storage Options | Advantages | Risks
Desktop/PCs/Laptops
Advantages:
  • Convenient
  • Cost-effective
  • Supports various file types and formats
  • Fast data access and processing
Risks:
  • Risk of data loss due to physical damage or loss
  • Limited storage capacity
  • No automatic backup
External Storage (e.g. USB Flash Drives)
Advantages:
  • Portable and convenient
  • Fast access speeds
  • No need for internet access
Risks:
  • Risk of data loss due to physical damage or loss
  • Limited storage capacity
  • No automatic backup
Cloud Storage
Advantages:
  • Accessible from anywhere
  • Scalable
  • Managed by service providers (less maintenance)
Risks:
  • Ongoing subscription costs
  • Dependent on internet access
  • Potential security and privacy concerns
University's Network Drive (SharePoint)
Advantages:
  • Centralized storage and easy collaboration
  • Robust security and access controls
  • Automatic backups and version control
Risks:
  • Dependent on internet access
  • Limited by university's storage policies

  For enquiries, please contact the Library's Research Data Management Services of the Research Support and Scholarly Communication Section at lbrdms@cityu.edu.hk