By implementing effective backup and storage strategies, researchers can ensure the integrity, security, and accessibility of their data during the research project.
Data Repositories
A data repository is a centralized storage location where data from various sources is collected, managed, and maintained. It serves as a platform for storing and accessing data, ensuring consistency and ease of use.
Benefits of data repositories are summarized as follows:
Benefit | Explanation |
Centralization | Consolidates data from multiple sources into a single location, simplifying access and management. |
Data Integrity and Quality | Ensures data consistency and accuracy through standardized processes. |
Enhanced Data Security | Provides robust security measures to protect sensitive information. |
Data Version Control | Tracks changes and maintains historical versions of data. |
Data Discovery and Reusability | Facilitates easy discovery and reuse of data for various purposes. |
Regulatory Compliance | Helps meet legal and regulatory requirements for data management. |
Reduction in Data Redundancy | Minimizes duplicate data, reducing storage costs and improving efficiency. |
Scalability | Supports growth by accommodating increasing volumes of data. |
Renowned data repositories
Some renowned data repositories across various disciplines are listed below:
Platform | General Information | Cost |
General and Multi-disciplinary | ||
Zenodo | Supported by CERN, Zenodo is a general-purpose open-access repository that allows researchers to share and preserve research outputs of any size, in any format, and from any field. | Free to use for both data deposit and access. |
Figshare | A repository where users can make all of their research outputs available in a citable, shareable, and discoverable manner. | Free for individual researchers with limits on storage; institutional plans are available for a fee. |
Dryad | An open-source research data curation and publication platform that makes data discoverable, freely reusable, and citable. | Free to access data; data submission may incur a fee, often covered by institutions or journals. |
Life Sciences | ||
NCBI GenBank | A comprehensive public database of nucleotide sequences and supporting bibliographic and biological annotation. | Free to access and submit data. |
EMBL-EBI (European Bioinformatics Institute) | Offers a wide range of data resources for molecular biology, including genomics, proteomics, and metabolomics. | Free to access and submit data. |
Protein Data Bank (PDB) | A repository for the 3D structural data of large biological molecules, such as proteins and nucleic acids. | Free to access and submit data. |
Earth and Environmental Sciences | ||
PANGAEA | Provides access to one of the most significant archives of atmospheric, coastal, oceanic, and geophysical data. | Free to access data; data submission is generally free but may require collaboration with a data curator. |
NOAA National Centers for Environmental Information (NCEI) | Archives and provides access to environmental data, including atmospheric, coastal, oceanic, and geophysical records. | Free to access data; some specialized data services may incur a fee. |
Backup Strategies
Strategy | Details |
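3-2-1 Rule | Keep at least three copies of the data, on two different types of storage media, with one copy stored off-site. |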
Regular Backups | Implementing a regular backup schedule is crucial to ensure that data is consistently protected. For example, you can schedule daily, weekly, or monthly backups for different sets of files, depending on how often the data changes. |
Multiple Copies | Maintaining multiple copies of data in different locations (e.g., on-site and off-site) helps protect against data loss due to hardware failure, natural disasters, or other unforeseen events. For example, save data to both a local server and a cloud storage service, and run checksum validation before and after each update to confirm that the copies are intact. |
Automated Backups | Automating the backup process reduces the risk of human error and ensures that backups are performed consistently and reliably. Regularly testing the backup systems and processes can ensure that data can be successfully restored when needed. |
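The copy-and-verify step described above can be sketched in a few lines of Python. This is a minimal illustration, not a complete backup tool: the function names `sha256_of` and `back_up` are hypothetical, SHA-256 is one reasonable choice of checksum, and the destination folders stand in for whatever local and off-site locations you actually use.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: list[Path]) -> dict[Path, bool]:
    """Copy source into each destination folder and verify every copy.

    Returns a mapping from each copied file to True if its checksum
    matches the original, and False otherwise.
    """
    expected = sha256_of(source)
    results = {}
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        copy = folder / source.name
        shutil.copy2(source, copy)  # copy2 also preserves timestamps
        results[copy] = sha256_of(copy) == expected
    return results
```

A script like this could be run by the operating system's scheduler (for example cron or Windows Task Scheduler) to automate the backup. Any copy reported as `False` should be treated as a failed backup and copied again.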
File Format Considerations
When backing up research data, it's important to use formats that ensure long-term accessibility and usability. Here are some recommended formats based on the type of data.
Data Type | Recommended Format | Other Acceptable Format |
Textual Data | | |
Tabular Data (including spreadsheets) | Comma-separated values (.csv) | |
Images | TIFF (.tiff) - uncompressed | |
Audio | FLAC (.flac) | |
Video | | |
Databases | | |
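As an illustration of the tabular recommendation, the sketch below writes records to a plain UTF-8 .csv file with Python's standard `csv` module and reads them back to check the export. The function names `write_csv` and `read_csv` and the sample column names are hypothetical. Note that CSV stores every value as text, so numeric types need to be documented separately (for example in a README or data dictionary).

```python
import csv
from pathlib import Path


def write_csv(rows: list[dict], path: Path) -> None:
    """Write a list of row dictionaries to a UTF-8 CSV file with a header row."""
    fieldnames = list(rows[0].keys())
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def read_csv(path: Path) -> list[dict]:
    """Read the CSV back so the export can be compared with the original."""
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Reading the file back and comparing it with the original rows is a simple way to confirm that nothing was lost or altered during conversion.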
Storage Media Options
Each storage media option has its own strengths and weaknesses, making them suitable for different use cases. Consider your specific needs, such as capacity, speed, durability, and cost, when choosing the right storage solution.
Storage Options | Advantages | Risks |
Desktop/PCs/Laptops | | |
External Storage (e.g. USB Flash Drives) | | |
Cloud Storage | | |
University's Network Drive (SharePoint) | | |
For enquiries, please contact the Library's Research Data Management Services of the Research Support and Scholarly Communication Section at lbrdms@cityu.edu.hk