Research Guides: Research Data Management: Best Practices

Prepare a Data Management Plan (DMP)

It is good practice to prepare a Data Management Plan (DMP) at the beginning of your research project and to maintain it as a living document that reflects changes to your research plan. A DMP helps reduce data management issues during your research.

Organising data into files and directories, and keeping that organisation clear and descriptive, is equally important in data management.

File Name Conventions

File names should give meaningful context for the files they name, so that people can identify similar files and distinguish them from one another. Here are some tips.

Format dates according to ISO 8601: YYYYMMDD or YYYY-MM-DD
Keep file names short, because long names are not compatible with all software; aim for a maximum of 32 characters
Avoid special characters in file names, such as: ! @ $ % * ( ) ' ; < > , [ ] { } "
When numbering files sequentially, use leading zeros so that the files sort properly; e.g. 0001, 0002 … 1001 rather than 1, 2 … 1001
Avoid spaces in file names; instead, use underscores (e.g. file_name), no separation (e.g. filename), dashes (e.g. file-name), or camel case (e.g. FileName)
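
The tips above can be checked automatically. The following Python sketch is one possible way to do so, not a prescribed tool: the function names, the date_project_number pattern and the set of allowed characters are assumptions made for illustration.

```python
import re
from datetime import date

MAX_LENGTH = 32  # maximum length suggested in the tips above
# Assumed "safe" characters: letters, digits, underscore, dash and dot.
ALLOWED = re.compile(r"^[A-Za-z0-9_.-]+$")


def check_file_name(name: str) -> list[str]:
    """Return a list of problems found in a file name (empty if none)."""
    problems = []
    if len(name) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    if " " in name:
        problems.append("contains spaces")
    # Spaces are reported separately, so ignore them in this check.
    if not ALLOWED.match(name.replace(" ", "")):
        problems.append("contains special characters")
    return problems


def build_file_name(project: str, sequence: int, ext: str,
                    day: date, width: int = 4) -> str:
    """Build a name such as 20240131_survey_0007.csv (hypothetical pattern)."""
    stamp = day.strftime("%Y%m%d")        # ISO 8601 basic date format
    number = str(sequence).zfill(width)   # leading zeros keep the sort order
    name = f"{stamp}_{project}_{number}.{ext}"
    problems = check_file_name(name)
    if problems:
        raise ValueError(f"{name}: " + "; ".join(problems))
    return name
```

For example, build_file_name("survey", 7, "csv", date(2024, 1, 31)) returns 20240131_survey_0007.csv, while check_file_name("my file!.txt") reports both the space and the special character.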

It is also a good idea to include a "README.TXT" file that explains your naming convention and abbreviations.

File Formats

For long-term storage and management, try to select non-proprietary, uncompressed formats. Below are some preferred file formats.

Text: XML, PDF/A, HTML, ASCII, UTF-8 (not Word)
Tabular Data: CSV (not Excel)
Still Images: TIFF, JPEG 2000, PDF, PNG, BMP (not GIF or JPG)
Moving Images: MOV, MPEG, AVI, MXF (not Quicktime)
Sounds: WAVE, AIFF, MP3, MXF
Databases: XML, CSV
Statistics: ASCII, DTA, POR, SAS, SAV
Containers: TAR, GZIP, ZIP
Geospatial: SHP, DBF, GeoTIFF, NetCDF
Web Archive: WARC

Data Storage & Backup

Throughout your research, reliable storage and backup keep your data accessible and protect it against loss. A common recommendation is to keep three copies of your data, refreshed at regular intervals and stored in geographically separate locations. Below are some storage options you may choose.

Desktop computers and laptops
External hard drives
Networked drives (e.g. an institutional network drive)
Cloud storage (e.g. Google Drive or Dropbox)
Optical storage (e.g. CDs or DVDs)
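
As a minimal sketch of the three-copy recommendation, the Python example below copies one file to two backup locations and verifies each copy against a SHA-256 checksum of the original. The function names and folder layout are assumptions for illustration; geographic distribution depends on where the destination folders physically reside (for example, an external drive and a network drive), which the code cannot guarantee.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: list[Path]) -> str:
    """Copy source into each destination folder and verify every copy.

    Returns the checksum of the original so it can be recorded,
    for instance in a README file.
    """
    expected = sha256_of(source)
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        copy = folder / source.name
        shutil.copy2(source, copy)  # copy2 also preserves timestamps
        if sha256_of(copy) != expected:
            raise OSError(f"checksum mismatch for {copy}")
    return expected
```

With two destination folders, the original plus its two verified copies make up the three copies. Running the function on a schedule is left to the operating system's scheduler.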

Sensitive and Confidential Data

Data concerning human subjects should always be handled with caution. Individuals who participate in a research project should be well informed, by means of a consent form that they sign, about the purpose of the study, how the results will be used, and the likely consequences. When sharing data, anonymise identities as far as possible, for example by disguising a subject's name, identifier, voice or facial features.
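
One way to disguise names and identifiers in tabular records is sketched below in Python. It drops the name field and replaces each identifier with a keyed hash (HMAC-SHA-256). Strictly speaking this is pseudonymisation rather than full anonymisation, and it does not address voice or facial features. The field names participant_id and name, the function names and the truncation length are all hypothetical choices for this example.

```python
import hashlib
import hmac


def pseudonym(identifier: str, secret_key: bytes, length: int = 12) -> str:
    """Derive a stable pseudonym from an identifier with a keyed hash."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]


def disguise(records, secret_key, id_field="participant_id",
             drop_fields=("name",)):
    """Return copies of the records without direct identifiers.

    Fields listed in drop_fields are removed, and the identifier is
    replaced by its pseudonym so that rows belonging to the same
    participant can still be linked.
    """
    cleaned = []
    for record in records:
        row = {key: value for key, value in record.items()
               if key not in drop_fields}
        row[id_field] = pseudonym(str(record[id_field]), secret_key)
        cleaned.append(row)
    return cleaned
```

The secret key should be stored separately from the shared data, because anyone who holds it can recompute the pseudonym for a known identifier. Other fields that could identify a person in combination (such as date of birth and postcode) would need separate treatment.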

Data pertaining to patent applications or agreements with commercial partners may likewise require a certain level of confidentiality. Sharing such data can be ethical if informed consent, an embargo period, or access restrictions are applied. In such cases, a citable metadata record is publicly accessible, but the data itself remains inaccessible until the embargo has expired.

Data Repository

Depositing data in a repository is one way to preserve, publish and share data. Data can be deposited with specialised subject-specific data repositories, multi-disciplinary repositories, publisher-hosted repositories, or institutional data archives. Below is a list of repositories you may consider using.

Registries of data repositories
→ re3data.org
→ OpenDOAR
→ Repository Finder
Multi-disciplinary data repositories
→ Mendeley Data
→ Figshare
→ Dryad
Lists of disciplinary repositories
→ Data repositories by Open Access Directory
→ Recommended data repositories by Nature

Data Citation

Most repositories issue a DOI or other persistent identifier that can be cited to give the data owner credit, enable re-use of the data, and allow usage to be tracked.

The referencing style for citing data is not universal and varies across disciplines. The following elements are commonly included in a data citation.

Creator(s)/Author(s)
Title
Publication/Release date – the date of the dataset’s release
Publisher – the archive, repository, or data centre
Identifier – the “unique public identifier” (e.g. a DOI or other persistent identifier)
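
The elements above can be assembled into a citation string. The Python sketch below uses one plausible ordering, not an official style, so check the conventions of your discipline or target journal. The class name, field names and all example values (including the DOI) are placeholders.

```python
from dataclasses import dataclass


@dataclass
class DataCitation:
    creators: list[str]   # Creator(s)/Author(s)
    title: str            # Title of the dataset
    release_date: str     # Publication/Release date, e.g. a year
    publisher: str        # Archive, repository, or data centre
    identifier: str       # DOI or other persistent identifier

    def format(self) -> str:
        """Join the elements in an assumed author-date ordering."""
        authors = "; ".join(self.creators)
        return (f"{authors} ({self.release_date}). {self.title}. "
                f"{self.publisher}. {self.identifier}")


# Placeholder values for illustration only.
citation = DataCitation(
    creators=["Doe, J.", "Roe, R."],
    title="Example Survey Dataset",
    release_date="2024",
    publisher="Example Repository",
    identifier="https://doi.org/10.1234/example",
)
print(citation.format())
```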

See the Citing Data page for more information.