Research Data Management

Start of Research

It is good practice to prepare a data management plan (DMP) at the beginning of your research project and to maintain it as a living document that reflects changes to your research plan. Doing so helps reduce data management issues during your research.

Organising data into files and directories, and keeping that organisation clear and descriptive, is also an important part of data management.

File Name Conventions

File names should give meaningful context for the files they name, so that people can identify similar files and distinguish them from one another. Here are some tips.

  • Format dates according to ISO 8601: YYYYMMDD or YYYY-MM-DD. The shorter YYMMDD form is also widely used, but it is not part of the current standard and is ambiguous across centuries
  • Keep file names short, since long names are not compatible with all software; stay within 32 characters
  • Avoid special characters in file names, such as: ! @ $ % * ( ) ‘ ; < > , [ ] { } ”
  • When numbering files sequentially, use leading zeros so that files sort properly; e.g. 0001, 0002, … 1001 rather than 1, 2, … 1001
  • Avoid spaces in file names; instead, use underscores (e.g. file_name), no separation (e.g. filename), dashes (e.g. file-name), or camel case (e.g. FileName)
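
The tips above can be sketched as a small helper that builds and checks file names. This is only an illustration: the function names, the choice of underscores as the separator, and the exact set of characters treated as "special" are assumptions, not part of any standard.

```python
import re
from datetime import date

MAX_LENGTH = 32  # maximum length suggested in the tips above

# Assumption: anything other than letters, digits, underscore, dot, dash
# or space counts as a "special character" (spaces are reported separately).
DISALLOWED = re.compile(r"[^A-Za-z0-9_.\- ]")


def check_file_name(name: str) -> list[str]:
    """Return a list of problems found in a file name (empty if none)."""
    problems = []
    if len(name) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    if " " in name:
        problems.append("contains spaces")
    if DISALLOWED.search(name):
        problems.append("contains special characters")
    return problems


def build_file_name(project: str, description: str, when: date,
                    sequence: int, extension: str, width: int = 4) -> str:
    """Compose a name like project_description_YYYYMMDD_0001.ext."""
    parts = [
        project,
        description,
        when.strftime("%Y%m%d"),   # date in YYYYMMDD form
        f"{sequence:0{width}d}",   # leading zeros so files sort properly
    ]
    # Replace any whitespace inside a component with an underscore
    stem = "_".join(re.sub(r"\s+", "_", part.strip()) for part in parts)
    name = f"{stem}.{extension.lstrip('.')}"
    problems = check_file_name(name)
    if problems:
        raise ValueError(f"{name!r}: " + "; ".join(problems))
    return name


print(build_file_name("svy", "intv", date(2024, 3, 15), 7, "csv"))
print(check_file_name("final report (v2)!.docx"))
```

The same `check_file_name` function could be run over an existing folder to flag names that break the convention before the data is archived.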

It is also a good idea to create a "README.TXT" file that explains your naming convention and abbreviations.

File Formats

For long-term storage and management, prefer non-proprietary, uncompressed formats. Some preferred file formats are listed below.

  • Text: XML, PDF/A, HTML, ASCII, UTF-8 (not Word)
  • Tabular Data: CSV (not Excel)
  • Still Images: TIFF, JPEG 2000, PDF, PNG, BMP (not GIF or JPG)
  • Moving Images: MOV, MPEG, AVI, MXF (not Quicktime)
  • Sounds: WAVE, AIFF, MP3, MXF
  • Databases: XML, CSV
  • Statistics: ASCII, DTA, POR, SAS, SAV
  • Containers: TAR, GZIP, ZIP
  • Geospatial: SHP, DBF, GeoTIFF, NetCDF
  • Web Archive: WARC


Reference:
http://libguides.libraries.wsu.edu/rdmlibguide/dataorganization

During Research

Data Storage & Backup

Throughout your research, data storage and backup play an important role in keeping data accessible and preventing data loss. The usual recommendation is to keep 3 copies of the data, refreshed at regular intervals and stored in geographically separate locations. Storage options include:

  • Desktop computers and laptops
  • External hard drives
  • Networked drives (e.g. the institutional network or networked drive)
  • Cloud storage (e.g. Google Drive or Dropbox)
  • Optical storage (e.g. CDs or DVDs)

Sensitive and Confidential Data

Data concerning human subjects should always be handled with caution. Individuals who participate in a research project should be well informed, by means of a consent form that they sign, about the purpose of the study, how the results will be used, and the likely consequences. With regard to data sharing, anonymisation techniques such as disguising a subject's name, identifier, voice or facial features should be employed as far as possible.
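
One simple way to disguise names or identifiers in tabular records is to replace them with random codes. The sketch below is illustrative only: the function name, the "name" field and the "P" prefix are hypothetical. Replacing names does not by itself guarantee anonymity, because other fields may still identify a person, and the returned key links codes back to individuals, so it would need to be stored separately from the shared data.

```python
import secrets


def pseudonymise(records: list[dict], field: str = "name",
                 prefix: str = "P") -> tuple[list[dict], dict[str, str]]:
    """Replace the value of `field` in each record with a random code.

    Returns the cleaned records and the key mapping original values
    to codes. The input records are not modified.
    """
    key: dict[str, str] = {}
    cleaned = []
    for record in records:
        original = record[field]
        if original not in key:
            code = f"{prefix}{secrets.token_hex(4)}"
            # Regenerate in the unlikely event of a duplicate code
            while code in key.values():
                code = f"{prefix}{secrets.token_hex(4)}"
            key[original] = code
        # The same subject always receives the same code
        cleaned.append({**record, field: key[original]})
    return cleaned, key
```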

Data pertaining to patent applications or agreements with commercial partners may likewise require a certain level of confidentiality. Sharing such data can be ethical if informed consent, an embargo period, or access restrictions are applied. Under an embargo, a citable metadata record is publicly accessible, but the data itself remains inaccessible until the embargo has expired.

References:
http://libguides.libraries.wsu.edu/rdmlibguide/research
https://researchdata.ox.ac.uk/home/sharing-your-data/to-share-or-not-to-share/

End of Research

Data Repository

Depositing data in a repository is one way to preserve, publish and share it. Data can be deposited with specialised subject-specific repositories, multi-disciplinary repositories, publisher-hosted repositories, or institutional data archives such as CityU Scholars. Below is a list of repositories you may consider using.

Data Citation

Most repositories issue a DOI or other persistent identifier that can be cited. This gives the data owner credit, allows the data to be re-used, and lets its usage be tracked.

There is no universal referencing style for citing data; styles vary across disciplines. However, the following elements are commonly used in a data citation.

  • Creator(s)/Author(s)
  • Title
  • Publication/Release date – the date of the dataset’s release
  • Publisher – the archive, repository, or data centre
  • Identifier – the “unique public identifier” (e.g. DOI, Persistent Identifier etc.)

A data citation typically follows this pattern: Creator (PublicationYear). Title. Publisher. Identifier
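
The pattern "Creator (PublicationYear). Title. Publisher. Identifier" can be filled in programmatically, as in the rough sketch below. The function name is hypothetical, joining several creators with semicolons is an assumption (disciplines differ), and every value in the usage example is a placeholder rather than a real dataset.

```python
def format_data_citation(creators: list[str], year: int, title: str,
                         publisher: str, identifier: str) -> str:
    """Build a citation as: Creator (PublicationYear). Title. Publisher. Identifier"""
    # Assumption: multiple creators are separated by semicolons
    authors = "; ".join(creators)
    return f"{authors} ({year}). {title}. {publisher}. {identifier}"


# Placeholder values for illustration only
print(format_data_citation(
    ["Chan, T. M.", "Lee, K."],
    2024,
    "Example survey dataset",
    "Example Repository",
    "https://doi.org/10.1234/example",
))
```

Check the target journal's or repository's guidelines before relying on a single format, since the required order and punctuation may differ.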

References:
https://researchdata.ox.ac.uk/home/what-do-i-need-to-do-now/i-am-about-to-publish-some-of-my-findings/
http://libguides.libraries.wsu.edu/rdmlibguide/datasharing
https://datacite.org/cite-your-data.html