Data Science

What are Datasets?

Datasets are collections of raw data gathered during the research process, usually numerical. Many organizations, such as government agencies, universities, and research institutions, make the data they have collected freely available on the web for other researchers to use.

Note: Data is the raw information from which statistics are created. Statistics give an interpretation of the data.
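
To illustrate the distinction in the note above, here is a minimal Python sketch. The temperature readings are hypothetical values made up for this example; the point is only that the list is the data, while the mean, median, and range are statistics derived from it.

```python
import statistics

# Hypothetical raw data: five daily temperature readings in degrees Celsius.
readings = [21.5, 23.0, 22.0, 24.5, 23.0]

# Statistics summarize and interpret the raw values.
mean_temp = statistics.mean(readings)
median_temp = statistics.median(readings)
temp_range = max(readings) - min(readings)

print(f"mean={mean_temp:.1f}, median={median_temp:.1f}, range={temp_range:.1f}")
```

The same raw readings could support many other statistics, which is why researchers often want the underlying dataset rather than only a published summary.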

How to Identify Relevant Datasets?

To identify relevant datasets for use in your research, you can:

  • Search for articles in CityUHK LibraryFind using your topic keywords and include the terms dataset OR "data set" in the search. Alternatively, search in CityUHK LibraryFind using your topic keywords and filter the results by Datasets (under Resource Type).
  • Search the website or publications of an organization or government department that collects the type of data that you need.
  • Search a large data archive or repository, such as those listed under Finding Datasets & Data Sources.
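
The first search tip above can be sketched as a small Python helper that builds the query string. The function name `build_dataset_query` is hypothetical, and the use of uppercase `AND` with parentheses is an assumption about the search syntax; check the LibraryFind help pages for the operators it actually supports.

```python
def build_dataset_query(keywords):
    """Combine topic keywords with the terms dataset OR "data set"."""
    # Quote multi-word keywords so they are treated as phrases.
    quoted = [f'"{k}"' if " " in k else k for k in keywords]
    # Assumption: the search tool accepts AND and parentheses for grouping.
    topic = " AND ".join(quoted)
    return f'({topic}) AND (dataset OR "data set")'


# Example topic keywords (chosen only for illustration).
query = build_dataset_query(["air quality", "Hong Kong"])
print(query)
```

The resulting string can be pasted into the search box. Including both spellings matters because records may describe the resource as either "dataset" or "data set".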

Finding Datasets & Data Sources

Government & Public Data

 

General Purpose & Popular Repositories

 

Academic & Research

  • DataCite Commons
    DataCite provides persistent identifiers (DOIs) for research data and other research outputs, making them more discoverable.
  • Dryad
    Dryad is an open data publishing platform and a community committed to the open availability and routine re-use of all research data.
  • Figshare 
    Research outputs including datasets, figures, and code.
  • Harvard Dataverse 
    Research datasets across disciplines.
  • Mendeley Data
    Search millions of datasets from domain-specific and cross-domain repositories.
  • OpenML
    An open platform for sharing datasets, code, and experiments for machine learning research.
  • re3data.org  
    A global registry of research data repositories across disciplines.
  • Zenodo
    A general-purpose open repository operated by CERN that accepts research outputs from any discipline or funder. Hosts datasets, software, and reports.

 

Domain-Specific Sources

Economics & Finance

Health & Biology

Geography & Climate

Social Sciences

Text, Natural Language Processing (NLP) & Machine Learning

 

APIs & Live Data