
RDM - Collect & Organize: Data Processing



After collecting raw data, researchers carry out several key processes to ensure that the data is accurate, reliable, and appropriate for analysis. This guide covers six of them: data cleansing, transformation, integration, analysis, visualization, and interpretation.

The 6 Key Processes

1. Data Cleansing

Data cleansing, also known as data cleaning or data scrubbing, is the process of identifying and correcting errors, inconsistencies, and inaccuracies in a dataset to improve its quality and reliability.

The process involves removing duplicate records, filling in missing values, correcting corrupted and erroneous data entries, and standardizing data formats. 
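The steps listed above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a recommended pipeline: the records, the field names (`id`, `country`, `age`), and the choice to fill missing ages with the median are all assumptions made for the example.

```python
from statistics import median

# Hypothetical raw survey records (illustrative data only).
raw_records = [
    {"id": "001", "country": " hong kong ", "age": "34"},
    {"id": "002", "country": "Hong Kong", "age": ""},        # missing value
    {"id": "001", "country": " hong kong ", "age": "34"},    # duplicate record
    {"id": "003", "country": "JAPAN", "age": "41"},
    {"id": "004", "country": "Japan", "age": "abc"},         # erroneous entry
]


def parse_age(value):
    """Return the age as an int, or None if it is missing or not a number."""
    value = value.strip()
    return int(value) if value.isdigit() else None


def clean(records):
    seen_ids = set()
    cleaned = []
    for rec in records:
        # Remove duplicate records (here: same id, an assumed unique key).
        if rec["id"] in seen_ids:
            continue
        seen_ids.add(rec["id"])
        cleaned.append({
            "id": rec["id"],
            # Standardize the format of the country name.
            "country": rec["country"].strip().title(),
            # Treat unparseable entries as missing.
            "age": parse_age(rec["age"]),
        })

    # Fill missing values with the median of the known ages
    # (one possible imputation strategy among many).
    known_ages = [r["age"] for r in cleaned if r["age"] is not None]
    fill_value = median(known_ages)
    for r in cleaned:
        if r["age"] is None:
            r["age"] = fill_value
    return cleaned


cleaned_records = clean(raw_records)
for record in cleaned_records:
    print(record)
```

Whether to impute, flag, or drop a missing or erroneous value depends on the research design, so the median fill here should be read as a placeholder for that decision.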

A video created by NetCom Learning (1:50)

Topics include:

  • Introduction to Data Cleansing
  • Phases of Data Cleansing/Cleaning

2. Data Transformation

This process converts data into a format or structure suitable for analysis. It may involve normalization, deduplication, aggregation, or discretization, as well as creating new variables or converting data types.

Transformation prepares data in a target format that can be fed into operational systems, a data lake, a data warehouse, or other repositories for use in business intelligence and analytics applications.
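As a rough illustration of several of these transformations, the sketch below applies type conversion, min-max normalization, discretization, a derived variable, and aggregation to a tiny dataset. The data, the field names, and the height band cut-offs are hypothetical and chosen only for the example.

```python
from collections import defaultdict

# Hypothetical measurements read from a text file, so every value is a string.
rows = [
    {"group": "A", "height_cm": "150", "weight_kg": "50"},
    {"group": "A", "height_cm": "170", "weight_kg": "68"},
    {"group": "B", "height_cm": "190", "weight_kg": "90"},
]

# Data type conversion: strings to floats.
typed = [
    {
        "group": r["group"],
        "height_cm": float(r["height_cm"]),
        "weight_kg": float(r["weight_kg"]),
    }
    for r in rows
]


def height_band(height_cm):
    """Discretization: map a continuous height to an (assumed) category."""
    if height_cm < 160:
        return "short"
    if height_cm < 180:
        return "medium"
    return "tall"


heights = [r["height_cm"] for r in typed]
lowest, highest = min(heights), max(heights)

for r in typed:
    # Min-max normalization: rescale height to the range 0..1.
    r["height_scaled"] = (r["height_cm"] - lowest) / (highest - lowest)
    # New variable derived from existing ones: body mass index.
    r["bmi"] = r["weight_kg"] / (r["height_cm"] / 100) ** 2
    r["height_band"] = height_band(r["height_cm"])

# Aggregation: mean weight per group.
weights_by_group = defaultdict(list)
for r in typed:
    weights_by_group[r["group"]].append(r["weight_kg"])
mean_weight = {g: sum(w) / len(w) for g, w in weights_by_group.items()}

print(typed)
print(mean_weight)
```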

A video created by the University of Liverpool (3:41)

Topics include:

  • Different types of data transformation
  • Examples of some tools to get you started

3. Data Integration

Data integration is the process of combining and harmonizing data from multiple sources into a unified and coherent format. Data from various systems and databases is transformed into a consistent structure and made accessible for analysis and decision making. Common data integration approaches include ELT (Extract, Load, and Transform), real-time data integration, application integration (API), and data virtualization.
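To make the ELT pattern more concrete, here is a minimal sketch in which an in-memory SQLite database stands in for the target repository. The two sources, the table and view names, and the columns are all hypothetical. Raw data is loaded unchanged first, and the harmonizing transformation (matching participant IDs that differ in letter case) happens afterwards, inside the target.

```python
import sqlite3

# Extract: two hypothetical sources describing the same participants.
# Note that the sources disagree on the letter case of the participant ID.
survey_source = [("P01", "2024-01-15", 7), ("P02", "2024-01-16", 4)]
lab_source = [("p01", 36.6), ("p02", 37.1), ("p03", 36.9)]

conn = sqlite3.connect(":memory:")

# Load: store the raw data as-is in the target database.
conn.execute(
    "CREATE TABLE raw_survey (participant TEXT, survey_date TEXT, score INTEGER)"
)
conn.execute("CREATE TABLE raw_lab (participant TEXT, temperature_c REAL)")
conn.executemany("INSERT INTO raw_survey VALUES (?, ?, ?)", survey_source)
conn.executemany("INSERT INTO raw_lab VALUES (?, ?)", lab_source)

# Transform: build a unified view inside the target, harmonizing the IDs.
conn.execute(
    """
    CREATE VIEW participants_unified AS
    SELECT UPPER(l.participant) AS participant,
           s.survey_date,
           s.score,
           l.temperature_c
    FROM raw_lab AS l
    LEFT JOIN raw_survey AS s
           ON UPPER(s.participant) = UPPER(l.participant)
    """
)

unified = conn.execute(
    "SELECT * FROM participants_unified ORDER BY participant"
).fetchall()
for row in unified:
    print(row)
```

The `LEFT JOIN` keeps participant P03, who has a lab reading but no survey response, with empty survey fields. Whether unmatched records should be kept or dropped is a design decision for each project.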

A video created by Qlik (2:04)

Topics include:

  • Definition of data integration
  • Modern approaches of data integration

4. Data Analysis

Data analysis is the process of systematically applying statistical and logical techniques to evaluate and interpret data, uncovering patterns, trends, and relationships. It involves cleaning, transforming, and modeling data to extract meaningful insights, predict trends, and support decision-making from both structured and unstructured data. Data analysis can be qualitative or quantitative, depending on the nature of the data and research objectives.
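For a small quantitative illustration, the sketch below summarizes a dataset, measures the relationship between two variables with the Pearson correlation coefficient, and fits a least-squares line that could be used to predict a trend. The data is invented for the example, and the variable names are assumptions.

```python
from math import sqrt
from statistics import mean, median, stdev

# Hypothetical observations: hours studied and exam score for five students.
study_hours = [2, 4, 6, 8, 10]
exam_scores = [55, 60, 70, 75, 90]

# Descriptive statistics summarize each variable.
summary = {
    "mean_score": mean(exam_scores),
    "median_score": median(exam_scores),
    "stdev_score": stdev(exam_scores),
}


def pearson_and_line(xs, ys):
    """Return (r, slope, intercept) for paired observations xs and ys."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    r = sxy / sqrt(sxx * syy)          # strength of the linear relationship
    slope = sxy / sxx                  # least-squares regression slope
    intercept = y_bar - slope * x_bar  # least-squares regression intercept
    return r, slope, intercept


r, slope, intercept = pearson_and_line(study_hours, exam_scores)

print(summary)
print(f"r = {r:.2f}")  # about 0.98 for this invented data
print(f"predicted score = {intercept} + {slope} * hours")
```

A high correlation in a sample this small says little on its own. It is shown here only to illustrate the mechanics of uncovering a relationship.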

A video created by DataWrangler (2:46)

Topics include:

  • Basic understanding of data analytics
  • How a data-driven approach helps businesses make the right decisions, with examples

5. Data Visualization

Data visualization is the process of presenting data in a visually appealing and understandable manner, which helps convey complex information and insights to stakeholders effectively. Commonly used graphics include charts, plots, infographics, heat maps, and even animations.

There are two primary classes of visualization:

Information Visualization:

  • Exploratory: to explore and understand patterns and trends within the data
  • Explanatory: to communicate with the audience

Scientific Visualization:

  • Involves the visualization of data with an inherent spatial component, such as scalar, vector, and tensor fields. Common application areas include computational fluid dynamics, medical imaging and analysis, and weather data analysis.

A video created by the University of British Columbia (2:40)

Topics include:

  • Concepts of data visualization
  • Examples of how data visualization is done
  • Examples of some online tools to get started

6. Data Interpretation

Data interpretation involves analyzing collected data to draw meaningful conclusions. This process includes identifying patterns, trends, and relationships within the data. Researchers use statistical methods, visualizations, and contextual knowledge to understand the data's implications.

Effective data interpretation helps validate hypotheses, inform decision-making, and generate insights that contribute to research objectives.  

A video created by the University of Melbourne (8:24)

Topics include:

  • Suggestions on how to summarize, manipulate, and format data
  • Examples of how to analyze quantitative and qualitative data 
  • Additional tips to help you engage effectively with data 

 

  For enquiries, please contact the Library's Research Data Management Services of the Research Support and Scholarly Communication Section at lbrdms@cityu.edu.hk