Raw data is potential. Insight is what happens when you transform that potential into something you can act on. This module covers the journey from a messy dataset to a clear understanding — the core workflow of data analysis and a prerequisite for building effective AI systems.

Step 1 — Explore (EDA)

Exploratory Data Analysis (EDA) is the first thing any good data analyst or ML engineer does with a new dataset. Before modelling anything, you need to understand what you're working with.

# Python: first look at a dataset using pandas
import pandas as pd

df = pd.read_csv("sales_data.csv")
print(df.head())      # first 5 rows (in a notebook, the last expression displays without print)
df.info()             # column names, dtypes, non-null counts per column (prints directly)
print(df.describe())  # count, mean, std, min, quartiles, max for each numeric column

EDA reveals problems you'd otherwise miss: columns with missing values, dates stored as text, categorical variables with unexpected values, or numerical columns with suspicious outliers.
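To make those four problems concrete, here is a minimal pandas sketch that checks for each one. The small DataFrame, its column names, and the 1.5 * IQR outlier rule are illustrative assumptions, not part of this module. On real data you would run the same checks on the df loaded above.

```python
import numpy as np
import pandas as pd

# Toy data invented for illustration; it deliberately contains a missing value,
# dates stored as text, an unexpected category variant, and a suspicious outlier
df = pd.DataFrame({
    "order_date": ["2024-01-03", "2024-01-04", "2024-01-05", "2024-01-06", "2024-01-07"],
    "country": ["UK", "UK", "uk ", "FR", "FR"],
    "amount": [20.0, 22.5, np.nan, 21.0, 950.0],
})

# 1. Missing values per column
missing = df.isna().sum()

# 2. Columns held as strings ("order_date" should really be a datetime)
text_columns = [c for c in df.columns if pd.api.types.is_string_dtype(df[c])]

# 3. Category counts expose variants such as "uk " next to "UK"
country_counts = df["country"].value_counts()

# 4. Flag outliers with the 1.5 * IQR rule (one common heuristic among several)
q1, q3 = df["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["amount"] < q1 - 1.5 * iqr) | (df["amount"] > q3 + 1.5 * iqr)]

print(missing, text_columns, country_counts, outliers, sep="\n\n")
```

Each check is a single line of pandas. The value comes from running all of them routinely on every new dataset.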

Step 2 — Clean

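Real-world data is almost never clean. Common problems include missing values, duplicate rows, and inconsistently formatted text. pandas provides a method for each: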

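# Most pandas cleaning methods return a new object, so assign the result back
# Standardise text first so "uk " and "UK" count as the same value
df["country"] = df["country"].str.strip().str.upper()
df = df.drop_duplicates()  # remove exact duplicate rows

# Missing values: choose a strategy per column
df["age"] = df["age"].fillna(df["age"].mean())  # fill missing ages with the column mean
df = df.dropna()                                # drop rows still missing any value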

Data cleaning is often estimated to take 60–80% of the time in a real data project. It's unglamorous but essential: an ML model trained on data that hasn't been cleaned will produce unreliable, often misleading results.
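The same steps can be wrapped in a reusable function and checked on data where the right answer is known. This is a sketch under stated assumptions: clean_sales, the column names, and the toy rows are hypothetical. It also converts dates stored as text, one of the EDA findings mentioned earlier, using pd.to_datetime.

```python
import numpy as np
import pandas as pd

def clean_sales(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning step; the column names are assumptions."""
    df = raw.copy()  # work on a copy so the raw data stays untouched
    # Standardise text first so " UK " and "uk" become the same value
    df["country"] = df["country"].str.strip().str.upper()
    # Convert text dates; anything unparseable becomes NaT (missing)
    df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")
    df = df.dropna(subset=["order_date"])  # drop rows without a valid date
    df = df.drop_duplicates()              # duplicates only match after standardising
    # Fill remaining missing ages with the mean of the de-duplicated rows
    df["age"] = df["age"].fillna(df["age"].mean())
    return df.reset_index(drop=True)

# Toy input: one duplicate (after standardising), one bad date, missing ages
raw = pd.DataFrame({
    "order_date": ["2024-01-03", "2024-01-03", "not a date", "2024-01-05"],
    "country": ["uk", " UK ", "FR", "fr"],
    "age": [34, 34, np.nan, np.nan],
})
cleaned = clean_sales(raw)
print(cleaned)
```

Order matters here. Standardising text before de-duplicating lets near-duplicates collapse. Filling ages after de-duplicating keeps repeated rows from skewing the mean.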

Step 3 — Visualise

Humans are visual creatures. A chart can reveal patterns that tables of numbers completely hide. Visualisation is both an exploration tool (finding patterns) and a communication tool (explaining findings to others).

Common chart types and when to use them:

Chart type   | Best for
Histogram    | Distribution of a single numerical variable
Scatter plot | Relationship between two numerical variables
Bar chart    | Comparing categories
Line chart   | Trends over time
Heatmap      | Correlations between many variables at once
Box plot     | Distribution and outliers of a numerical variable
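The following minimal sketch draws each chart type from the table with matplotlib, using randomly generated toy data. The module does not name a plotting library, so matplotlib is assumed here. The variable names and the output file name chart_types.png are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend: saves to a file, no display needed
import matplotlib.pyplot as plt
import numpy as np

# Toy data generated for illustration only
rng = np.random.default_rng(seed=0)
price = rng.normal(loc=50, scale=10, size=200)            # one numerical variable
units = 100 - price + rng.normal(scale=5, size=200)       # a second, related variable
monthly_sales = rng.integers(80, 120, size=12)            # a value over time
regions = {"North": 120, "South": 95, "East": 140}        # totals per category
corr = np.corrcoef([price, units, rng.normal(size=200)])  # 3x3 correlation matrix

fig, axes = plt.subplots(2, 3, figsize=(14, 8))

axes[0, 0].hist(price, bins=20)
axes[0, 0].set_title("Histogram: distribution of price")

axes[0, 1].scatter(price, units, s=10)
axes[0, 1].set_title("Scatter plot: price vs units")

axes[0, 2].bar(list(regions), list(regions.values()))
axes[0, 2].set_title("Bar chart: sales by region")

axes[1, 0].plot(range(1, 13), monthly_sales)
axes[1, 0].set_title("Line chart: sales by month")

im = axes[1, 1].imshow(corr, vmin=-1, vmax=1, cmap="coolwarm")
axes[1, 1].set_title("Heatmap: correlations")
fig.colorbar(im, ax=axes[1, 1])  # colour scale from -1 to +1

axes[1, 2].boxplot(price)
axes[1, 2].set_title("Box plot: spread and outliers of price")

fig.tight_layout()
fig.savefig("chart_types.png")
```

For a real audience you would usually show one chart at a time with labelled axes. The grid here only puts the types side by side for comparison.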

Step 4 — Analyse and model

With clean data and a visual understanding of its shape, you're ready to extract insights. This might mean running a statistical test to confirm a relationship, building a predictive model, or segmenting customers into groups.

The key question at this stage: What decision will this insight drive? Analysis without a clear question or decision in mind often leads nowhere useful.
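As one example of running a statistical test to confirm a relationship, here is a two-sided permutation test for a difference in means, written with NumPy only. It is one suitable test among several (a t-test is a common alternative). The function name and the checkout times below are invented for illustration.

```python
import numpy as np

def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means between groups a and b."""
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_permutations):
        # Shuffle group labels: under "no difference", any split is equally likely
        shuffled = rng.permutation(pooled)
        diff = shuffled[:len(a)].mean() - shuffled[len(a):].mean()
        if abs(diff) >= abs(observed):
            count += 1
    # +1 in numerator and denominator so the estimated p-value is never exactly 0
    p_value = (count + 1) / (n_permutations + 1)
    return observed, p_value

# Toy checkout times in seconds, before and after a change (invented data)
before = [62, 58, 65, 60, 61, 59, 63, 64]
after = [85, 90, 82, 88, 86, 91, 84, 87]
diff, p = permutation_test(after, before)
print(f"Mean increase: {diff:.1f}s, p = {p:.4f}")
```

A small p-value says the difference is unlikely to be chance alone. It does not by itself say what caused it.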

Example workflow

An e-commerce company notices conversion rates are dropping. EDA reveals the drop is concentrated on mobile devices. Cleaning removes bot traffic. Visualisation shows the drop started exactly when a new checkout page was launched. Analysis confirms that mobile checkout time increased by 40% after the launch. Insight: the new checkout page is broken on mobile. Decision: roll it back or fix it.
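The core numbers in that story come from a filter and a group-by. This sketch assumes a hypothetical session-level table with device, period, is_bot and converted columns. The rows are invented to mirror the scenario and are not real results.

```python
import pandas as pd

# Hypothetical sessions: (device, period, is_bot, converted)
rows = [
    ("mobile", "before", False, 1), ("mobile", "before", False, 0),
    ("mobile", "before", False, 1), ("mobile", "before", False, 0),
    ("mobile", "after", False, 1), ("mobile", "after", False, 0),
    ("mobile", "after", False, 0), ("mobile", "after", False, 0),
    ("mobile", "after", True, 0), ("mobile", "after", True, 0),
    ("desktop", "before", False, 1), ("desktop", "before", False, 0),
    ("desktop", "after", False, 1), ("desktop", "after", False, 0),
]
sessions = pd.DataFrame(rows, columns=["device", "period", "is_bot", "converted"])

# Cleaning: bot sessions never convert and would distort the denominator
human = sessions[~sessions["is_bot"]]

# Conversion rate per device, before vs after the new checkout page
rates = human.groupby(["device", "period"])["converted"].mean().unstack("period")
rates = rates[["before", "after"]]
rates["change"] = rates["after"] - rates["before"]
print(rates)
```

In this toy data, desktop is flat and mobile falls from 50% to 25%. Leaving the bots in would push the mobile "after" rate down further, to about 17%.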

Step 5 — Communicate

Insight that isn't communicated clearly has no value. The best data analysts are also good storytellers — they can translate complex findings into simple narratives that non-technical stakeholders can understand and act on.

Good data communication means: leading with the conclusion, not the methodology; using the right chart for the audience; and being honest about uncertainty and limitations.
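One way to be honest about uncertainty is to report an interval alongside the headline number. The sketch below uses a percentile bootstrap, which is one method among several, on invented conversion data, and phrases the result conclusion-first. The function name bootstrap_ci is hypothetical.

```python
import numpy as np

def bootstrap_ci(values, n_resamples=5_000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample with replacement and record the mean of each resample
    idx = rng.integers(0, len(values), size=(n_resamples, len(values)))
    means = values[idx].mean(axis=1)
    alpha = (1 - level) / 2
    low, high = np.quantile(means, [alpha, 1 - alpha])
    return values.mean(), low, high

# Toy data: 1 = converted, 0 = did not (invented for illustration)
conversions = np.array([1] * 30 + [0] * 170)
rate, low, high = bootstrap_ci(conversions)

# Lead with the conclusion, then state the uncertainty and sample size
print(f"Conversion rate is about {rate:.1%} "
      f"(95% interval roughly {low:.1%} to {high:.1%}, n={len(conversions)}).")
```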

For AI specifically

In machine learning, "data to insight" happens before model building (understanding your data), during training (monitoring learning curves and metrics), and after deployment (evaluating real-world performance). It's not a one-time step — it's an ongoing discipline.
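Monitoring learning curves during training can start as simply as comparing training and validation loss per epoch. The helper below is hypothetical. Its name, the patience threshold, and the loss values are illustrative assumptions, and many training frameworks provide their own callbacks for this.

```python
def summarise_learning_curve(train_loss, val_loss, patience=3):
    """Report the best epoch (0-based) and whether validation loss has stalled."""
    best_epoch = min(range(len(val_loss)), key=lambda i: val_loss[i])
    epochs_since_best = len(val_loss) - 1 - best_epoch
    stalled = epochs_since_best >= patience
    # Training loss still falling while validation loss does not improve
    # is a classic sign of overfitting
    train_improving = train_loss[-1] < train_loss[best_epoch]
    return {
        "best_epoch": best_epoch,
        "stalled": stalled,
        "possible_overfitting": stalled and train_improving,
    }

# Invented loss values for eight epochs
train = [1.00, 0.70, 0.50, 0.40, 0.32, 0.27, 0.23, 0.20]
val = [1.05, 0.80, 0.62, 0.55, 0.56, 0.58, 0.61, 0.64]
report = summarise_learning_curve(train, val)
print(report)
```

Here validation loss bottoms out at epoch 3 while training loss keeps falling. That pattern suggests keeping the epoch-3 model rather than the last one.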

Key takeaways

  • EDA — explore your data before building anything to understand its shape and problems
  • Data cleaning is often estimated to take 60–80% of project time; it's essential, not optional
  • Visualisation reveals patterns that numbers alone hide
  • Analysis should always be tied to a clear decision or question
  • Communicating findings clearly is as important as finding them