The data science and ML ecosystem has converged around a fairly stable set of tools. You don't need to master all of them — but knowing what they are, and why each exists, gives you a clear map of the landscape.

Python — the language of data science

Python has become the dominant language for data science and machine learning. It's not the fastest language, but its readable syntax and vast ecosystem of libraries make it the default choice across academia and industry.

If you learn one thing for AI work, make it Python. A few weeks of basics gives you access to everything else in this module.
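The readability claim is easy to see in a few lines. This toy snippet (the text and names are purely illustrative) counts word frequencies using only the standard library:

```python
# Toy example of Python's readable style: count word frequencies.
# All names here are illustrative, not tied to any dataset.
from collections import Counter

text = "the quick brown fox jumps over the lazy dog the end"
counts = Counter(text.split())
print(counts.most_common(2))  # two most frequent words, highest first
```

Three lines of logic, and each reads almost like the English description of the task.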

SQL — the language of data

SQL (Structured Query Language) is how you query databases — extract, filter, join, and aggregate data. Every data professional uses SQL constantly. No matter how advanced your Python and ML skills become, you'll always need SQL to get data in the first place.

-- SQL: find top customers by spend in the last 90 days
-- (date arithmetic shown is MySQL syntax; other databases differ slightly)
SELECT
  customer_id,
  SUM(order_total) AS total_spend
FROM orders
WHERE order_date >= DATE_SUB(NOW(), INTERVAL 90 DAY)
GROUP BY customer_id
ORDER BY total_spend DESC
LIMIT 10;
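The same query translates almost clause-for-clause into pandas, which is one reason the two skills reinforce each other. A sketch, assuming the orders table has been loaded as a DataFrame with the same column names (the sample data here is made up):

```python
import pandas as pd

# Hypothetical orders data; columns mirror the SQL example
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 2],
    "order_total": [50.0, 30.0, 120.0, 10.0, 5.0],
    "order_date": pd.to_datetime(
        ["2024-01-05", "2024-02-01", "2024-01-20", "2023-01-01", "2024-02-10"]
    ),
})

cutoff = pd.Timestamp("2024-02-15") - pd.Timedelta(days=90)
top = (
    orders[orders["order_date"] >= cutoff]         # WHERE
    .groupby("customer_id")["order_total"].sum()   # GROUP BY + SUM
    .sort_values(ascending=False)                  # ORDER BY ... DESC
    .head(10)                                      # LIMIT 10
)
print(top)
```

In practice you would often run the SQL on the database and only pull the aggregated result into pandas — pushing the heavy lifting to the database is usually faster.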

Key Python libraries

pandas
The Swiss Army knife of data manipulation. Load CSVs, clean data, filter rows, join tables, aggregate values — all with readable Python syntax. The first thing any data scientist imports.
NumPy
Fast numerical computing. The foundation of scientific Python — arrays, maths operations, linear algebra. Most other libraries build on NumPy under the hood.
Matplotlib / Seaborn
Data visualisation. Matplotlib is the low-level foundation; Seaborn builds on it with prettier statistical charts. For interactive dashboards, Plotly is increasingly popular.
scikit-learn
The standard ML library for classical algorithms. Decision trees, random forests, SVMs, regression, clustering, preprocessing — all with a consistent, beginner-friendly API.
PyTorch / TensorFlow
The two dominant deep learning frameworks. PyTorch leads in research; TensorFlow (and its Keras interface) is widely used in production. Both are used for training neural networks.
Jupyter Notebooks
An interactive environment where you write code in cells, run them, and see results immediately — with visualisations, text, and code all in one document. The standard tool for data exploration and analysis.
# A typical data science workflow in Python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

df = pd.read_csv("data.csv")                                     # 1. load
df = df.dropna()                                                 # 2. clean
df["age_group"] = pd.cut(df["age"], bins=[0, 18, 35, 60, 100])   # 3. engineer features
df["age_group"].value_counts().sort_index().plot(kind="bar")     # 4. visualise
plt.show()

# Random forests need numeric inputs, so drop the categorical helper column
X = df.drop(columns=["target", "age_group"])
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = RandomForestClassifier().fit(X_train, y_train)           # 5. model
print(model.score(X_test, y_test))                               # 6. evaluate
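NumPy never appears explicitly in the workflow above, yet it is working under the hood of every step — pandas columns and scikit-learn feature matrices are NumPy arrays. A minimal direct example of what it provides (the prices here are made-up sample values):

```python
import numpy as np

# Vectorised arithmetic: operations apply element-wise, no explicit loops
prices = np.array([9.99, 4.50, 12.00])
with_tax = prices * 1.2           # broadcast a scalar across the whole array
print(with_tax.round(2))
print(prices.mean(), prices.sum())  # built-in aggregations
```

Vectorised operations like these run in optimised C rather than in a Python loop, which is why the "slow language" can still power fast numerical code.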

Cloud platforms

For large-scale work, cloud platforms provide managed services for storage, compute, and ML: Google Cloud (BigQuery, Vertex AI), AWS (S3, SageMaker), Azure (ML Studio). You don't need to start here — local Python is fine for learning — but cloud skills are increasingly expected in industry roles.

Key takeaways

  • Python is the dominant language for data science and ML — learn this first
  • SQL is essential for every data professional — always needed to access data
  • Core Python stack: pandas (data), NumPy (maths), Matplotlib (visualisation), scikit-learn (ML)
  • For deep learning: PyTorch (research) or TensorFlow/Keras (production)
  • Jupyter Notebooks are the standard environment for exploration and analysis