Supervised learning is the most widely used form of machine learning. The idea is simple: you teach the model using examples where you already know the right answer — labelled data. The model learns to map inputs to outputs, then applies that learning to new, unseen examples.

The core idea

Supervised learning is like teaching with an answer key. You show the model thousands of examples with correct answers. It learns the patterns. Then it applies those patterns to examples it's never seen before.

Classification vs Regression

Supervised learning problems fall into two categories: classification, where the model predicts a discrete category (spam or not spam, malignant or benign), and regression, where it predicts a continuous number (a price, a temperature).

Real-world examples

Classification: Gmail spam detection, face recognition, medical diagnosis, sentiment analysis (positive/negative review)

Regression: House price prediction, weather forecasting, stock price modelling, demand forecasting

How supervised learning works — step by step

  1. Collect labelled data — gather examples where you know the correct answer
  2. Split into train/test sets — typically 80% training, 20% testing
  3. Choose a model — decision tree, neural network, logistic regression, etc.
  4. Train — the model iteratively adjusts to minimise its errors on training data
  5. Evaluate — test performance on the held-out test set
  6. Deploy — use the model on new, real-world data

In scikit-learn, these six steps come down to a few lines:

# Classification example: predict if a tumour is malignant
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)  # fixed seed so the split is reproducible

model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, predictions):.2%}")
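
The same workflow works for regression; here is a minimal sketch using scikit-learn's bundled diabetes dataset (dataset and metric chosen here purely for illustration, not taken from the text above):

```python
# Regression example: predict disease progression from physiological measurements
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

X, y = load_diabetes(return_X_y=True)  # 442 patients, 10 features, continuous target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)

predictions = model.predict(X_test)  # continuous values, not categories
print(f"Mean absolute error: {mean_absolute_error(y_test, predictions):.1f}")
```

Note the only real changes from the classification example are the model (LinearRegression) and the metric: accuracy makes no sense for continuous outputs, so we measure how far predictions are from the true values.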

The importance of good labels

The quality of a supervised learning model is limited by the quality of its labels. Inconsistent, biased, or incorrect labels produce poor models — no matter how sophisticated the algorithm. This is why data labelling (and the humans who do it) is so critical and so expensive.
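
To see this concretely, here is a quick sketch (my own illustration, not from the text): train the same logistic regression twice on the breast-cancer data, once with clean labels and once with 40% of the training labels flipped at random to simulate careless annotation.

```python
# Illustration: noisy labels degrade an otherwise identical model
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Flip 40% of the training labels at random -- the test labels stay correct
rng = np.random.default_rng(0)
noisy = y_train.copy()
flip = rng.random(len(noisy)) < 0.4
noisy[flip] = 1 - noisy[flip]

accs = {}
for name, labels in [("clean", y_train), ("noisy", noisy)]:
    model = LogisticRegression(max_iter=10000).fit(X_train, labels)
    accs[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name} labels -> test accuracy {accs[name]:.2%}")
```

Same algorithm, same features, same test set; the only difference is label quality, and the noisy model scores measurably worse.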

Unsupervised learning — when you don't have labels

Not all data comes with correct answers. Unsupervised learning finds patterns without labels — grouping similar customers together (clustering), detecting unusual transactions (anomaly detection), or compressing data into fewer dimensions. It's harder to evaluate but powerful when labelled data is scarce.
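
The clustering case can be sketched in a few lines; this uses synthetic data and parameters chosen purely for illustration:

```python
# Unsupervised example: group points into clusters with no labels at all
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# 300 synthetic points drawn around 3 centres -- the true groups are never shown to the model
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)  # each point gets a cluster id: 0, 1, or 2

print(f"Discovered cluster sizes: {sorted(np.bincount(labels))}")
```

K-means recovers the groups purely from the geometry of the data. The catch the text mentions: with no answer key, there is no accuracy score, so evaluating whether the clusters are "right" is a judgment call.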

Key takeaways

  • Supervised learning uses labelled data — examples with known correct answers
  • Classification predicts categories; regression predicts continuous numbers
  • Training: show examples → model adjusts → evaluate on unseen test data
  • Label quality directly determines model quality — garbage labels = garbage model
  • Unsupervised learning finds patterns without labels — clustering, anomaly detection