The four ways machines learn
Four ML approaches: supervised (labelled), unsupervised (unlabelled), semi-supervised (small labelled + large unlabelled) and reinforcement (agent learns by trial and error). That LLMs rely on semi-supervised learning is a recurring exam fact.
AI models must be taught before they can solve anything. Machine learning is that teaching process. Four approaches, one comparison table, then the detail the exam digs into.
| Approach | Training data | Goal | Examples | Watch-outs |
|---|---|---|---|---|
| Supervised | Labelled data, inputs mapped to known targets | Predict outputs for new, unseen data | Spam detection · fraud flagging · labelled road signs | Labelled data is costly and slow · labelling can introduce bias |
| Unsupervised | Unlabelled data, no predefined targets | Find hidden patterns, structures, relationships | Customer segmentation · anomaly detection · genetics, fault detection, marketing | Cheaper but less accurate · unpredictable · interpretation is subjective |
| Semi-supervised | Small labelled + large unlabelled set | Best of both worlds, cuts manual labelling cost | Speech recognition · LLMs often rely on it · ChatGPT, DALL-E | Label quality and consistency · choosing an algorithm that handles both data types |
| Reinforcement | No labels → an agent interacts with an environment | Maximise reward through trial and error | Game AI · robots in mazes/warehouses · AVs · predictive text · real-time ad bidding | Designing the reward mechanism · exploration vs exploitation trade-off |
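To make the semi-supervised row concrete, here is a minimal sketch of one common approach, self-training with pseudo-labels. The nearest-centroid classifier, the single numeric feature and the "short"/"long" labels are all assumptions chosen for illustration; real systems use far richer models.

```python
from statistics import mean

def centroids(points):
    """Return the mean feature value for each label."""
    by_label = {}
    for x, label in points:
        by_label.setdefault(label, []).append(x)
    return {label: mean(xs) for label, xs in by_label.items()}

def predict(cents, x):
    """Assign the label whose centroid is closest to x."""
    return min(cents, key=lambda label: abs(cents[label] - x))

# Hypothetical data: a small labelled set and a larger unlabelled set.
labelled = [(1.0, "short"), (9.0, "long")]
unlabelled = [1.5, 2.0, 2.5, 3.0, 7.0, 8.0, 8.5]

cents = centroids(labelled)                            # 1. learn from the few labels
pseudo = [(x, predict(cents, x)) for x in unlabelled]  # 2. pseudo-label the rest
cents = centroids(labelled + pseudo)                   # 3. retrain on both sets

prediction = predict(cents, 4.0)
```

The watch-out from the table shows up directly: if the two starting labels were wrong or inconsistent, every pseudo-label built on them would inherit the mistake.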
Inside supervised learning - two sub-types. Classification models predict a categorical response by assigning a label to the input (e.g. "Spam" vs "Not spam", image recognition, medical diagnosis); the output is a discrete category. Regression models predict a continuous numerical outcome (e.g. car price, stock prices, temperature); the output is a value on a continuous scale.
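The difference between the two sub-types can be sketched in a few lines. This is an illustrative toy, not a production method: the data, the 1-nearest-neighbour classifier and the single-feature least-squares line are assumptions picked because they fit in plain Python.

```python
from statistics import mean

def nearest_neighbour_label(train, x):
    """Classification: copy the label of the closest labelled example."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def fit_line(xs, ys):
    """Regression: ordinary least squares for one feature, returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

# Hypothetical labelled emails: (count of suspicious words, label).
emails = [(0, "Not spam"), (1, "Not spam"), (6, "Spam"), (9, "Spam")]
predicted_label = nearest_neighbour_label(emails, 7)   # a category

# Hypothetical labelled cars: age in years -> price in thousands.
ages = [1, 2, 3, 4]
prices = [20.0, 18.0, 16.0, 14.0]
slope, intercept = fit_line(ages, prices)
predicted_price = slope * 5 + intercept                # a continuous number
```

Both models learn from labelled pairs; only the type of target differs. Here `predicted_label` is "Spam" and `predicted_price` is 12.0.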
Inside unsupervised learning - two sub-types. Clustering automatically groups data points sharing similar attributes (e.g. DNA samples, customer segments). Association rule learning identifies relationships between data points (e.g. people who buy X also buy Y).
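Both sub-types can be illustrated without any labels. The sketch below assumes a tiny two-cluster k-means on one feature and a single "confidence" score for an association rule; the spend figures and shopping baskets are made up for the example.

```python
from statistics import mean

def two_means(values, iterations=10):
    """Clustering: 1-D k-means with k=2, seeded with the smallest and largest value."""
    centres = [min(values), max(values)]
    for _ in range(iterations):
        groups = ([], [])
        for v in values:
            nearest = 0 if abs(v - centres[0]) <= abs(v - centres[1]) else 1
            groups[nearest].append(v)
        # Move each centre to the mean of its group (keep it if the group is empty).
        centres = [mean(g) if g else c for g, c in zip(groups, centres)]
    return groups

def confidence(baskets, x, y):
    """Association rule "x -> y": share of baskets containing x that also contain y."""
    with_x = [b for b in baskets if x in b]
    return sum(y in b for b in with_x) / len(with_x)

# Hypothetical monthly spend per customer, with no segment labels supplied.
spend = [12, 15, 14, 95, 102, 98]
low, high = two_means(spend)

# Hypothetical shopping baskets.
baskets = [{"bread", "butter"}, {"bread", "butter", "jam"}, {"bread"}, {"milk"}]
rule_confidence = confidence(baskets, "bread", "butter")
```

The algorithm finds the two spend groups on its own, but deciding what they mean (for example "budget" vs "premium" customers) is left to a human, which is the subjectivity the table warns about. The rule "bread → butter" holds in 2 of the 3 bread baskets.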
Inside reinforcement learning. An agent acts in an environment → rewarded actions get reinforced, while errors trigger penalties proportional to the size of the error. The agent is never told what to do; it learns from trial and error. Real deployment → Amazon's warehouse supply chain optimisation.
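The reward-and-penalty loop can be sketched with the simplest possible environment: two actions with fixed payoffs. This is a deliberately reduced illustration (an epsilon-greedy "bandit"), not how any named deployment works; the reward values, `epsilon`, `learning_rate` and `seed` are assumptions.

```python
import random

def train_agent(rewards, steps=200, epsilon=0.1, learning_rate=0.1, seed=0):
    """Learn a value estimate per action purely from the rewards received."""
    rng = random.Random(seed)
    values = [0.0] * len(rewards)           # the agent starts with no knowledge
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(len(rewards))
        else:
            # Exploit: take the action currently believed to be best.
            action = max(range(len(values)), key=values.__getitem__)
        reward = rewards[action]            # the environment responds
        # Nudge the estimate towards the observed reward or penalty.
        values[action] += learning_rate * (reward - values[action])
    return values

# Hypothetical environment: action 0 is penalised (-1), action 1 is rewarded (+1).
values = train_agent(rewards=[-1.0, 1.0])
best_action = max(range(len(values)), key=values.__getitem__)
```

Nobody tells the agent that action 1 is correct; its estimate for action 0 is pushed below zero by penalties and its estimate for action 1 is pushed above zero by rewards, so `best_action` ends up as 1. The `epsilon` parameter is the exploration vs exploitation trade-off from the table.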
Scenario says "labelled" → supervised. "Find patterns, no labels" → unsupervised. "Rewards and penalties" → reinforcement. "A bit of labelled, lots of unlabelled" → semi-supervised. LLMs → semi-supervised (a recurring exam fact).