Cheat sheets

One-page summaries of every blog post and project.

Machine Learning
Part 1 · What is Machine Learning?
The foundations. The definitions, design decisions, and pitfalls that determine whether a model generalises or memorises.
Updated June 2026
Machine Learning
Part 2 · Data Cleaning & Preprocessing
Decisions, code patterns, and traps. Designed for fast exam-style revision and quick lookup during interviews.
Updated June 2026
Machine Learning
Part 3 · Feature Engineering
Picking the features that actually matter. Filter, wrapper, and embedded methods, plus the regularisation maths behind Ridge / Lasso / Elastic Net.
Updated June 2026
Machine Learning
Part 4 · Classification Metrics
Confusion matrix, precision/recall trade-off, F1, MCC, Cohen's Kappa, and what to use when classes are imbalanced.
Updated June 2026
Machine Learning
Part 5 · Cross-Validation, Bias–Variance & ROC
How to evaluate models honestly. K-fold, train/val/test, bias-variance, ROC, regression and clustering metrics.
Updated June 2026
Machine Learning
Part 6 · Naïve Bayes
The probabilistic baseline. Bayes' theorem, the conditional-independence assumption, Gaussian / Multinomial / Bernoulli variants, and when to actually use it.
Updated June 2026
Machine Learning
Part 7 · Decision Trees
How trees actually split, why Gini vs entropy barely matters, how depth controls overfitting, and the interpretability trade-off.
Updated June 2026
Machine Learning
Part 8 · Random Forest & Boosting
Bagging vs boosting, why Random Forest wins out of the box, and how XGBoost / LightGBM / CatBoost differ under the hood.
Updated June 2026
Machine Learning
Part 9 · Support Vector Machines
The margin maximiser. Linear vs kernel SVMs, the kernel trick, the C / γ trade-off, and when SVM still beats trees.
Updated June 2026
Machine Learning
Part 10 · PCA & Dimensionality Reduction
Principal Component Analysis end-to-end. Why it works, when it breaks, how it differs from LDA, and the practical workflow.
Updated June 2026
Machine Learning
Part 11 · LDA & QDA
Generative classifiers. Linear and Quadratic Discriminant Analysis, the shared / per-class covariance trade-off, and why LDA also moonlights as dimensionality reduction.
Updated June 2026
Machine Learning
Part 12 · KNN & Recommender Systems
The simplest non-parametric model. K-Nearest Neighbours for classification, regression, and as the core of memory-based recommender systems.
Updated June 2026
Machine Learning
Forest Cover Type
A practical walkthrough of feature interpretation and sanity checks on the Kaggle Forest Cover Type Prediction dataset. Read the data before you fit the model.
Updated April 2024
Machine Learning
Spaceship Titanic
When missing values aren't random. Reading the PassengerId and Cabin schemas to impute relationally, then a 10-model bake-off on the Kaggle classification task.
Updated June 2026
NLP
Part 1 · Introduction to NLP
The 5-level ladder, the hard problems, and the field map. Designed for fast revision and quick lookup before the rest of the series.
Updated June 2026
NLP
Part 2 · From Text to Vectors
Every preprocessing decision, every formula, every trap from bag of words to lemmatization — condensed for fast revision.
Updated June 2026
NLP
Part 3 · Tagging & Parsing
POS basics, the HMM math, parsing types, the error-compounding cascade, and the common tag reference — all in one page.
Updated June 2026
NLP
Part 6 · Text Classification (Classical)
Four illustrated pages — foundations, applications, the four classical methodologies, and the practitioner rules.
Updated June 2026
NLP
Part 7 · Text Classification (Deep Learning)
Four illustrated pages — deep representations, language modelling and pretraining, transfer learning and fine-tuning, and the zero-shot / prompt era.
Updated July 2026
MLOps
From Notebook to Endpoint
An end-to-end MLOps walkthrough on Microsoft Azure. From a Kaggle scikit-learn model to a live FastAPI inference endpoint with CI/CD, scale-to-zero, and a budget alert.
Updated June 2026
Forecasting
Forecast the Material, Not the Product
An IE MBD Capstone for a Spanish road-safety manufacturer. Aggregating SKUs into raw materials, splitting by purchase volume, and routing series to Prophet or XGBoost based on shape.
Updated June 2026
Forecasting
Three Ways to Encode an Hour
Forecasting hourly bike-share demand in Washington, D.C. The horse race between numerical, one-hot, and cyclic sin/cos encodings — plus the October 2012 anomaly that broke the dataset.
Updated June 2026
Reinforcement Learning
From Prediction to Decision — RL Cheat Sheet
A gentle introduction to reinforcement learning. How RL differs from supervised learning, the agent-environment loop, value vs policy methods, and key algorithms.
Updated February 2026
Reinforcement Learning
Lunar Lander PPO
Training a PPO agent to land between two flags. Reward shaping for accuracy, stability over speed, and what the actor-critic split actually does.
Updated January 2026
Reinforcement Learning
Training AWS DeepRacer
A 1/18-scale autonomous car, a reward function, and the gap between the simulator and a real track. The SageMaker + RoboMaker stack, PPO vs SAC, and sim-to-real strategies.
Updated October 2023
Graph Analytics
Instagram Influencer Graph
Network analysis on a 70k-node Instagram shoe community. Reducing the graph, ranking influence with centrality + community detection, and simulating diffusion with SIR.
Updated December 2023
Data Cleaning
Mercedes Hackathon
268,000 rows of UK used-car listings, seven messy tables, and the case for domain knowledge in data cleaning. You don't clean what you don't recognise.
Updated June 2026
Networking
TCP/IP Fundamentals
The networking layer cake. OSI vs TCP/IP, the journey of a packet, common protocols, and the questions that show up in system-design interviews.
Updated June 2026
Algorithms
Tower of Hanoi
The rules, the recursive plan, and the hand-pattern that solves a puzzle of any size. Designed for fast revision and quick lookup.
Updated June 2026
Econometrics
Corporate Data Breaches
645 U.S. corporate data breaches, 29,000+ 10-K filings, and what changes in the language of an annual report once a company has been hacked. Bachelor's thesis at Universidad Carlos III (9.75/10, INNCYBER award).
Updated June 2026
