Machine Learning
Part 1 · What is Machine Learning?
The foundations. Decisions, definitions, and pitfalls that determine whether a model generalises or memorises.
Updated June 2026

Machine Learning
Part 2 · Data Cleaning & Preprocessing
Decisions, code patterns, and traps. Designed for fast exam-style revision and quick lookup during interviews.
Updated June 2026

Machine Learning
Part 3 · Feature Engineering
Picking the features that actually matter. Filter, wrapper, embedded methods and the regularisation maths behind Ridge / Lasso / Elastic Net.
Updated June 2026

Machine Learning
Part 4 · Classification Metrics
Confusion matrix, precision/recall trade-off, F1, MCC, Cohen's Kappa, and what to use when classes are imbalanced.
Updated June 2026

Machine Learning
Part 5 · Cross-Validation, Bias–Variance & ROC
How to evaluate models honestly. K-fold, train/val/test, bias-variance, ROC, regression and clustering metrics.
Updated June 2026

Machine Learning
Part 6 · Naïve Bayes
The probabilistic baseline. Bayes' theorem, the conditional-independence assumption, Gaussian / Multinomial / Bernoulli variants, and when to actually use it.
Updated June 2026

Machine Learning
Part 7 · Decision Trees
How trees actually split, why Gini vs entropy barely matters, how depth controls overfitting, and the interpretability trade-off.
Updated June 2026

Machine Learning
Part 8 · Random Forest & Boosting
Bagging vs boosting, why Random Forest wins out of the box, and how XGBoost / LightGBM / CatBoost differ under the hood.
Updated June 2026

Machine Learning
Part 9 · Support Vector Machines
The margin maximiser. Linear vs kernel SVMs, the kernel trick, the C / γ trade-off, and when SVM still beats trees.
Updated June 2026

Machine Learning
Part 10 · PCA & Dimensionality Reduction
Principal Component Analysis end-to-end. Why it works, when it breaks, how it differs from LDA, and the practical workflow.
Updated June 2026

Machine Learning
Part 11 · LDA & QDA
Generative classifiers. Linear and Quadratic Discriminant Analysis, the shared / per-class covariance trade-off, and why LDA also moonlights as dimensionality reduction.
Updated June 2026

Machine Learning
Part 12 · KNN & Recommender Systems
The simplest non-parametric model. K-Nearest Neighbours for classification, regression, and as the core of memory-based recommender systems.
Updated June 2026

Machine Learning
Forest Cover Type
A practical walkthrough of feature interpretation and sanity checks on the Kaggle Forest Cover Type Prediction dataset. Read the data before you fit the model.
Updated April 2024

Machine Learning
Spaceship Titanic
When missing values aren't random. Reading the PassengerId and Cabin schemas to impute relationally, then a 10-model bake-off on the Kaggle classification task.
Updated June 2026

NLP
Part 1 · Introduction to NLP
The 5-level ladder, the hard problems, and the field map. Designed for fast revision and quick lookup before the rest of the series.
Updated June 2026

NLP
Part 2 · From Text to Vectors
Every preprocessing decision, every formula, every trap from bag of words to lemmatization — condensed for fast revision.
Updated June 2026

NLP
Part 3 · Tagging & Parsing
POS basics, the HMM math, parsing types, the error-compounding cascade, and the common tag reference — all in one page.
Updated June 2026

NLP
Part 6 · Text Classification (Classical)
Four illustrated pages — foundations, applications, the four classical methodologies, and the practitioner rules.
Updated June 2026

NLP
Part 7 · Text Classification (Deep Learning)
Four illustrated pages — deep representations, language modelling and pretraining, transfer learning and fine-tuning, and the zero-shot / prompt era.
Updated July 2026

MLOps
From Notebook to Endpoint
End-to-end MLOps on Microsoft Azure. From a Kaggle scikit-learn model to a live FastAPI inference endpoint with CI/CD, scale-to-zero, and a budget alert.
Updated June 2026

Forecasting
Forecast the Material, Not the Product
An IE MBD Capstone for a Spanish road-safety manufacturer. Aggregating SKUs into raw materials, splitting by purchase volume, and routing series to Prophet or XGBoost based on shape.
Updated June 2026

Forecasting
Three Ways to Encode an Hour
Forecasting hourly bike-share demand in Washington, D.C. The horse race between numerical, one-hot, and cyclic sin/cos encodings — plus the October 2012 anomaly that broke the dataset.
Updated June 2026

Reinforcement Learning
From Prediction to Decision — RL Cheat Sheet
A gentle introduction to reinforcement learning. How RL differs from supervised learning, the agent-environment loop, value vs policy methods, and key algorithms.
Updated February 2026

Reinforcement Learning
Lunar Lander PPO
Training a PPO agent to land between two flags. Reward shaping for accuracy, stability over speed, and what the actor-critic split actually does.
Updated January 2026

Reinforcement Learning
Training AWS DeepRacer
A 1/18-scale autonomous car, a reward function, and the gap between simulator and a real track. The SageMaker + RoboMaker stack, PPO vs SAC, and sim-to-real strategies.
Updated October 2023

Graph Analytics
Instagram Influencer Graph
Network analysis on a 70k-node Instagram shoe community. Reducing the graph, ranking influence with centrality + community detection, and simulating diffusion with SIR.
Updated December 2023

Data Cleaning
Mercedes Hackathon
268,000 rows of UK used-car listings, seven messy tables, and the case for domain knowledge in data cleaning. You don't clean what you don't recognise.
Updated June 2026

Networking
TCP/IP Fundamentals
The networking layer cake. OSI vs TCP/IP, the journey of a packet, common protocols, and the questions that show up in system-design interviews.
Updated June 2026

Algorithms
Tower of Hanoi
The rules, the recursive plan, and the hand-pattern that solves any size puzzle. Designed for fast revision and quick lookup.
Updated June 2026

Econometrics
Corporate Data Breaches
645 U.S. corporate data breaches, 29,000+ 10-K filings, and what changes in the language of an annual report once a company has been hacked. Bachelor's thesis at Universidad Carlos III (9.75/10, INNCYBER award).
Updated June 2026
