
Last update: February 2024. All opinions are my own.
1. Overview
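This project predicts whether each passenger in the Kaggle Spaceship Titanic dataset was transported, a binary classification task. The work focuses on data cleaning, exploring relationships between features, and experimenting with preprocessing and modeling pipelines.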
2. Data Preparation
- Explored missingness patterns and potential leakage.
- Created a reproducible preprocessing pipeline.
- Encoded categorical variables and scaled numeric features.
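The preparation steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes pandas and scikit-learn, the toy rows are invented, the four columns are an arbitrary subset of the public Kaggle schema, and the imputation strategies (most frequent value, median) are placeholder choices.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy rows invented for illustration; only the column names come from the Kaggle dataset.
toy = pd.DataFrame({
    "HomePlanet": ["Earth", "Europa", np.nan, "Mars"],
    "Destination": ["TRAPPIST-1e", np.nan, "55 Cancri e", "TRAPPIST-1e"],
    "Age": [39.0, np.nan, 24.0, 58.0],
    "RoomService": [0.0, 109.0, np.nan, 43.0],
})

categorical_cols = ["HomePlanet", "Destination"]  # assumed split of columns
numeric_cols = ["Age", "RoomService"]

# Fraction of missing values per column, a first look at missingness.
missing_share = toy.isna().mean()

# Categorical branch: fill gaps with the most frequent value, then one-hot encode.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

# Numeric branch: fill gaps with the median, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# sparse_threshold=0.0 forces a dense array so the result is easy to inspect.
preprocess = ColumnTransformer(
    [
        ("cat", categorical_steps, categorical_cols),
        ("num", numeric_steps, numeric_cols),
    ],
    sparse_threshold=0.0,
)

features = preprocess.fit_transform(toy)
print(missing_share)
print(features.shape)
```

Keeping imputation, encoding, and scaling inside one transformer means the same fitted statistics are reused on new data, which is one way to keep the preprocessing reproducible and to avoid leaking information from held-out rows.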
3. Modeling and Validation
- Compared baseline models to more expressive learners.
- Used cross-validation to evaluate generalization.
- Iterated on feature sets and hyperparameters.
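The validation loop described above might look something like the sketch below. It assumes scikit-learn; the synthetic data stands in for the prepared features, and the specific models (a majority-class baseline and a random forest), the parameter grid, and the accuracy metric are illustrative assumptions rather than the choices actually made in the project.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Synthetic stand-in for the prepared feature matrix and binary target.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)

# One shared splitter so every model is scored on the same folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Baseline: always predict the most frequent class.
baseline_scores = cross_val_score(
    DummyClassifier(strategy="most_frequent"), X, y, cv=cv, scoring="accuracy"
)

# More expressive learner, tuned with a small (hypothetical) grid.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    cv=cv,
    scoring="accuracy",
)
search.fit(X, y)

print(f"baseline accuracy: {baseline_scores.mean():.3f}")
print(f"best CV accuracy:  {search.best_score_:.3f} with {search.best_params_}")
```

Reusing a single splitter for the baseline and the tuned model keeps the comparison like-for-like. The best grid-search score is selected on those same folds, so it tends to be slightly optimistic; a held-out set or nested cross-validation would give a cleaner estimate.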
4. Takeaways
Systematic handling of missing values made the biggest difference to model performance, especially when paired with a consistent validation scheme.
5. Skills and Tools
- Data cleaning
- Pipeline development
- Grid search
- Exploratory data analysis
- Cross-validation
