This project was developed as part of a Machine Learning course in 2025.
The goal was to predict which passengers survived the Titanic disaster using logistic regression, with a strong focus on data exploration and feature engineering.
Perform binary classification using the Kaggle Titanic dataset and a logistic regression model.
- Investigated missing data, outliers, and correlations
- Analyzed how features like age, fare, gender, and family size impact survival
- Visualized distributions and relationships using plots
- Created new features such as:
  - `FamilySize` (SibSp + Parch + 1)
  - `IsAlone` (derived from FamilySize)
  - `Title` (extracted from passenger names: Mr, Miss, etc.)
- Handled missing data and encoded categorical variables
- Scaled numerical features
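The feature-engineering and preprocessing steps above can be sketched roughly as follows. The tiny DataFrame is a synthetic stand-in for the Kaggle `train.csv` (only the column names match the real dataset), and the exact imputation/encoding choices are assumptions, not necessarily the notebook's:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Kaggle train.csv (real dataset has more columns/rows).
df = pd.DataFrame({
    "Name": ["Braund, Mr. Owen", "Cumings, Mrs. John", "Heikkinen, Miss. Laina"],
    "Sex": ["male", "female", "female"],
    "Age": [22.0, None, 26.0],
    "Fare": [7.25, 71.28, 7.92],
    "SibSp": [1, 1, 0],
    "Parch": [0, 0, 0],
})

# Engineered features from the list above
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1
df["IsAlone"] = (df["FamilySize"] == 1).astype(int)
# Title sits between the comma and the period: "Braund, Mr. Owen" -> "Mr"
df["Title"] = df["Name"].str.extract(r",\s*([^\.]+)\.", expand=False)

# Missing data, categorical encoding, and scaling (one plausible approach)
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})
df = pd.get_dummies(df, columns=["Title"], prefix="Title")
df[["Age", "Fare"]] = StandardScaler().fit_transform(df[["Age", "Fare"]])
```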
- Used logistic regression with scikit-learn
- Split data into train and validation sets
- Tuned hyperparameters and evaluated using validation accuracy
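A minimal sketch of the modeling steps above, using a synthetic feature matrix in place of the preprocessed Titanic features (the tuned parameter grid is an assumption, not necessarily what the notebook searched):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the engineered Titanic features and labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Hold out a validation split for model selection.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Tune the regularization strength C via cross-validation on the training split.
grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    scoring="accuracy",
    cv=5,
)
grid.fit(X_train, y_train)
val_accuracy = grid.score(X_val, y_val)
print(f"best C={grid.best_params_['C']}, validation accuracy={val_accuracy:.3f}")
```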
- Confusion Matrix, Accuracy, Precision, Recall, F1-score
- Visualizations of training vs validation performance
- Submitted results to Kaggle for evaluation
- Tracked performance on the leaderboard
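The evaluation metrics and the submission file can be produced along these lines. The labels, predictions, and `PassengerId` values below are illustrative placeholders, not real model output:

```python
import pandas as pd
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Placeholder validation labels and predictions (real ones come from the model).
y_val = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(confusion_matrix(y_val, y_pred))
print("accuracy: ", accuracy_score(y_val, y_pred))
print("precision:", precision_score(y_val, y_pred))
print("recall:   ", recall_score(y_val, y_pred))
print("f1:       ", f1_score(y_val, y_pred))

# Kaggle expects a CSV with PassengerId and the predicted Survived column.
submission = pd.DataFrame({"PassengerId": [892, 893, 894], "Survived": [0, 1, 0]})
submission.to_csv("submission.csv", index=False)
```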
- Python
- scikit-learn
- Pandas, NumPy
- Matplotlib, Seaborn
- Jupyter Notebook / Kaggle Notebook
- `Titanic.ipynb` – Full notebook with analysis and model
- `submission.csv` – Kaggle submission file
- The importance of feature engineering for improving model performance
- How to structure a machine learning pipeline from raw data to evaluation
- Practical usage of logistic regression for binary classification
This project served as the foundation for a more advanced version that includes multiple classification models and feature selection techniques:
➡️ Titanic Classification Ensemble
Itamar Hadad
B.Sc. Computer Science Student – Afeka College
📧 [email protected]
LinkedIn