This project was part of my 'Intro to Machine Learning Nanodegree' provided by Udacity. The code and text in this project are a combination of my own work and that of Udacity.
The project concerns a mail-order sales company in Germany. The company wanted to identify the segments of the population most likely to buy its products, so that it could target them in a mailout campaign. The goal was to use unsupervised learning techniques to organize the general population into clusters, then use those clusters to see which of them make up the company's main customer base. Before clustering, data cleaning and transformation techniques were applied to convert the data into a usable form.
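The comparison step described above can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes every person in both datasets has already been assigned a cluster label, and the function names and example labels are hypothetical. It computes the share of each cluster in the general population and among customers, then takes the ratio to show which clusters are over-represented in the customer base.

```python
from collections import Counter


def cluster_shares(labels):
    """Return the fraction of rows assigned to each cluster label."""
    counts = Counter(labels)
    total = len(labels)
    return {cluster: n / total for cluster, n in counts.items()}


def representation_ratio(population_labels, customer_labels):
    """Customer share divided by population share, for each population cluster.

    A ratio above 1 means the cluster is over-represented among customers;
    a ratio below 1 means it is under-represented.
    """
    population = cluster_shares(population_labels)
    customers = cluster_shares(customer_labels)
    return {
        cluster: customers.get(cluster, 0.0) / share
        for cluster, share in population.items()
    }


# Hypothetical cluster labels, made up for illustration only.
population_labels = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
customer_labels = [0, 1, 1, 1, 1]

ratios = representation_ratio(population_labels, customer_labels)
for cluster in sorted(ratios):
    print(f"cluster {cluster}: ratio {ratios[cluster]:.2f}")
```

In this toy data, cluster 1 holds a larger share of customers than of the population, so it would be a candidate target group for a mailout, while cluster 2 contains no customers at all.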
The data used in this project is real-life data provided by Bertelsmann partners AZ Direct and Arvato Finance Solution. The data is proprietary, and the agreement with Arvato Bertelsmann does not allow it to be published online. The files used in this project were:
- Udacity_AZDIAS_Subset.csv: Demographic data for the general population of Germany; 891211 persons (rows) x 85 features (columns).
- Udacity_CUSTOMERS_Subset.csv: Demographic data for customers of a mail-order company; 191652 persons (rows) x 85 features (columns).
- Data_Dictionary.md: Information file about the features in the provided datasets.
- AZDIAS_Feature_Summary.csv: Summary of feature attributes for demographic data.