This repository contains a project that uses cluster analysis to group data into meaningful segments. Cluster analysis is a key technique in unsupervised learning, widely used in fields such as marketing, biology, and the social sciences.
The project uses two clustering methods:
- Hierarchical Clustering
- K-Means
This project applies clustering techniques to identify patterns and groupings in the dataset. It covers the following steps:
- Preprocessing the data: handling missing values, normalizing features, and treating outliers.
- Applying clustering algorithms such as K-Means and Hierarchical Clustering.
- Visualizing the clusters for better interpretability.
- Evaluating the clusters using metrics such as silhouette score.
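The preprocessing step listed above can be sketched as follows. This is a minimal illustration, not the notebook's code: it assumes the features sit in a pandas DataFrame, and the specific choices (median imputation, clipping outliers to 1.5 × IQR beyond the quartiles, z-score scaling with scikit-learn's `StandardScaler`) are assumptions. The function name `preprocess` is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def preprocess(df: pd.DataFrame, iqr_factor: float = 1.5) -> np.ndarray:
    """Hypothetical helper: impute, clip outliers, and standardize numeric columns.

    Non-numeric columns are simply dropped (an assumption of this sketch).
    """
    numeric = df.select_dtypes(include="number")

    # 1. Missing values: replace NaNs with each column's median.
    filled = numeric.fillna(numeric.median())

    # 2. Outliers: clip each column to [Q1 - k*IQR, Q3 + k*IQR].
    q1, q3 = filled.quantile(0.25), filled.quantile(0.75)
    iqr = q3 - q1
    clipped = filled.clip(lower=q1 - iqr_factor * iqr,
                          upper=q3 + iqr_factor * iqr, axis=1)

    # 3. Normalization: rescale each column to zero mean and unit variance.
    return StandardScaler().fit_transform(clipped)


if __name__ == "__main__":
    df = pd.DataFrame({"a": [1.0, 2.0, np.nan, 4.0, 100.0],
                       "b": [5.0, 6.0, 7.0, 8.0, 9.0]})
    # The NaN in "a" becomes 3.0 (the median) and 100.0 is clipped to 7.0
    # before scaling.
    print(preprocess(df))
```

Scaling matters here because K-Means and Ward linkage both rely on Euclidean distances, so an unscaled feature with a large range would dominate the clustering.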
To run the project, ensure you have the following installed:
- Python 3.8 or above
- Libraries:
  - NumPy
  - Pandas
  - Scikit-learn
  - Matplotlib
  - Seaborn
  - SciPy (for hierarchical clustering and dendrograms; it is also installed automatically as a Scikit-learn dependency)
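One way to confirm the requirements above is a small check script. This is a hypothetical helper, not part of the repository: it uses the standard-library `importlib.metadata` (available from Python 3.8) and looks packages up by their PyPI distribution names, which is why Scikit-learn appears as `scikit-learn` rather than `sklearn`.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# PyPI distribution names for the libraries listed above.
REQUIRED = ("numpy", "pandas", "scikit-learn", "matplotlib", "seaborn", "scipy")


def check_environment(packages=REQUIRED):
    """Hypothetical helper: return {package: installed version, or None if missing}."""
    if sys.version_info < (3, 8):
        raise RuntimeError("Python 3.8 or above is required")
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # not installed in this environment
    return found


if __name__ == "__main__":
    for name, ver in check_environment().items():
        print(f"{name:>12}: {ver or 'MISSING'}")
```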
Clustering Algorithms:
- K-Means: partitions the data into k compact, roughly spherical clusters around their centroids.
- Hierarchical Clustering: builds a nested hierarchy of clusters, visualized with a dendrogram.
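A minimal sketch of the two algorithms, run on synthetic data (three Gaussian blobs from `make_blobs`, not the project's dataset). K-Means uses scikit-learn's `KMeans`; the hierarchical part uses SciPy's `linkage`, `fcluster`, and `dendrogram`. Ward linkage, k = 3, and the helper `plot_dendrogram` are assumptions chosen for illustration, not necessarily what the notebook does.

```python
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in data: three well-separated 2D blobs.
X, _ = make_blobs(n_samples=150, centers=[[0, 0], [5, 5], [-5, 5]],
                  cluster_std=0.5, random_state=0)

# K-Means: alternately assigns points to the nearest of k centroids and moves
# each centroid to the mean of its points; n_init restarts guard against a
# poor random initialization.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
kmeans_labels = kmeans.fit_predict(X)

# Hierarchical (agglomerative) clustering: Ward linkage merges, at each step,
# the pair of clusters whose union least increases within-cluster variance.
Z = linkage(X, method="ward")

# Cut the tree into at most 3 flat clusters; fcluster labels start at 1.
hier_labels = fcluster(Z, t=3, criterion="maxclust")


def plot_dendrogram(Z, last_p=12):
    """Hypothetical helper: draw the top of the merge tree with Matplotlib."""
    import matplotlib.pyplot as plt

    # truncate_mode="lastp" shows only the last `last_p` merged clusters.
    dendrogram(Z, truncate_mode="lastp", p=last_p)
    plt.title("Dendrogram (Ward linkage)")
    plt.ylabel("Merge distance")
    plt.show()
```

Calling `plot_dendrogram(Z)` draws the tree; a long vertical stretch with no merges suggests a natural height at which to cut it.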
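Evaluation Methods:
- Dendrogram: choosing the number of clusters by where the tree is cut.
- The Elbow Method: plotting the within-cluster sum of squares (inertia) against the number of clusters k and looking for the bend.
- Silhouette Analysis: measuring how close each point is to its own cluster compared with the nearest other cluster (scores range from -1 to 1).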
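The elbow method and silhouette analysis can both be computed from a single sweep over k. This sketch uses synthetic data with four known groups (not the project's dataset); the range k = 2..8 and the helper name `sweep_k` are assumptions. It relies on scikit-learn's `KMeans.inertia_`, `silhouette_score`, and `silhouette_samples`.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples, silhouette_score

# Synthetic stand-in data with four known groups.
X, _ = make_blobs(n_samples=400, centers=[[0, 0], [5, 5], [-5, 5], [0, 10]],
                  cluster_std=0.7, random_state=42)


def sweep_k(X, k_values=range(2, 9), seed=0):
    """Hypothetical helper: fit K-Means for each k; return inertias and silhouettes."""
    inertias, silhouettes = {}, {}
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        # Elbow method: inertia_ is the within-cluster sum of squared distances.
        inertias[k] = km.inertia_
        # Silhouette: mean of (b - a) / max(a, b) over all points, in [-1, 1].
        silhouettes[k] = silhouette_score(X, km.labels_)
    return inertias, silhouettes


inertias, silhouettes = sweep_k(X)
best_k = max(silhouettes, key=silhouettes.get)

# Per-cluster silhouette at the chosen k: low or negative values flag
# points that sit closer to a neighbouring cluster than to their own.
labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X)
per_point = silhouette_samples(X, labels)
per_cluster = {c: per_point[labels == c].mean() for c in range(best_k)}
```

Inertia always shrinks as k grows, so the elbow is read visually from a plot of `inertias`, while the silhouette gives a single score to maximize; checking that both point to the same k is a useful sanity check.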
Key findings and visualizations will be added here, such as:
- Cluster centers and sizes.
- Visual representation of clusters in 2D/3D space.
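Until those results are added, here is a sketch of how cluster centers, sizes, and a 2D view could be produced. It uses synthetic 5-feature data rather than the project's dataset, and it assumes PCA for the projection; the feature names and the helper `plot_clusters_2d` are hypothetical. Setting `n_components=3` in `PCA` would give the coordinates for a 3D view instead.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Synthetic 5-feature stand-in data.
X, _ = make_blobs(n_samples=200, n_features=5, centers=3, random_state=1)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Cluster sizes: how many points received each label.
sizes = pd.Series(km.labels_).value_counts().sort_index()

# Cluster centers: one row per cluster, one column per (hypothetical) feature.
feature_names = [f"feature_{i}" for i in range(X.shape[1])]
centers = pd.DataFrame(km.cluster_centers_, columns=feature_names)

# 2D view: project points and centers onto the first two principal components.
pca = PCA(n_components=2).fit(X)
X_2d = pca.transform(X)
centers_2d = pca.transform(km.cluster_centers_)


def plot_clusters_2d(X_2d, labels, centers_2d):
    """Hypothetical helper: scatter the PCA projection, coloured by cluster."""
    import matplotlib.pyplot as plt

    plt.scatter(X_2d[:, 0], X_2d[:, 1], c=labels, s=15, cmap="viridis")
    plt.scatter(centers_2d[:, 0], centers_2d[:, 1], c="red", marker="x", s=100)
    plt.xlabel("PC 1")
    plt.ylabel("PC 2")
    plt.title("K-Means clusters (PCA projection)")
    plt.show()
```

Usage: `plot_clusters_2d(X_2d, km.labels_, centers_2d)`. Centers are projected with the same fitted `pca` so that they land in the same coordinate system as the points.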
The full analysis is in the notebook: https://github.com/tanhpuh/cluster_analysis/blob/main/Cluster_Analysis.ipynb