This project aims to analyze insurance agent performance, predict future NILL (zero sales) agents, and segment agents into performance categories for targeted interventions. The goal is to leverage data-driven insights to enhance agent productivity and overall business income. This project was developed by team DeepCell for the Data Storm 6.0 competition.
The analysis is based on a dataset including the following key metrics for each agent:
- `unique_customers_last_21_days`
- `unique_quotations_last_21_days`
- `unique_proposals_last_21_days`
- `new_policy_count`
- `ANBP_value` (Annualized New Business Premium)
- `net_income`
- `number_of_policy_holders`
- Agent tenure and sales history data.
The project is structured into several key components:
- Exploratory Data Analysis (EDA): Understanding data distributions, correlations, and initial insights.
- NILL Agent Prediction: A predictive model to identify agents likely to have zero policy sales in the upcoming month.
- Agent Performance Clustering & Monitoring: Segmentation of agents into performance tiers (Low, Mid, High) using clustering techniques, tracking their transitions, and devising custom intervention strategies.
- Interactive Dashboard: A Streamlit application to visualize NILL agent predictions.
(Detailed EDA can be found in `notebooks/EDA.ipynb` and `readme_EDA.md`.)
- Distribution Analysis: Many key features (`ANBP_value`, `net_income`, `number_of_policy_holders`) are positively skewed, indicating a long-tail distribution where a few agents significantly outperform others.
- Correlation: Strong positive correlation observed between `ANBP_value` and `net_income`.
- Top Contributors: Top agents in ANBP are often also top in net income, showing consistency.
- Suggestions from EDA:
- Normalize skewed features (e.g., log transformation).
- Feature engineering (e.g., conversion rates, average policy values).
- Agent clustering for performance segmentation.
- Enhanced visualizations (boxplots, violin plots).
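As a sketch of the first two suggestions, the skewed columns can be log-transformed and simple ratio features derived. The helper name and the NaN handling for zero denominators are illustrative assumptions, not the notebook's actual code:

```python
import numpy as np
import pandas as pd

def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add log-transformed and ratio features suggested by the EDA."""
    out = df.copy()
    # log1p keeps zero values finite in the right-skewed monetary columns
    for col in ["ANBP_value", "net_income"]:
        out[f"log_{col}"] = np.log1p(out[col])
    # Conversion rate and average policy value; zero denominators are
    # mapped to NaN rather than raising a division error
    quotes = out["unique_quotations_last_21_days"].replace(0, np.nan)
    out["proposal_rate"] = out["unique_proposals_last_21_days"] / quotes
    policies = out["new_policy_count"].replace(0, np.nan)
    out["avg_policy_value"] = out["ANBP_value"] / policies
    return out
```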
(Implementation in `DeepCell.ipynb` - Part 1)
- Objective: Predict whether an agent will have zero new policy sales (`new_policy_count` = 0) in the next month.
- Model: An LGBMClassifier was trained for this binary classification task.
- Features: Utilized historical sales data, agent tenure, and engineered time-based features.
- Outcome: The model outputs the probability that an agent will be "NILL" next month, enabling proactive measures. Predictions are available in `data/submission.csv` and visualized in the dashboard.
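For context, one way the next-month NILL target can be derived from monthly history looks like the following. The `agent_code` and `year_month` column names are assumptions, and the notebook may construct its label differently:

```python
import pandas as pd

def build_nill_target(history: pd.DataFrame) -> pd.DataFrame:
    """Label each agent-month 1.0 if the agent records zero new policies
    in the following month; NaN where no next month is observed."""
    out = history.sort_values(["agent_code", "year_month"]).copy()
    # Shift each agent's policy count back one month to peek ahead
    next_count = out.groupby("agent_code")["new_policy_count"].shift(-1)
    out["nill_next_month"] = next_count.eq(0).astype(float).where(next_count.notna())
    return out
```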
(Implementation in `DeepCell.ipynb` - Part 2, and `clustering.ipynb`)
- Objective: Segment agents into distinct performance categories to understand characteristics and tailor improvement strategies.
- Methodology:
  - KPI Selection: Key Performance Indicators such as `new_policy_count`, `ANBP_value`, `net_income`, `unique_customers`, and `number_of_policy_holders` were used.
  - Clustering Algorithm: K-Means clustering was chosen after comparing several algorithms (KMeans, Agglomerative Clustering, DBSCAN, Gaussian Mixture), identifying three primary clusters: "Low," "Mid," and "High" performers.
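A minimal sketch of the K-Means step on standardized KPIs follows; the function name and exact preprocessing are assumptions, so see `clustering.ipynb` for the actual pipeline:

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def assign_performance_tiers(df: pd.DataFrame, kpi_cols,
                             n_clusters: int = 3, seed: int = 42) -> pd.DataFrame:
    """Standardize the KPI columns and assign each agent to one of
    n_clusters performance clusters."""
    # Scaling first so high-magnitude KPIs (e.g. ANBP) don't dominate
    X = StandardScaler().fit_transform(df[kpi_cols])
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    out = df.copy()
    out["cluster"] = km.fit_predict(X)
    return out
```

Mapping the resulting cluster IDs to "Low"/"Mid"/"High" labels would then be done by inspecting each cluster's mean KPIs.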
- Key Insights from Clustering:
- Low Performers: Often new agents; tend to stay in this category longer.
- Mid Performers: Good customer base metrics but may struggle with high-value sales.
- High Performers: Excel in `ANBP_value` and `net_income` but can be prone to dropping to lower tiers.
- Performance Trends:
- Majority of agents tend to stay in the same performance level month-to-month.
- A higher percentage of agents drop in performance than improve.
- Most new agents start in the "Low" performance category.
- Custom Intervention Strategies: Developed tailored strategies for each performance segment (Low, Mid, High) focusing on onboarding, mentorship, skill-bridging, and retention.
(Code in `dashboard/app.py`)
A Streamlit dashboard has been developed to:
- Display the NILL agent predictions from the model.
- Show summary statistics of the predictions.
- Allow exploration of individual agent predictions.
- Provide a downloadable CSV of the results.
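The summary view boils down to a few aggregates over the prediction file. A hedged sketch of the kind of helper the dashboard might call, where the `nill_probability` column name and the 0.5 threshold are assumptions about `app.py`'s internals:

```python
import pandas as pd

def summarize_predictions(preds: pd.DataFrame, threshold: float = 0.5) -> dict:
    """Aggregate NILL probabilities into headline numbers for display."""
    flagged = preds["nill_probability"] >= threshold
    return {
        "total_agents": len(preds),
        "predicted_nill": int(flagged.sum()),
        "nill_rate": float(flagged.mean()),
    }
```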
- `DeepCell.ipynb`: Main Jupyter notebook containing EDA, the NILL agent prediction model, and agent performance clustering.
- `notebooks/EDA.ipynb`: Dedicated Jupyter notebook for detailed Exploratory Data Analysis.
- `clustering.ipynb`: Jupyter notebook focused on the agent clustering analysis.
- `dashboard/app.py`: Python script for the Streamlit dashboard.
- `data/`: Directory containing datasets (e.g., `train_storming_round.csv`, `test_storming_round.csv`, `submission.csv`).
- `requirements.txt`: List of Python dependencies for the project.
- `README.md`: This file, providing an overview of the project.
- `readme_EDA.md`: Detailed README focusing on the EDA phase.
- Python 3.8+
Clone the repository and install the required dependencies:
```bash
git clone <repository-url>
cd <repository-name>
pip install -r requirements.txt
```
You can run the Jupyter notebooks (`DeepCell.ipynb`, `notebooks/EDA.ipynb`, `clustering.ipynb`) using Jupyter Lab or Jupyter Notebook.
To start the Streamlit dashboard:
1. Navigate to the `dashboard` directory:

   ```bash
   cd dashboard
   ```

2. Make sure the prediction file is where `app.py` expects it. The current `app.py` calls `pd.read_csv("data/submission.csv")`, so when run from the `dashboard` directory it looks for `dashboard/data/submission.csv`. If your `data` directory sits at the project root instead, either copy `submission.csv` into a `data` subdirectory inside `dashboard`, or change the path in `app.py` to `pd.read_csv("../data/submission.csv")`.

3. Run the Streamlit application:

   ```bash
   streamlit run app.py
   ```
The dashboard will open in your web browser.
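A more robust alternative to hard-coding a relative path is to resolve the file against `app.py`'s own location, so the launch directory no longer matters. This is a sketch, not the current `app.py` code, and the function name is illustrative:

```python
from pathlib import Path

import pandas as pd

def load_predictions(app_file: str) -> pd.DataFrame:
    """Resolve data/submission.csv relative to the project root
    (one level above dashboard/app.py), independent of the CWD."""
    root = Path(app_file).resolve().parent.parent
    return pd.read_csv(root / "data" / "submission.csv")

# In app.py: df = load_predictions(__file__)
```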
This project provides a comprehensive framework for understanding and improving agent performance. Combining EDA, predictive modeling for NILL agents, and performance clustering generates actionable insights. The intervention strategies, built on data-driven segments, offer a pathway for targeted agent development, potentially leading to increased sales, higher net income, and better agent retention. The interactive dashboard further empowers stakeholders to put these predictions to use. This analysis sets the stage for further predictive modeling, optimization, and continuous performance monitoring.