Komal Wavhal, Lilli Nappi, Milan Girish Chandiramani, Suraj Gangwani
Diabetes, a prevalent chronic disease, affects millions of people worldwide and is linked to severe complications such as heart disease, vision loss, and kidney failure. Early detection of diabetes can significantly improve treatment outcomes and reduce healthcare costs. This problem can be addressed by developing predictive models to identify individuals at risk of developing diabetes or prediabetes.
The dataset from the CDC’s Behavioral Risk Factor Surveillance System (BRFSS) for 2015 contains responses from 70,692 individuals, equally split between those with diabetes or prediabetes (class 1) and those without (class 0). The dataset includes 21 feature variables related to health behaviors and conditions, providing an opportunity to build a classification model to predict diabetes risk.
The goal of this project is to create a machine learning model that can accurately classify individuals into two categories: those at risk of diabetes (prediabetes or diabetes) and those who are not. Such a model would support early diagnosis, allowing for timely interventions to manage and potentially prevent the disease.
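As a rough illustration of what "accurately classify" could mean for a two-class problem like this one, the sketch below computes accuracy, precision, and recall for class 1 from lists of true and predicted labels. The function name `binary_metrics` and the toy labels are hypothetical and are not taken from the project; the choice of metrics is likewise an assumption rather than the authors' stated evaluation protocol.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision and recall for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # at-risk, flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # not at risk, not flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarm
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed case
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Toy labels for illustration only (not results from the dataset)
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Because the two classes are equally represented, always predicting a single class yields 50% accuracy, so accuracy above that level is meaningful here. For early detection, recall on class 1 is arguably the more relevant figure, since a missed case means a missed chance to intervene.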
About Dataset: diabetes_binary_5050split_health_indicators_BRFSS2015.csv is a clean dataset of 70,692 survey responses to the CDC's BRFSS2015. It has an equal 50-50 split of respondents with no diabetes and with either prediabetes or diabetes. The target variable Diabetes_binary has 2 classes. 0 is for no diabetes, and 1 is for prediabetes or diabetes. This dataset has 21 feature variables and is balanced.
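The description above can be checked with a few lines of standard-library Python. The following is a minimal sketch, assuming the CSV has a header row and stores the target as a numeric string such as `0.0` or `1.0`. The helper name `summarize` and the in-memory sample, including its `HighBP` and `BMI` columns, are illustrative assumptions.

```python
import csv
import io
from collections import Counter

TARGET = "Diabetes_binary"  # target column named in the dataset description


def summarize(csv_file, target=TARGET):
    """Return (class counts, feature names) for a CSV with a binary target."""
    reader = csv.DictReader(csv_file)
    # Every column other than the target is treated as a feature.
    feature_names = [name for name in reader.fieldnames if name != target]
    # Assumption: labels may be written as "0.0"/"1.0", so parse via float.
    counts = Counter(int(float(row[target])) for row in reader)
    return counts, feature_names


# Tiny in-memory stand-in for the real file (hypothetical rows)
sample = io.StringIO(
    "Diabetes_binary,HighBP,BMI\n"
    "0.0,1.0,26.0\n"
    "1.0,1.0,31.0\n"
    "0.0,0.0,22.0\n"
    "1.0,0.0,29.0\n"
)
counts, features = summarize(sample)
print(counts[0], counts[1], features)
```

To run the same check on the real file, pass a file object opened with `open(path, newline="")` to `summarize`. If the description holds, it should report 35,346 rows per class and 21 feature names.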
