This project addresses the detection of depression and suicide-related content in Reddit posts. As depression increasingly affects individuals' quality of life, natural language processing (NLP) and machine learning offer a practical way to identify potential mental health concerns in text. We used posts from the r/SuicideWatch and r/depression subreddits, obtained from a Kaggle dataset, and compared several models to determine the most effective approach for suicide and depression detection.
Keywords: Biomedical NLP, NER, AI, binary classification
Biomedical natural language processing (NLP) plays a crucial role in extracting valuable insights from biomedical texts, contributing to advancements in healthcare. This study explores the application of NLP techniques to Reddit posts for depression and suicide detection, aiming to improve patient care and accelerate research in mental health.
We draw inspiration from a 2022 study that emphasized the significance of monitoring suicidal ideation on social media platforms. That study's integrated model, which analyzes data from platforms such as Reddit and Twitter, serves as a reference for our approach.
Explore the advancements, achievements, and methodologies in the biomedical NLP field, including supervised learning, text classification, NLTK, tokenization, stemming, lemmatization, named entity recognition (NER), SVM, XGBoost, BERT, and neural networks.
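As a rough illustration of the pre-processing steps named above (tokenization, stop-word removal, stemming), the following is a minimal standard-library sketch. It is not the pipeline used in this study: the stop-word list, the suffix list, and the function names `tokenize`, `naive_stem`, and `preprocess` are assumptions made for the example, and the toy stemmer stands in for a real one such as those provided by NLTK.

```python
import re

# Illustrative stop-word list (an assumption); a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "i", "to", "of", "is", "it", "in", "my", "so"}

# Suffixes removed by the toy stemmer, checked in this order.
SUFFIXES = ("ing", "ed", "ly", "es", "s")


def tokenize(text):
    """Lowercase the text, drop URLs, and return alphabetic tokens."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return re.findall(r"[a-z']+", text)


def naive_stem(token):
    """Strip one common suffix, keeping at least three characters of the stem."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, remove stop words, and stem the remaining tokens."""
    return [naive_stem(t) for t in tokenize(text) if t not in STOP_WORDS]


print(preprocess("I stopped sleeping and nothing feels worth doing, see https://example.com"))
```

The crude suffix stripping produces stems such as "noth" for "nothing", which is typical of rule-based stemmers; lemmatization would instead map each token to a dictionary form.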
Detail the methodology of the experiment, from motivation to data preparation, pre-processing, named entity recognition, and the application of various models such as SVM, neural networks, and BERT. Discuss the achieved results and the challenges encountered during the experiment.
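To make the classification step concrete, here is a minimal sketch of a linear SVM trained on bag-of-words counts, written from scratch with the standard library so that it runs anywhere. It is an illustration under stated assumptions, not the implementation used in the experiment: the six toy posts and their labels are hypothetical, the hyperparameters (`epochs`, `lr`, `reg`) are arbitrary, and a real study would more likely rely on an established library and evaluate on held-out data.

```python
import re
from collections import Counter

# Hypothetical toy posts; label +1 marks the at-risk class, -1 the control class.
TRAIN = [
    ("i feel hopeless and want to end it all", 1),
    ("nothing matters anymore i cannot go on", 1),
    ("i feel empty and hopeless every day", 1),
    ("had a great day hiking with friends", -1),
    ("looking for a good pasta recipe", -1),
    ("my team won the game last night", -1),
]


def features(text):
    """Bag-of-words counts over lowercase alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def score(weights, bias, feats):
    """Linear decision function w . x + b."""
    return sum(weights.get(tok, 0.0) * cnt for tok, cnt in feats.items()) + bias


def train_linear_svm(data, epochs=50, lr=0.1, reg=0.01):
    """Minimise L2-regularised hinge loss with per-example subgradient steps."""
    weights, bias = {}, 0.0
    examples = [(features(text), label) for text, label in data]
    for _ in range(epochs):
        for feats, label in examples:
            margin = label * score(weights, bias, feats)
            # Regulariser step: shrink every weight slightly.
            for tok in weights:
                weights[tok] *= 1.0 - lr * reg
            # The hinge-loss subgradient is non-zero only when the margin is below 1.
            if margin < 1.0:
                for tok, cnt in feats.items():
                    weights[tok] = weights.get(tok, 0.0) + lr * label * cnt
                bias += lr * label
    return weights, bias


def predict(model, text):
    """Return +1 or -1 according to the sign of the decision function."""
    weights, bias = model
    return 1 if score(weights, bias, features(text)) >= 0 else -1


model = train_linear_svm(TRAIN)
accuracy = sum(predict(model, t) == y for t, y in TRAIN) / len(TRAIN)
print(f"training accuracy: {accuracy:.2f}")
print(predict(model, "i feel hopeless"))
```

Training accuracy on six separable examples says nothing about real performance; the sketch only shows how text becomes features and how a margin-based classifier is fit to binary labels.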
Summarize the findings, highlighting the best-performing model (neural networks) and addressing the issues encountered. Emphasize the potential of the research to contribute to mental health awareness and support.
Outline plans for future work, including the use of an approved dataset, the search for more diverse datasets, and the resolution of specific issues encountered during the experiment.
Provide references to relevant studies and sources cited throughout the paper.