📰 Fake News Detection using Logistic Regression

A binary text classification project that predicts whether a news article is real or fake, using Natural Language Processing (NLP) techniques and Logistic Regression.

🧠 Problem Statement

Given a dataset of real and fake news articles, build a model that classifies unseen articles as real or fake based on their textual content.

📊 Dataset Details

Source: WELFake_Dataset.csv
Download Link: WELFake Dataset on Kaggle
Columns Used: title, text, label
Target Label:
- 1 = Real news
- 0 = Fake news
Final Input Feature: title and text combined into a single text field before preprocessing
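As a rough illustration of the column handling described above, the sketch below builds the combined field with pandas. The three rows are invented stand-ins for WELFake_Dataset.csv, the column name `content` is an assumption, and dropping rows is only one reading of "remove nulls" (filling missing values with empty strings would be another).

```python
import pandas as pd

# Tiny invented stand-in for WELFake_Dataset.csv (same column names as the README).
df = pd.DataFrame(
    {
        "title": ["Budget approved", None, "Aliens endorse candidate"],
        "text": ["Lawmakers passed the budget.", "Body without a title.", None],
        "label": [1, 1, 0],  # 1 = real, 0 = fake
    }
)

# Drop rows where either text column is missing (assumed null-handling strategy).
df = df.dropna(subset=["title", "text"]).reset_index(drop=True)

# Join title and text into one field; "content" is a hypothetical column name.
df["content"] = df["title"] + " " + df["text"]

print(df[["content", "label"]])
```

With the real file, the `pd.DataFrame(...)` call would be replaced by `pd.read_csv("WELFake_Dataset.csv")`.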

🚀 Project Workflow

1. Import Dependencies
2. Load Dataset (WELFake_Dataset.csv)
3. Preprocess Data
   - Remove nulls
   - Combine text columns
   - Clean text with regex
   - Remove stopwords & apply stemming
4. Feature Engineering
   - TF-IDF Vectorization using unigrams & bigrams
5. Train-Test Split
   - Stratified 80/20 split
6. Model Training
   - Compared Logistic Regression, Naive Bayes, and Random Forest
   - Selected Logistic Regression for best balance of speed and performance
7. Model Evaluation
   - Accuracy, Precision, Recall, F1-score
   - Confusion Matrix (visualized with seaborn)
8. Save Model & Vectorizer using pickle
9. Custom Prediction
   - In-notebook prediction function
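The workflow above can be sketched end to end on a toy corpus. This is a minimal sketch under stated assumptions, not the notebook's code: the sentences are invented, `clean_text` and `STOPWORDS` are hypothetical stand-ins for the project's regex cleaning and NLTK stopword list, stemming is left out, and the hyperparameters (`max_iter`, `random_state`) are illustrative choices.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Small stand-in stopword set; the project itself uses NLTK stopwords plus stemming.
STOPWORDS = {"the", "a", "an", "is", "on", "of", "to", "and", "in"}


def clean_text(text):
    """Keep letters only, lowercase, and drop stopwords (stemming omitted)."""
    text = re.sub(r"[^a-zA-Z]", " ", text).lower()
    return " ".join(word for word in text.split() if word not in STOPWORDS)


# Invented toy corpus: 1 = real, 0 = fake, matching the README's label scheme.
real = [f"Senate passes budget bill number {i} after long debate" for i in range(10)]
fake = [f"Shocking secret miracle cure number {i} doctors hate" for i in range(10)]
texts = [clean_text(t) for t in real + fake]
labels = [1] * 10 + [0] * 10

# Stratified 80/20 split keeps the class ratio the same in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42
)

# TF-IDF over unigrams and bigrams; fit on the training split only.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

model = LogisticRegression(max_iter=1000)
model.fit(X_train_vec, y_train)

y_pred = model.predict(X_test_vec)
print("accuracy:", accuracy_score(y_test, y_pred))
print("f1:", f1_score(y_test, y_pred))
```

Fitting the vectorizer on the training split only, then reusing it to transform the test split, keeps test-set vocabulary and document frequencies from leaking into training.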

📈 Model Performance

| Metric | Training Set | Test Set |
|---|---|---|
| Accuracy | 95.86% | 94.64% |
| F1-score | 0.96 | 0.95 |
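For readers who want to see how these metrics relate to each other, the sketch below derives accuracy, precision, recall, and F1 from confusion-matrix counts. The counts are hypothetical and are not the project's results; `classification_metrics` is an illustrative helper, and "real" (label 1) is assumed to be the positive class.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for illustration only (not taken from the notebook).
metrics = classification_metrics(tp=90, fp=10, fn=5, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```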

🔍 Confusion Matrix Heatmap (Test Set)

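✔️ The small gap between training accuracy (95.86%) and test accuracy (94.64%) indicates good generalization, and the heatmap (images/confusion_matrix.jpg) shows balanced performance on both classes.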

🧪 Example Usage

Enter: "Breaking: President gives major update on national policy."
Output: "Prediction for custom news input: Real"
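A prediction helper built on the pickled artifacts might look like the sketch below. It is an assumption about the shape of such a function, not the notebook's implementation: `predict_news` is a hypothetical name, the four training sentences are invented so the example runs without the real model files, and the artifacts are written to a temporary directory using the file names from the project structure.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Train a throwaway model on invented sentences so the sketch is self-contained.
texts = [
    "senate passes budget bill after debate",
    "president signs national policy update",
    "miracle cure doctors hate revealed",
    "shocking secret aliens control banks",
]
labels = [1, 1, 0, 0]  # 1 = real, 0 = fake
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
model = LogisticRegression().fit(vectorizer.fit_transform(texts), labels)

# Save both artifacts under a temporary models/ folder.
model_dir = Path(tempfile.mkdtemp()) / "models"
model_dir.mkdir()
with open(model_dir / "logistic_regression_model.pkl", "wb") as f:
    pickle.dump(model, f)
with open(model_dir / "tfidf_vectorizer.pkl", "wb") as f:
    pickle.dump(vectorizer, f)


def predict_news(text, model_dir):
    """Load the saved model and vectorizer, then label one piece of text."""
    with open(model_dir / "logistic_regression_model.pkl", "rb") as f:
        loaded_model = pickle.load(f)
    with open(model_dir / "tfidf_vectorizer.pkl", "rb") as f:
        loaded_vectorizer = pickle.load(f)
    features = loaded_vectorizer.transform([text])
    return "Real" if loaded_model.predict(features)[0] == 1 else "Fake"


result = predict_news("Breaking: President gives major update on national policy.", model_dir)
print("Prediction for custom news input:", result)
```

In the real project, the input text would need the same cleaning, stopword removal, and stemming that was applied during training before it reaches the vectorizer. Pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.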

📦 Installation & Setup

git clone https://github.com/Toshaksha/fake_news_prediction.git
cd fake_news_prediction
pip install -r requirements.txt

🗂 Project Structure

fake_news_prediction/
│
├── fake_news_prediction.ipynb            # Jupyter notebook (model training)
├── requirements.txt                      # Python dependencies
├── models/
│   ├── logistic_regression_model.pkl     # Saved ML model
│   └── tfidf_vectorizer.pkl              # Saved TF-IDF vectorizer
├── images/
│   └── confusion_matrix.jpg              # Confusion matrix heatmap
└── README.md                             # Project documentation

🧰 Tools & Libraries Used

Python 3.x
NLTK – for stopwords and stemming
Scikit-learn – ML models and metrics
Pandas, NumPy – data handling
Seaborn, Matplotlib – visualization
tqdm – progress bar for processing

👤 Author

Toshaksha – GitHub Profile

⭐ If you found this project helpful, please give it a star on GitHub!
