IMDB-Sentiment

IMDb Movie Review - NLP Preprocessing

Overview

This project is a case study comparing two text preprocessing techniques:

Stemming (using PorterStemmer)

Lemmatization (using WordNetLemmatizer)

The goal is to observe how each technique affects vocabulary size, text clarity, and information retention.

Steps Performed

Cleaned the text: lowercasing, punctuation removal, digit removal, etc.

Removed custom stopwords.

Created two versions of the reviews: one stemmed, one lemmatized.

Analyzed and visualized the results using bar plots and word clouds.
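The cleaning and stopword steps above can be sketched as follows. This is a minimal illustration using only the standard library, not the notebook's actual code: the `CUSTOM_STOPWORDS` set, the function names `clean_text` and `remove_stopwords`, and the sample review are all hypothetical, since the README does not list the real stopwords or helpers.

```python
import re
import string

# Hypothetical stopword list; the notebook's actual custom list is not shown here.
CUSTOM_STOPWORDS = {"the", "a", "an", "and", "is", "was", "it", "this", "i", "of", "to", "in"}

# Map every punctuation character to a space so adjacent words stay separated.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_text(text: str) -> str:
    """Lowercase the text, remove punctuation and digits, and collapse whitespace."""
    text = text.lower()
    text = text.translate(_PUNCT_TO_SPACE)
    text = re.sub(r"\d+", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def remove_stopwords(text: str, stopwords=CUSTOM_STOPWORDS) -> list:
    """Split cleaned text on whitespace and drop any token found in the stopword set."""
    return [tok for tok in text.split() if tok not in stopwords]


review = "This movie was GREAT! I watched it 3 times in 2019."
tokens = remove_stopwords(clean_text(review))
print(tokens)
```

The resulting token list is what would then be passed to a stemmer or lemmatizer to build the two versions of each review.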

Results

Stemmed Vocabulary Size: ~22,000 words

Lemmatized Vocabulary Size: ~26,000 words

Conclusion: Lemmatization preserved semantic meaning better than stemming and retained a richer vocabulary.
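A vocabulary size like the ones reported above can be obtained by counting distinct tokens after normalization. The sketch below shows the idea on toy data. The `vocabulary_size` helper and the `toy_stem` function are assumptions made for illustration: `toy_stem` is a crude suffix stripper standing in for a real stemmer, not the Porter algorithm, and the sample documents are invented.

```python
from typing import Callable, Iterable, List


def vocabulary_size(docs: Iterable[List[str]], normalize: Callable[[str], str]) -> int:
    """Count distinct tokens across all documents after applying `normalize` to each token."""
    return len({normalize(tok) for doc in docs for tok in doc})


def toy_stem(word: str) -> str:
    """Illustrative stand-in for a stemmer: strips one common suffix. NOT the Porter algorithm."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


# Invented, already-tokenized sample documents.
docs = [["movies", "movie", "acting", "acted"], ["actor", "acts", "movie"]]

raw_size = vocabulary_size(docs, lambda w: w)
stemmed_size = vocabulary_size(docs, toy_stem)
print(raw_size, stemmed_size)
```

With NLTK installed, the same helper should accept `PorterStemmer().stem` or `WordNetLemmatizer().lemmatize` from `nltk.stem` as the `normalize` argument; the lemmatizer additionally needs the WordNet corpus, which can be fetched with `nltk.download("wordnet")`. Stemming merges more surface forms into one token than lemmatization does, which is consistent with the smaller stemmed vocabulary reported above.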

Technologies

Python

Pandas

NLTK

Matplotlib

Seaborn

WordCloud
