COMP-6721: Intro to AI

Project 2-SpamDetection

Project Report Link

Main file: launcher.py

Python Files:

calculated_values.py: stores the values computed by the model during the training phase.
constants.py: stores the constants.
train_model.py: contains the functions used to train the model (prepare model data), e.g. calculating class probabilities and word probabilities.
file_operation.py: contains the functions related to I/O operations.
naive_bayes.py: contains the functions used to predict the Spam/Ham label for an email using the trained model.
pre_processing.py: contains the functions used to clean the raw emails before they are used to train the model.
graph.py: generates the prediction results graph (bar graph).
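
The training and prediction steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names (tokenize, train, predict), the tokenization rule, and the additive smoothing value of 0.5 are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed cleaning step: lowercase, then split on anything that is not a-z.
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]


def train(emails, smoothing=0.5):
    """Estimate class and word probabilities.

    emails: list of (label, text) pairs, label being 'ham' or 'spam'.
    Assumes both classes appear at least once in the training data.
    """
    class_counts = Counter(label for label, _ in emails)
    word_counts = {"ham": Counter(), "spam": Counter()}
    for label, text in emails:
        word_counts[label].update(tokenize(text))

    vocab = sorted(set(word_counts["ham"]) | set(word_counts["spam"]))
    total_emails = sum(class_counts.values())
    class_prob = {c: class_counts[c] / total_emails for c in word_counts}

    word_prob = {}
    for c in word_counts:
        # Additive smoothing keeps unseen words from getting probability zero.
        denom = sum(word_counts[c].values()) + smoothing * len(vocab)
        word_prob[c] = {w: (word_counts[c][w] + smoothing) / denom for w in vocab}
    return class_prob, word_prob


def predict(text, class_prob, word_prob):
    """Return (label, scores), scoring each class in log space."""
    scores = {}
    for c in class_prob:
        score = math.log10(class_prob[c])
        for w in tokenize(text):
            # Words outside the training vocabulary are ignored.
            if w in word_prob[c]:
                score += math.log10(word_prob[c][w])
        scores[c] = score
    return max(scores, key=scores.get), scores


emails = [
    ("ham", "meeting at noon tomorrow"),
    ("ham", "lunch meeting today"),
    ("spam", "win free money now"),
    ("spam", "free prize win win"),
]
class_prob, word_prob = train(emails)
label, scores = predict("free money win", class_prob, word_prob)
print(label, scores)
```

Summing log probabilities instead of multiplying raw probabilities avoids numeric underflow on long emails.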

Files information:

model.txt: stores one tuple per word: (word, tf(ham), prob(ham), tf(spam), prob(spam)), where tf is the term frequency.
result.txt: stores the final results in the format specified in the project description.
values.txt: stores the intermediate count-related values, which can be reused directly to avoid recomputing them every time the code is executed.
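
As an illustration of the model.txt tuple, the sketch below writes and reads back one (word, tf(ham), prob(ham), tf(spam), prob(spam)) entry. The two-space separator, the six-decimal precision, and the helper names are assumptions for the example; the exact layout is the one defined in the project description.

```python
def format_model_row(word, tf_ham, prob_ham, tf_spam, prob_spam, sep="  "):
    # Hypothetical layout: fields joined by a two-space separator.
    return sep.join(
        [word, str(tf_ham), f"{prob_ham:.6f}", str(tf_spam), f"{prob_spam:.6f}"]
    )


def parse_model_row(line, sep="  "):
    # Inverse of format_model_row; assumes the word itself contains no separator.
    word, tf_ham, prob_ham, tf_spam, prob_spam = line.rstrip("\n").split(sep)
    return word, int(tf_ham), float(prob_ham), int(tf_spam), float(prob_spam)


row = format_model_row("free", 0, 0.04, 2, 0.185185)
print(row)
print(parse_model_row(row))
```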

Instructions to run the project:

Download or clone the project repo to your local machine - IntroToAI-SpamDetector

Note: The project can also be downloaded from Google Drive - Google Drive Link
Copy the training and test data folders, named 'train' and 'test' respectively, into the project root directory, i.e. 'IntroToAI-SpamDetector'.

Note: The train and test data folders can be downloaded from Google Drive. The final directory structure should look exactly as shown in Google Drive - Google Drive Link
Navigate to '\IntroToAI-SpamDetector\src' in your terminal.
Run the command: python launcher.py
Check the results folder: '\IntroToAI-SpamDetector\results'
