Project 2-SpamDetection
Main file: launcher.py
Python Files:
- calculated_values.py: stores the values computed by the model during the training phase.
- constants.py: stores the constants.
- train_model.py: contains the functions used to train the model, i.e. prepare the model data (e.g. calculating class probabilities and word probabilities).
- file_operation.py: contains the functions related to I/O operations.
- naive_bayes.py: contains the functions used to predict the Spam/Ham label for an email using the trained model.
- pre_processing.py: contains the functions used to clean a raw email before it is used to train the model.
- graph.py: generates the bar graph of the prediction results.
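
To illustrate how the training and prediction steps fit together, here is a minimal sketch of a Naive Bayes spam classifier. It is not the code from train_model.py or naive_bayes.py: the function names, the tokenization rule, the add-one smoothing value, and the use of base-10 logs are all assumptions made for the example.

```python
import math
import re
from collections import Counter

SMOOTHING = 1.0  # assumed add-one smoothing; the project's actual value is not stated


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens (an assumed cleaning rule)."""
    return re.findall(r"[a-z]+", text.lower())


def train(emails):
    """Compute class priors and smoothed word probabilities.

    `emails` is an iterable of (label, text) pairs with label "ham" or "spam".
    Both classes must appear at least once, otherwise a prior would be zero.
    """
    counts = {"ham": Counter(), "spam": Counter()}
    n_docs = Counter()
    for label, text in emails:
        n_docs[label] += 1
        counts[label].update(tokenize(text))

    vocab = set(counts["ham"]) | set(counts["spam"])
    total_docs = sum(n_docs.values())
    priors = {c: n_docs[c] / total_docs for c in counts}

    word_probs = {}
    for c in counts:
        # Smoothed denominator: tokens seen in the class plus one slot per vocabulary word.
        denom = sum(counts[c].values()) + SMOOTHING * len(vocab)
        word_probs[c] = {w: (counts[c][w] + SMOOTHING) / denom for w in vocab}
    return priors, word_probs


def predict(text, priors, word_probs):
    """Return (label, scores), where scores holds the log-score of each class."""
    scores = {}
    for c in priors:
        score = math.log10(priors[c])
        for w in tokenize(text):
            # Words outside the training vocabulary are ignored.
            if w in word_probs[c]:
                score += math.log10(word_probs[c][w])
        scores[c] = score
    return max(scores, key=scores.get), scores
```

Summing log-probabilities instead of multiplying raw probabilities avoids numeric underflow on long emails.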
Files information:
- model.txt: stores the tuples (word, tf(ham), prob(ham), tf(spam), prob(spam)), where tf is the term frequency.
- result.txt: stores the final results in the format specified in the project description.
- values.txt: stores the intermediate count values, which can be reused directly to avoid recomputation every time the code is executed.
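
As a rough illustration of the model.txt tuples, the sketch below writes and reads one row. The on-disk layout (whitespace-separated fields, eight decimal places) and the helper names `format_row` and `parse_row` are assumptions for the example; the actual layout is defined by the project description.

```python
def format_row(word, tf_ham, p_ham, tf_spam, p_spam):
    """Serialize one (word, tf(ham), prob(ham), tf(spam), prob(spam)) tuple.

    Assumed layout: fields separated by two spaces, probabilities with 8 decimals.
    """
    return f"{word}  {tf_ham}  {p_ham:.8f}  {tf_spam}  {p_spam:.8f}"


def parse_row(line):
    """Parse a row produced by format_row back into a typed tuple."""
    word, tf_ham, p_ham, tf_spam, p_spam = line.split()
    return word, int(tf_ham), float(p_ham), int(tf_spam), float(p_spam)
```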
Instructions to run the project:
- Download or clone the project repo to your local machine - IntroToAI-SpamDetector
  Note: The project can also be downloaded from Google Drive - Google Drive Link
- Copy the train and test data folders, named 'train' and 'test' respectively, into the project root directory, i.e. 'IntroToAI-SpamDetector'.
  Note: The train and test data folders can be downloaded from Google Drive. The final directory structure should look exactly as shown in Google Drive - Google Drive Link
- Navigate to '\IntroToAI-SpamDetector\src' in your terminal.
- Run the command: python launcher.py
- Check the results folder '\IntroToAI-SpamDetector\results'.
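
Before running launcher.py, it can help to confirm that the folders named in the steps above are in place. The following is a small optional sketch, not part of the project; the helper name `missing_dirs` is hypothetical, and it only checks the 'train', 'test', and 'src' folders mentioned in this README.

```python
from pathlib import Path

# Folders this README expects directly under the project root.
REQUIRED_DIRS = ("train", "test", "src")


def missing_dirs(project_root):
    """Return the names of required folders that do not exist under project_root."""
    root = Path(project_root)
    return [name for name in REQUIRED_DIRS if not (root / name).is_dir()]
```

Calling `missing_dirs("IntroToAI-SpamDetector")` from the parent directory returns an empty list when the layout matches the instructions.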