The-Gupta/Time

Time Magazine

Time Magazine Scraper, Text Extraction (OCR), and Data Exploration with Topic Modelling

01.ipynb: Code
Open in Colab to explore the topics (and their dominant terms) or run the code.

Part 1: Scraping the Time Vault, 1923-2015.
Scraped Data

Part 2: Text Extraction with Tesseract OCR.
Text has so far been extracted only for 2000-2015, because the OCR process is slow.
The extracted text also contains a lot of noise.
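
As a rough illustration of this step, the sketch below runs Tesseract on a single page image through the `pytesseract` wrapper, then drops lines that look like OCR noise. It is not taken from the notebook: the function names, the line-level heuristic, and the `0.7` threshold are assumptions chosen for illustration, and the notebook may handle noise differently (or not at all).

```python
import re


def ocr_page(image_path, lang="eng"):
    """Run Tesseract on one page image and return the raw text.

    Assumes the Tesseract binary plus the pytesseract and Pillow packages
    are installed. They are imported lazily so the cleaning helper below
    can be used without them.
    """
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=lang)


def clean_ocr_text(text, min_alpha_ratio=0.7, min_length=3):
    """Keep only lines that are mostly letters and spaces.

    Hypothetical heuristic: a line is kept when at least `min_alpha_ratio`
    of its characters are alphabetic or spaces.
    """
    kept = []
    for line in text.splitlines():
        # Collapse runs of whitespace before measuring the line.
        line = re.sub(r"\s+", " ", line).strip()
        if len(line) < min_length:
            continue
        alpha = sum(ch.isalpha() or ch == " " for ch in line)
        if alpha / len(line) >= min_alpha_ratio:
            kept.append(line)
    return "\n".join(kept)
```

A stricter threshold removes more debris but also discards lines that are heavy on numbers, such as dates and statistics, so it is worth checking on a few pages before applying it to every issue.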

Part 3: Data Exploration with Topic Modelling.
TODO: Extend to all years, and interpret the resulting topics.
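
For readers who want to try topic modelling on the extracted text outside the notebook, here is a minimal sketch that fits LDA with scikit-learn and lists the dominant terms per topic. It assumes scikit-learn 1.0 or later (for `get_feature_names_out`). The function names and parameter values are hypothetical, and the notebook's actual model and settings may differ.

```python
def fit_topics(documents, n_topics=10, max_features=5000, seed=0):
    """Fit LDA on a list of document strings.

    Returns (topic_term_weights, vocabulary). scikit-learn is imported
    lazily so that top_terms() can be used on its own.
    """
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import CountVectorizer

    # Bag-of-words counts, with English stop words removed.
    vectorizer = CountVectorizer(stop_words="english", max_features=max_features)
    counts = vectorizer.fit_transform(documents)

    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    lda.fit(counts)
    return lda.components_, list(vectorizer.get_feature_names_out())


def top_terms(topic_term_weights, vocabulary, n_terms=10):
    """Return the n_terms highest-weighted vocabulary terms for each topic."""
    result = []
    for weights in topic_term_weights:
        ranked = sorted(range(len(vocabulary)), key=lambda i: weights[i], reverse=True)
        result.append([vocabulary[i] for i in ranked[:n_terms]])
    return result
```

Calling `top_terms(*fit_topics(docs))` on a list of cleaned issue texts gives one term list per topic. Because the OCR output is noisy, some of the top terms will likely be OCR fragments rather than real words.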
