joshianirudh/DocumentClustering

GUILD: Gaining Useful Insights From Large Datasets

GUILD is an unsupervised data categorization framework that clusters unorganized raw text data and assigns a topic to it.
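To make "cluster and assign a topic" concrete, here is a minimal, illustrative sketch in plain Python. It is not GUILD's method (the actual pipeline is in the notebook); it swaps in a generic TF-IDF plus k-means approach with cosine similarity, and labels each cluster by its highest-weighted terms. Every function name, the stopword list, and the four sample sentences are assumptions made up for this example.

```python
import math
from collections import Counter, defaultdict

# Tiny hypothetical stopword list, only for this illustration.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "as", "for", "on"}

def tokenize(text):
    """Lowercase, split on whitespace, drop stopwords and non-alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha() and w not in STOPWORDS]

def normalize(vec):
    """Scale a sparse vector (dict of term -> weight) to unit length."""
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def tfidf_vectors(docs):
    """Build unit-length TF-IDF vectors with a smoothed IDF."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(t for toks in tokenized for t in set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenized:
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
               for t, c in Counter(toks).items()}
        vectors.append(normalize(vec))
    return vectors

def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(v * b.get(t, 0.0) for t, v in a.items())

def centroid(members):
    """Normalized sum of the member vectors."""
    total = defaultdict(float)
    for vec in members:
        for t, v in vec.items():
            total[t] += v
    return normalize(total)

def kmeans(vectors, k, iterations=10):
    """Cosine k-means with a deterministic farthest-first initialization."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Pick the vector least similar to every centroid chosen so far.
        centroids.append(min(vectors, key=lambda v: max(cosine(v, c) for c in centroids)))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = centroid(members)
    return labels, centroids

def top_terms(center, n=3):
    """Use the n highest-weighted centroid terms as a crude topic label."""
    ranked = sorted(center.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:n]]

# Made-up sample documents, not taken from any of the datasets listed below.
docs = [
    "the team won the football match",
    "the striker scored in the football final",
    "the central bank raised interest rates",
    "markets fell as the bank raised rates",
]
labels, centroids = kmeans(tfidf_vectors(docs), k=2)
for j, center in enumerate(centroids):
    print(j, top_terms(center), [d for d, lab in zip(docs, labels) if lab == j])
```

The farthest-first initialization keeps the toy run deterministic; a real pipeline would more likely rely on an established library and on a stronger text representation than raw TF-IDF.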

Paper Link

https://www.overleaf.com/6685394279ybbxvcmbgysv

Usage

git clone https://github.com/joshianirudh/DocumentClustering.git
cd DocumentClustering
pip install -r requirements.txt

Code

The code is in Notebooks/GUILD.ipynb.

Datasets Used

| Dataset | Download Link |
| --- | --- |
| Kpris | https://github.com/TeamLab/pdcde2018/tree/master/dataset |
| BBC | https://www.kaggle.com/datasets/hgultekin/bbcnewsarchive |
| 20News | https://archive.ics.uci.edu/ml/datasets/Twenty+Newsgroups |
