
Requirements:

Python 3.6
lda
pyvi
demoji (demoji needs to download additional data before it can be used; see the details at https://pypi.org/project/demoji)
scikit-learn
seaborn
pickle (part of the Python standard library, no installation needed)
pandas
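Before running the pipeline, you can check that these dependencies are importable. The snippet below is a small sketch and is not part of the repository. The helper name check_requirements is hypothetical, and the module names are assumptions (scikit-learn is imported as sklearn).

```python
import importlib.util

# Import names assumed for the packages listed above (scikit-learn imports as sklearn).
REQUIRED_MODULES = ["lda", "pyvi", "demoji", "sklearn", "seaborn", "pickle", "pandas"]


def check_requirements(modules=REQUIRED_MODULES):
    """Return a dict mapping each module name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_requirements().items():
        print(f"{name:10s} {'OK' if found else 'MISSING'}")
```

Any module reported as MISSING still needs to be installed, for example with pip.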

CRAWLING DATA

To crawl the data, run the bash scripts test.sh through test4.sh. You can change the parameters in those files to customize each crawling session.

The data is crawled from the Facebook fanpage links listed in the "link_fb" files (linkfb.json to linkfb4.json).
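If you want to run all four crawling sessions one after another, a small wrapper like the one below can help. This is a sketch and not part of the repository. The helper names crawl_commands and run_crawls are hypothetical. The script names come from this README, and the wrapper only calls `bash` on them without changing the scripts. By default it performs a dry run that just prints the commands.

```python
import subprocess
from pathlib import Path

# Script names taken from this README; adjust if your checkout differs.
CRAWL_SCRIPTS = ["test.sh", "test2.sh", "test3.sh", "test4.sh"]


def crawl_commands(repo_dir=".", scripts=CRAWL_SCRIPTS):
    """Build a `bash <script>` command for each crawl script that exists in repo_dir."""
    repo = Path(repo_dir)
    return [["bash", name] for name in scripts if (repo / name).is_file()]


def run_crawls(repo_dir=".", dry_run=True):
    """Run the crawl scripts one after another from repo_dir.

    With dry_run=True the commands are only printed, not executed.
    """
    commands = crawl_commands(repo_dir)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if a crawl session fails.
            subprocess.run(cmd, cwd=str(repo_dir), check=True)
    return commands


if __name__ == "__main__":
    run_crawls(dry_run=True)
```

Call run_crawls(dry_run=False) from the repository root to actually start the sessions.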

PRE-PROCESSING

After crawling the raw data, we need to pre-process it before running LDA on it. Run "pre_processing.py" to perform this stage. Remember to change the file paths in the script so that it reads the right dataset and stopwords file.
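To illustrate the kind of cleaning this stage involves, here is a minimal sketch that assumes each comment is a plain string and the stopwords file has one entry per line. It is not the code in pre_processing.py. It replaces demoji with a rough emoji regex and replaces pyvi's word segmentation with a simple regex tokenizer. If the text has already been segmented by pyvi, the underscores that join compound words are kept, because `\w` matches `_`. The function names are hypothetical.

```python
import re
from pathlib import Path

# Rough emoji ranges; a stand-in for demoji, which the project actually uses.
EMOJI_PATTERN = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]+")
URL_PATTERN = re.compile(r"https?://\S+")


def load_stopwords(path):
    """Read one stopword per line, lowercased, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.strip().lower() for line in lines if line.strip()}


def preprocess(text, stopwords):
    """Lowercase a comment, strip URLs and emojis, tokenize, and drop stopwords and numbers."""
    text = URL_PATTERN.sub(" ", text.lower())
    text = EMOJI_PATTERN.sub(" ", text)
    # \w is Unicode-aware in Python 3, so Vietnamese letters with diacritics are kept.
    tokens = re.findall(r"\w+", text)
    return [tok for tok in tokens if tok not in stopwords and not tok.isdigit()]
```

For example, preprocess(comment, load_stopwords("vietnamese-stopwords.txt")) returns the cleaned token list for one comment.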

Then run the "Pre_processing_IDF.ipynb" notebook to finish pre-processing.
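The notebook's name suggests that it uses inverse document frequency (IDF) to prune words that are too common to separate topics. That is an assumption, because its contents are not described here. A minimal sketch of such a filter uses idf(t) = log(N / df(t)), where N is the number of documents and df(t) is the number of documents that contain term t. Terms with an IDF below a chosen threshold are then dropped. The function names and the threshold are hypothetical.

```python
import math
from collections import Counter


def idf_scores(documents):
    """Compute idf(t) = log(N / df(t)) for every term in a list of token lists."""
    n_docs = len(documents)
    df = Counter(term for doc in documents for term in set(doc))
    return {term: math.log(n_docs / count) for term, count in df.items()}


def filter_by_idf(documents, min_idf):
    """Drop terms whose IDF is below min_idf, i.e. terms that appear in too many documents."""
    idf = idf_scores(documents)
    return [[term for term in doc if idf[term] >= min_idf] for doc in documents]
```

A term that occurs in every document has idf = log(1) = 0, so any positive min_idf removes it.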

This stage produces 4 output files:

corpus.txt
rating.txt
source.txt
done_processing.txt

You can move them into the corresponding dataset folder to keep the outputs of each dataset separate.
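Here is a small sketch for moving the four outputs into a dataset folder. It is not part of the repository. The helper name collect_outputs and any folder name you pass to it are hypothetical. Files that do not exist are skipped.

```python
import shutil
from pathlib import Path

# The four outputs of the pre-processing stage, as listed above.
OUTPUT_FILES = ["corpus.txt", "rating.txt", "source.txt", "done_processing.txt"]


def collect_outputs(src_dir, dataset_dir):
    """Move the pre-processing outputs from src_dir into dataset_dir (created if needed)."""
    src, dest = Path(src_dir), Path(dataset_dir)
    dest.mkdir(parents=True, exist_ok=True)
    moved = []
    for name in OUTPUT_FILES:
        path = src / name
        if path.is_file():
            shutil.move(str(path), str(dest / name))
            moved.append(name)
    return moved
```

For example, collect_outputs(".", "datasets/my-dataset") moves the files from the current directory and returns the names it moved.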

LDA STAGE

Run "lda_simple.py" to process the data. Remember to change the file paths in the script so that it points to the right dataset folders. The LDA model will be saved in the "model" folder inside each dataset folder.
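The lda package listed in the requirements fits LDA with collapsed Gibbs sampling. The sketch below shows what that technique does: a small pure-Python collapsed Gibbs sampler, plus a helper that pickles a model into a "model" folder inside a dataset folder. It is not the contents of lda_simple.py. The function names, the hyperparameters alpha and beta, and the lda_model.pkl file name are hypothetical. On real corpora the lda package will be much faster.

```python
import pickle
import random
from pathlib import Path


def gibbs_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on token lists; returns (vocab, topic_word)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    # Count tables: doc-topic, topic-word, and per-topic totals.
    ndk = [[0] * n_topics for _ in docs]
    nkw = [[0] * V for _ in range(n_topics)]
    nk = [0] * n_topics
    # Start with a random topic for every token.
    z = []
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = index[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][wi] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][wi] + beta) / (nk[t] + V * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wi] += 1
                nk[k] += 1
    # Point estimate of the topic-word distributions (each row sums to 1).
    topic_word = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)]
                  for k in range(n_topics)]
    return vocab, topic_word


def save_model(model, dataset_dir):
    """Pickle the model into <dataset_dir>/model/lda_model.pkl (file name is hypothetical)."""
    model_dir = Path(dataset_dir) / "model"
    model_dir.mkdir(parents=True, exist_ok=True)
    path = model_dir / "lda_model.pkl"
    with open(str(path), "wb") as f:
        pickle.dump(model, f)
    return path
```

For example, call vocab, topic_word = gibbs_lda(docs, n_topics=10) on the tokenized corpus and then save_model((vocab, topic_word), "path/to/dataset").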

VISUALIZE

Run the "Visuallize.ipynb" notebook to visualize the results.
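A common first look at an LDA result is the list of top words for each topic. Here is a minimal text-only sketch that assumes a vocabulary list and a topic-word matrix whose rows hold per-topic word probabilities. The lda package exposes a matrix of this kind as topic_word_. The function names are hypothetical, and the notebook itself may present the results differently.

```python
def top_words(vocab, topic_word, n=10):
    """Return the n highest-probability words for each topic, best first."""
    tops = []
    for row in topic_word:
        ranked = sorted(range(len(vocab)), key=lambda v: row[v], reverse=True)
        tops.append([vocab[v] for v in ranked[:n]])
    return tops


def print_topics(vocab, topic_word, n=10):
    """Print one line per topic, e.g. 'Topic 0: word1 word2 ...'."""
    for k, words in enumerate(top_words(vocab, topic_word, n)):
        print(f"Topic {k}: {' '.join(words)}")
```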


Repository: vietdelta/TopicModelling