qiangzi11hao/ATEC
1. File structure
dev version:
Antfin
--data
--data_all.csv //data set, 102k
--user_dict.txt //word segmentation dictionary, used by jieba
--sgns.merge.word //pretrained embedding
--saved_model //stores the saved model weights
--utils
--__init__.py //needed to import langconv
--langconv.py //converts Traditional Chinese to Simplified Chinese
--zh_wiki.py //same as above
--main.py //data preparation, training, and prediction
--max_mag_embedding_model.py //model class
--vocab.py //vocab class holding the embedding matrix, word2id, padding...
--readme.txt
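
The tree above describes vocab.py as holding word2id and padding. A minimal sketch of that idea follows; it is not the repository's code. The function names, the `<pad>`/`<unk>` tokens, and their ids are assumptions made for illustration.

```python
# Hypothetical special tokens; the real vocab.py may use different ones.
PAD, UNK = "<pad>", "<unk>"


def build_word2id(tokenized_sentences):
    """Assign an integer id to every distinct token, reserving 0 and 1."""
    word2id = {PAD: 0, UNK: 1}
    for sentence in tokenized_sentences:
        for word in sentence:
            if word not in word2id:
                word2id[word] = len(word2id)
    return word2id


def encode(tokens, word2id, max_len):
    """Map tokens to ids, truncate to max_len, then right-pad with PAD."""
    ids = [word2id.get(word, word2id[UNK]) for word in tokens][:max_len]
    return ids + [word2id[PAD]] * (max_len - len(ids))


# Placeholder tokens standing in for jieba output.
sentences = [["how", "to", "repay"], ["how", "to", "open"]]
word2id = build_word2id(sentences)
encoded = encode(["how", "to", "close"], word2id, max_len=5)
```

Here "close" is not in the vocabulary, so it maps to the unknown id, and the sequence is padded to length 5.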
How to run:
Run main.py. It produces vocab.data and writes the model weights to saved_models.
sgns.merge.word: https://pan.baidu.com/s/1luy-GlTdqqvJ3j-A4FcIOw
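Because the word2vec embedding used in this project differs from the earlier one, change sgns.merge.word to sgns.merge.char before running. One earlier test showed little difference between the two; no in-depth comparison was made.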
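
Switching between the two embedding files only changes which file is read. The sketch below shows one way to read such a file, assuming it uses the common word2vec text layout (an optional "count dim" header line, then one "word v1 v2 ..." line per entry). That layout and the function name `load_text_embeddings` are assumptions, not confirmed by this project.

```python
import io


def load_text_embeddings(handle, wanted=None):
    """Read 'word v1 v2 ...' lines from an open text handle.

    Lines with two or fewer fields (a 'count dim' header, or blanks)
    are skipped. If `wanted` is given, only those words are kept.
    """
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) <= 2:
            continue
        word, values = parts[0], parts[1:]
        if wanted is None or word in wanted:
            vectors[word] = [float(v) for v in values]
    return vectors


# In-memory stand-in for sgns.merge.word or sgns.merge.char.
sample = io.StringIO("2 3\nfoo 0.1 0.2 0.3\nbar 0.4 0.5 0.6\n")
vectors = load_text_embeddings(sample, wanted={"bar"})
```

Passing the vocabulary as `wanted` keeps memory use down, since only vectors for words that appear in the data set are retained.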
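The model architecture mainly follows QA-LSTM: the embedded vectors are first encoded by a biLSTM layer, then passed through a multi-head conv layer (kernel sizes = [2, 3, 5]), followed by max-pooling. The three pooled results are concatenated and fed directly into a dense network.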
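
The convolution, pooling, and concatenation steps can be sketched in NumPy to make the shapes concrete. This is an illustration, not the code in max_mag_embedding_model.py: the biLSTM output is replaced by a random array, and the filter count, hidden size, ReLU activation, and function names are all assumptions.

```python
import numpy as np


def conv1d_valid(x, w, b):
    """1-D convolution without padding.

    x: (seq_len, in_dim), w: (kernel, in_dim, filters), b: (filters,)
    returns: (seq_len - kernel + 1, filters)
    """
    k = w.shape[0]
    steps = x.shape[0] - k + 1
    out = np.stack([
        np.tensordot(x[t:t + k], w, axes=([0, 1], [0, 1]))
        for t in range(steps)
    ])
    return np.maximum(out + b, 0.0)  # ReLU is an assumption


def multi_kernel_features(encoded, kernel_sizes=(2, 3, 5), filters=4, seed=0):
    """One conv branch per kernel size, max-pool over time, concatenate."""
    rng = np.random.default_rng(seed)
    pooled = []
    for k in kernel_sizes:
        w = rng.normal(scale=0.1, size=(k, encoded.shape[1], filters))
        b = np.zeros(filters)
        feature_map = conv1d_valid(encoded, w, b)
        pooled.append(feature_map.max(axis=0))  # max-pooling over time
    return np.concatenate(pooled)


# Stand-in for the biLSTM output: 10 time steps, 8 hidden units.
encoded = np.random.default_rng(1).normal(size=(10, 8))
features = multi_kernel_features(encoded)
```

With three kernel sizes and 4 filters each, `features` has length 12 regardless of sequence length, which is what lets it feed a fixed-size dense network.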
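Ideas for improvement:
1. Add another biLSTM layer, then use a highway network to extend the semantics
2. Add hand-crafted features after max-pooling for further feature extraction
3. Add a match layer at the end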
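
For the highway network in idea 1, a minimal NumPy sketch of a single highway layer is given here. It uses the standard formulation y = T * H(x) + (1 - T) * x with a sigmoid gate T. The tanh transform, the dimensions, and the function names are assumptions, and none of this is implemented in the project.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def highway(x, w_h, b_h, w_t, b_t):
    """Highway layer: the gate t mixes a transform of x with x itself."""
    h = np.tanh(x @ w_h + b_h)   # candidate transform H(x)
    t = sigmoid(x @ w_t + b_t)   # transform gate T(x), values in (0, 1)
    return t * h + (1.0 - t) * x


rng = np.random.default_rng(0)
dim = 6
x = rng.normal(size=(3, dim))    # e.g. 3 time steps of encoder output
w_h = rng.normal(scale=0.1, size=(dim, dim))
w_t = rng.normal(scale=0.1, size=(dim, dim))

# A negative gate bias makes the layer start close to the identity.
y = highway(x, w_h, np.zeros(dim), w_t, np.full(dim, -2.0))

# With the gate effectively closed, the input passes through unchanged.
y_closed = highway(x, w_h, np.zeros(dim), np.zeros((dim, dim)), np.full(dim, -50.0))
```

Input and output must have the same width, so a highway layer placed after a biLSTM would operate on the concatenated forward and backward states.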