Open
Description
Thanks for your work on this nice project.
I am planning to build a library for text simplification and would potentially like to integrate your package.
The choice of tokenizer affects the resulting readability scores, so I was wondering how you approached this issue.
Was there a specific reason for choosing the `TweetTokenizer` over, e.g., NLTK's default/recommended tokenizer, which follows the Penn Treebank definition of word boundaries more closely?
tokenizer = TweetTokenizer()
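To make the question concrete, here is a minimal sketch of how the two tokenizers can disagree on a word count. It assumes NLTK is installed and uses `TreebankWordTokenizer` directly as a stand-in for the Treebank-style behaviour, since `word_tokenize` additionally needs the Punkt sentence models to be downloaded. The `sample` sentence and the `count_words` helper are made up for illustration and are not taken from this package.

```python
from nltk.tokenize import TreebankWordTokenizer, TweetTokenizer

# Hypothetical sample text containing a contraction.
sample = "Don't panic."

tweet_tokens = TweetTokenizer().tokenize(sample)
treebank_tokens = TreebankWordTokenizer().tokenize(sample)


def count_words(tokens):
    """Count tokens that contain at least one alphanumeric character.

    Illustrative helper only: a readability formula would feed a count
    like this into its words-per-sentence term.
    """
    return sum(1 for tok in tokens if any(ch.isalnum() for ch in tok))


# TweetTokenizer keeps the contraction as one token, while the Treebank
# tokenizer splits off the clitic "n't", so the word counts differ.
print(tweet_tokens, count_words(tweet_tokens))
print(treebank_tokens, count_words(treebank_tokens))
```

Any metric that depends on words per sentence or syllables per word would shift with a difference like this, which is why I am asking about the rationale.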
dogweather