
Question: Why NLTK TweetTokenizer? #26

@freecraver

Thanks for your work on this nice project.

I intend to create a library for text simplification and would potentially like to integrate your package.
The choice of tokenizer affects the resulting readability scores, and I was wondering how you approached this issue.

Was there a specific reason for choosing `TweetTokenizer` over, e.g., NLTK's default/recommended tokenizer (`word_tokenize`), which follows the Penn Treebank's definition of word boundaries more closely?

tokenizer = TweetTokenizer()
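
To illustrate the kind of difference I mean, here is a minimal sketch, assuming NLTK is installed. The sample sentence is made up, and `TreebankWordTokenizer` is used only as a stand-in for the Treebank-style default because it needs no extra data download; this is not taken from the package's code.

```python
from nltk.tokenize import TreebankWordTokenizer, TweetTokenizer

# Made-up sample sentence, for illustration only.
text = "I don't like it."

# Tokenizer currently used by the package.
tweet_tokens = TweetTokenizer().tokenize(text)

# Penn Treebank-style tokenizer (stand-in for NLTK's default word tokenizer).
treebank_tokens = TreebankWordTokenizer().tokenize(text)

print(tweet_tokens)
print(treebank_tokens)

# Word counts feed directly into most readability formulas.
print(len(tweet_tokens), len(treebank_tokens))
```

The Tweet tokenizer keeps the contraction "don't" as a single token, while the Treebank-style tokenizer splits it into "do" and "n't", so the two produce different word counts for the same text.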
