`BertClassifier` supports one output for the entire input sequence, but per-token classification is important for applications such as POS and NER tagging. This will require some scoping and design:
- Do we need a separate `BertTokenClassifierPreprocessor`, or is passing a label tensor the same length as the input enough?
- Do we want to offer a script that can turn a standard dataset like CoNLL-2003 into something our preprocessor can use? In general, most token labels apply to "words" rather than the subword tokens created by WordPiece/SentencePiece, so labels need to be realigned (see the alignment sketch after this list).
- Create a `BertTokenClassifier` task model with the correct task-specific layers and preprocessing (a rough sketch of the modeling piece is below).
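On the alignment question, here is a minimal sketch, assuming a hypothetical `tokenize_word` callable that maps a single word to its subword pieces (any WordPiece/SentencePiece tokenizer could back it). Only the first piece of each word keeps the real label; continuation pieces get a sentinel that the loss can mask out:

```python
IGNORE_LABEL = -100  # sentinel for "no label"; an assumed convention, not an existing KerasNLP one

def align_labels(words, word_labels, tokenize_word):
    """Expand word-level labels (e.g. CoNLL-2003 tags) to subword-level labels."""
    pieces, piece_labels = [], []
    for word, label in zip(words, word_labels):
        subwords = tokenize_word(word)  # e.g. "playing" -> ["play", "##ing"]
        pieces.extend(subwords)
        # Label the first subword; mark the rest to be masked out of the loss.
        piece_labels.extend([label] + [IGNORE_LABEL] * (len(subwords) - 1))
    return pieces, piece_labels
```

A `BertTokenClassifierPreprocessor` could run this alignment internally, so users could pass word-level labels directly; a conversion script would mostly be parsing CoNLL-style files into `(words, word_labels)` pairs.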
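For the task model itself, here is a minimal functional-API sketch, assuming the KerasNLP `BertBackbone` API and the `"bert_base_en_uncased"` preset (a real task model would also ship a matching preprocessor and presets). The head is just a per-token projection over the sequence output:

```python
import keras_nlp
from tensorflow import keras

num_classes = 9  # e.g. the CoNLL-2003 NER tag set

backbone = keras_nlp.models.BertBackbone.from_preset("bert_base_en_uncased")
x = keras.layers.Dropout(0.1)(backbone.output["sequence_output"])
# One logit vector per input token, rather than one per sequence.
outputs = keras.layers.Dense(num_classes)(x)

model = keras.Model(backbone.input, outputs)
model.compile(
    optimizer=keras.optimizers.Adam(5e-5),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
```

At train time, the `IGNORE_LABEL` positions from the alignment sketch above would still need to be masked, e.g. by passing a `sample_weight` of zero at those positions to `fit()`.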