
Tokenizers not model-agnostic #122

Open
@urroxyz

Description


I believe that the set of models the library supports is determined by tokenizer type, not by model brand.

For example, any of the latest Falcon models should work because they run on PreTrainedTokenizerFast; however, the following error arises:

NotImplementedError: Tokenizer not supported: PreTrainedTokenizerFast
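For reference, a minimal sketch (assuming the transformers library and access to the Hugging Face Hub) confirming what the tokenizer actually loads as:

```python
from transformers import AutoTokenizer, PreTrainedTokenizerFast

# Per the description above, Falcon 3 models load as the generic
# PreTrainedTokenizerFast rather than a model-specific subclass.
tok = AutoTokenizer.from_pretrained("tiiuae/Falcon3-1B-Base")
print(type(tok).__name__)                        # PreTrainedTokenizerFast
print(isinstance(tok, PreTrainedTokenizerFast))  # True
```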

This seems to be because the library recognizes models by their path or name rather than by their tokenizer type, so tiiuae/Falcon3-1B-Base is treated as unsupported even though it should not be. The fix would be to compare the loaded tokenizer against a list of supported tokenizer classes instead of relying on naming conventions, as in the sketch below.
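A rough illustration of that approach. The SUPPORTED_TOKENIZERS registry and resolve_handler helper are hypothetical names used for the sketch, not part of this library's actual API:

```python
from transformers import (
    AutoTokenizer,
    GPT2TokenizerFast,
    LlamaTokenizerFast,
    PreTrainedTokenizerFast,
)

# Hypothetical registry, ordered from most to least specific so that
# subclasses match before the generic fast-tokenizer fallback.
SUPPORTED_TOKENIZERS = {
    LlamaTokenizerFast: "llama",
    GPT2TokenizerFast: "gpt2",
    PreTrainedTokenizerFast: "generic-fast",  # covers e.g. Falcon 3
}

def resolve_handler(model_id: str) -> str:
    """Pick a handler by tokenizer class instead of by model path/name."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    for cls, handler in SUPPORTED_TOKENIZERS.items():
        if isinstance(tokenizer, cls):
            return handler
    raise NotImplementedError(
        f"Tokenizer not supported: {type(tokenizer).__name__}"
    )

print(resolve_handler("tiiuae/Falcon3-1B-Base"))  # -> "generic-fast"
```

Dispatching with isinstance() keeps model-specific behavior where a subclass exists while letting any model that only ships a generic fast tokenizer fall through to a sensible default.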
