@jangop jangop commented Sep 14, 2022

This is minor, I know. `tokenize` simply iterates over `texts`, so in addition to `list`, `tuple` is fine. The intended type hint for this is `Sequence`.

I am not sure which version of Python this project targets, but judging from the other type hints in this file, I am going to assume `<3.9`. Otherwise, I would suggest importing `Sequence` from `collections.abc` instead of `typing`.
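As a rough illustration of the import choice described above, here is a minimal sketch. It assumes support for Python versions below 3.9 may still be needed, and `normalize_texts` is a hypothetical stand-in used only for illustration, not the project's `tokenize`.

```python
import sys
from typing import List, Union

# Assumption: Python < 3.9 may still be supported. On those versions,
# collections.abc.Sequence cannot be subscripted, so typing.Sequence is used.
if sys.version_info >= (3, 9):
    from collections.abc import Sequence
else:
    from typing import Sequence


def normalize_texts(texts: Union[str, Sequence[str]]) -> List[str]:
    """Hypothetical helper: accept one string or any sequence of strings."""
    # A lone str is itself a sequence of characters, so check for it first.
    if isinstance(texts, str):
        return [texts]
    # Lists and tuples are both accepted; copy into a new list.
    return list(texts)
```

With this hint, a type checker accepts both `normalize_texts(["a", "b"])` and `normalize_texts(("a", "b"))`, whereas `List[str]` would reject the tuple.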


-def tokenize(texts: Union[str, List[str]], context_length: int = 77, truncate: bool = False) -> Union[torch.IntTensor, torch.LongTensor]:
+def tokenize(texts: Union[str, Sequence[str]], context_length: int = 77, truncate: bool = False) -> Union[torch.IntTensor, torch.LongTensor]:
The docstring at line 203 should be modified too.
