Releases: huggingface/tokenizers
v0.22.1
v0.22.0
What's Changed
- Bump on-headers and compression in /tokenizers/examples/unstable_wasm/www by @dependabot[bot] in #1827
- Implement from_bytes and read_bytes Methods in WordPiece Tokenizer for WebAssembly Compatibility by @sondalex in #1758
- fix: use AHashMap to fix compile error by @b00f in #1840
- New stream by @ArthurZucker in #1856
- [docs] Add more decoders by @pcuenca in #1849
- Fix missing parenthesis in EncodingVisualizer.calculate_label_colors by @Liam-DeVoe in #1853
- Update quicktour.mdx re: Issue #1625 by @WilliamPLaCroix in #1846
- remove stray comment by @sanderland in #1831
- Fix typo in README by @aisk in #1808
- RUSTSEC-2024-0436 - replace paste with pastey by @nystromjd in #1834
- Tokenizer: Add native async bindings, via pyo3-async-runtimes. by @michaelfeil in #1843
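The from_bytes / read_bytes methods for WordPiece (#1758) target WebAssembly, where there is usually no filesystem to read a vocab file from, so the vocabulary has to be parsed from bytes already in memory. Below is a minimal sketch of that parsing step, assuming the common BERT-style vocab.txt layout (one token per line, token id = line index). The name read_vocab_bytes is hypothetical; it is not the signature added in #1758.

```python
from typing import Dict


def read_vocab_bytes(data: bytes) -> Dict[str, int]:
    """Hypothetical helper: build a WordPiece vocab from in-memory bytes.

    Assumes one token per line, with each token's id equal to its line index.
    """
    lines = data.decode("utf-8").split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # a trailing newline should not create an empty token
    # Strip Windows-style "\r" so CRLF files parse the same as LF files.
    return {line.rstrip("\r"): index for index, line in enumerate(lines)}


vocab = read_vocab_bytes(b"[PAD]\n[UNK]\nhello\n##lo\n")
print(vocab)  # {'[PAD]': 0, '[UNK]': 1, 'hello': 2, '##lo': 3}
```

Splitting on "\n" rather than using str.splitlines keeps tokens intact even if they contain other Unicode line-break characters.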
New Contributors
- @b00f made their first contribution in #1840
- @pcuenca made their first contribution in #1849
- @Liam-DeVoe made their first contribution in #1853
- @WilliamPLaCroix made their first contribution in #1846
- @sanderland made their first contribution in #1831
- @aisk made their first contribution in #1808
- @nystromjd made their first contribution in #1834
- @michaelfeil made their first contribution in #1843
Full Changelog: v0.21.3...v0.22.0rc0
v0.21.4
Full Changelog: v0.21.3...v0.21.4
No changes: the 0.21.3 release failed, so this is just a re-release.
https://github.com/huggingface/tokenizers/releases/tag/v0.21.3
v0.21.3
v0.21.2
What's Changed
This release is focused on performance optimizations, broader Python no-GIL (free-threading) support, and fixes for some onig issues!
- Update the release builds following 0.21.1. by @Narsil in #1746
- replace lazy_static with stabilized std::sync::LazyLock in 1.80 by @sftse in #1739
- Fix no-onig no-wasm builds by @414owen in #1772
- Fix typos in strings and comments by @co63oc in #1770
- Fix type notation of merges in BPE Python binding by @Coqueue in #1766
- Bump http-proxy-middleware from 2.0.6 to 2.0.9 in /tokenizers/examples/unstable_wasm/www by @dependabot in #1762
- Fix data path in test_continuing_prefix_trainer_mismatch by @GaetanLepage in #1747
- clippy by @ArthurZucker in #1781
- Update pyo3 and rust-numpy depends for no-gil/free-threading compat by @Qubitium in #1774
- Use ApiBuilder::from_env() in from_pretrained function by @BenLocal in #1737
- Upgrade onig, to get it compiling with GCC 15 by @414owen in #1771
- Itertools upgrade by @sftse in #1756
- Bump webpack-dev-server from 4.10.0 to 5.2.1 in /tokenizers/examples/unstable_wasm/www by @dependabot in #1792
- Bump brace-expansion from 1.1.11 to 1.1.12 in /bindings/node by @dependabot in #1796
- Fix features blending into a paragraph by @bionicles in #1798
- Adding throughput to benches to have a more consistent measure across by @Narsil in #1800
- Upgrading dependencies. by @Narsil in #1801
- [docs] Whitespace by @stevhliu in #1785
- Hotfixing the stub. by @Narsil in #1802
- Bpe clones by @sftse in #1707
- Fixed Length Pre-Tokenizer by @jonvet in #1713
- Consolidated optimization ahash dary compact str by @Narsil in #1799
- 🚨 breaking: Fix training with special tokens by @ArthurZucker in #1617
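The Fixed Length Pre-Tokenizer (#1713) is only named above. As a rough illustration of what a fixed-length pre-tokenizer does, this sketch cuts a string into consecutive chunks of a given size and records each chunk's offsets. The function name, the use of character (not byte) offsets, and keeping a shorter final chunk are assumptions made for the example, not details taken from the PR.

```python
from typing import List, Tuple

# A piece is (substring, (start, end)) with end exclusive.
Piece = Tuple[str, Tuple[int, int]]


def fixed_length_split(text: str, length: int) -> List[Piece]:
    """Hypothetical helper: split text into chunks of at most `length` characters.

    The last chunk may be shorter when len(text) is not a multiple of `length`.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    pieces: List[Piece] = []
    for start in range(0, len(text), length):
        end = min(start + length, len(text))
        pieces.append((text[start:end], (start, end)))
    return pieces


print(fixed_length_split("ACGTACGTAC", 4))
# [('ACGT', (0, 4)), ('ACGT', (4, 8)), ('AC', (8, 10))]
```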
New Contributors
- @414owen made their first contribution in #1772
- @co63oc made their first contribution in #1770
- @Coqueue made their first contribution in #1766
- @GaetanLepage made their first contribution in #1747
- @Qubitium made their first contribution in #1774
- @BenLocal made their first contribution in #1737
- @bionicles made their first contribution in #1798
- @stevhliu made their first contribution in #1785
- @jonvet made their first contribution in #1713
Full Changelog: v0.21.1...v0.21.2rc0
v0.21.1
What's Changed
- Update dev version and pyproject.toml by @ArthurZucker in #1693
- Add feature flag hint to README.md, fixes #1633 by @sftse in #1709
- Upgrade to PyO3 0.23 by @Narsil in #1708
- Fixing the README. by @Narsil in #1714
- Fix typo in Split docstrings by @Dylan-Harden3 in #1701
- Fix typos by @tinyboxvk in #1715
- Update documentation of Rust feature by @sondalex in #1711
- Fix panic in DecodeStream::step due to incorrect index usage by @n0gu-furiosa in #1699
- Fixing the stream by removing the read_index altogether. by @Narsil in #1716
- Fixing NormalizedString append when normalized is empty. by @Narsil in #1717
- 🚨 Support updating template processors by @ArthurZucker in #1652. It was removed from this release to temporarily preserve backward compatibility.
- Update metadata as Python3.7 and Python3.8 support was dropped by @earlytobed in #1724
- Add rustls-tls feature by @torymur in #1732
New Contributors
- @Dylan-Harden3 made their first contribution in #1701
- @sondalex made their first contribution in #1711
- @n0gu-furiosa made their first contribution in #1699
- @earlytobed made their first contribution in #1724
- @torymur made their first contribution in #1732
Full Changelog: v0.21.0...v0.21.1
v0.21.1rc0
What's Changed
- Update dev version and pyproject.toml by @ArthurZucker in #1693
- Add feature flag hint to README.md, fixes #1633 by @sftse in #1709
- Upgrade to PyO3 0.23 by @Narsil in #1708
- Fixing the README. by @Narsil in #1714
- Fix typo in Split docstrings by @Dylan-Harden3 in #1701
- Fix typos by @tinyboxvk in #1715
- Update documentation of Rust feature by @sondalex in #1711
- Fix panic in DecodeStream::step due to incorrect index usage by @n0gu-furiosa in #1699
- Fixing the stream by removing the read_index altogether. by @Narsil in #1716
- Fixing NormalizedString append when normalized is empty. by @Narsil in #1717
- 🚨 Support updating template processors by @ArthurZucker in #1652
- Update metadata as Python3.7 and Python3.8 support was dropped by @earlytobed in #1724
- Add rustls-tls feature by @torymur in #1732
New Contributors
- @Dylan-Harden3 made their first contribution in #1701
- @sondalex made their first contribution in #1711
- @n0gu-furiosa made their first contribution in #1699
- @earlytobed made their first contribution in #1724
- @torymur made their first contribution in #1732
Full Changelog: v0.21.0...v0.21.1rc0
Release v0.21.0
Release v0.20.4 v0.21.0
- More cache options. by @Narsil in #1675
- Disable caching for long strings. by @Narsil in #1676
- Testing ABI3 wheels to reduce number of wheels by @Narsil in #1674
- Adding an API for decode streaming. by @Narsil in #1677
- Decode stream python by @Narsil in #1678
- Fix encode_batch and encode_batch_fast to accept ndarrays again by @diliop in #1679
Python 3.7 and 3.8 are also no longer supported (in line with transformers), as both are deprecated.
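The decode-streaming API (#1677, #1678; the v0.21.1 notes mention a DecodeStream::step method) addresses a practical problem: when a model emits tokens one at a time, a single character can span several byte-level tokens, so naively decoding each token separately can produce broken output. The sketch below shows the core idea with the standard library's incremental UTF-8 decoder over a toy byte-level vocabulary. The ByteStreamDecoder class and the toy vocab are hypothetical, and this is not the library's implementation, which has to work through a tokenizer's full decoder pipeline rather than raw bytes.

```python
import codecs
from typing import Dict, List, Optional


class ByteStreamDecoder:
    """Hypothetical streaming decoder: turns token ids into text one step at a time.

    Assumes a byte-level vocabulary mapping each token id to raw UTF-8 bytes,
    so a single character may be split across several tokens.
    """

    def __init__(self, vocab: Dict[int, bytes]) -> None:
        self.vocab = vocab
        # The incremental decoder buffers incomplete multi-byte sequences
        # instead of raising or emitting replacement characters.
        self._decoder = codecs.getincrementaldecoder("utf-8")()

    def step(self, token_id: int) -> Optional[str]:
        """Feed one token id; return newly completed text, or None if nothing is ready."""
        text = self._decoder.decode(self.vocab[token_id], final=False)
        return text or None


vocab = {0: b"caf", 1: b"\xc3", 2: b"\xa9", 3: b"!"}  # "é" is split over ids 1 and 2
stream = ByteStreamDecoder(vocab)
pieces: List[Optional[str]] = [stream.step(i) for i in [0, 1, 2, 3]]
print(pieces)  # ['caf', None, 'é', '!']
```

Returning None while a character is incomplete lets a caller simply skip that step and keep streaming.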
Full Changelog: v0.20.3...v0.21.0
v0.20.3
What's Changed
There was a breaking change in 0.20.3 for tuple inputs of encode_batch!
- fix pylist by @ArthurZucker in #1673
- [MINOR:TYPO] Fix docstrings by @cakiki in #1653
New Contributors
Full Changelog: v0.20.2...v0.20.3
v0.20.2
Release v0.20.2
Thanks a MILE to @diliop we now have support for python 3.13! 🥳
What's Changed
- Bump cookie and express in /tokenizers/examples/unstable_wasm/www by @dependabot in #1648
- Fix off-by-one error in tokenizer::normalizer::Range::len by @rlanday in #1638
- Arg name correction: auth_token -> token by @rravenel in #1621
- Unsound call of set_var by @sftse in #1664
- Add safety comments by @Manishearth in #1651
- Bump actions/checkout to v4 by @tinyboxvk in #1667
- PyO3 0.22 by @diliop in #1665
- Bump actions versions by @tinyboxvk in #1669
New Contributors
- @rlanday made their first contribution in #1638
- @rravenel made their first contribution in #1621
- @sftse made their first contribution in #1664
- @Manishearth made their first contribution in #1651
- @tinyboxvk made their first contribution in #1667
- @diliop made their first contribution in #1665
Full Changelog: v0.20.1...v0.20.2