Releases: remsky/Kokoro-FastAPI
Release v0.2.4
What's Changed
- docs: added a note for Apple Silicon users regarding GPU build by @FotieMConstant in #232
- converted CRLF ending lines to LF ones in api/src/structures/custom_responses.py by @blakkd in #199
- Fixes and Features by @fireblade2534 in #235
- Add Direct Windows support scripts by @kimnzl in #241
- Added support for MPS on Apple silicon by @rampadc in #233
- use local js file instead of the unpkg cdn by @mpnsk in #244
- Fixes not returning a download link if streaming is off and return_download_link is true by @fireblade2534 in #240
- Segfault fixes by @fireblade2534 in #253
- Fixes relating to parsing money and tests. Also readme stuff by @fireblade2534 in #256
- Fix Helm charts health check, ingress, and values by @richardr1126 in #257
- start-gpu_mac.sh: removed duplicated env and align with other shell scripts by @rampadc in #266
- Maintenance/automations by @remsky in #276
- Update Dockerfile to install Rust by @RigleGit in #291
- Fixing normalization by @fireblade2534 in #303
- Fixed phonemes by @fireblade2534 in #304
- Added some better safety checks to captioned speech by @fireblade2534 in #310
- feat(text): add Chinese punctuation-based sentence splitting for bett… by @jiaohuix in #321
- Improve Audio Pause Handling, MP3 Encoding, and Robust Text Normalization/Splitting by @mylukin in #322
- Update paths.py by @mbailey in #311
- Add a volume multiplier setting by @JCallicoat in #316
- Release by @fireblade2534 in #339
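
As a rough illustration of what punctuation-based sentence splitting (as in #321) can look like, here is a minimal sketch. It is not the project's actual implementation: the function name `split_chinese_sentences` and the chosen punctuation set are assumptions made for this example only.

```python
import re

# Hypothetical helper for illustration only (not code from Kokoro-FastAPI).
# A "sentence" here is a run of non-terminator characters followed by any
# trailing run of Chinese sentence-ending punctuation.
_SENTENCE = re.compile(r"[^。！？；]+[。！？；]*")


def split_chinese_sentences(text: str) -> list[str]:
    """Split text into sentences, keeping the closing punctuation attached."""
    return [m.strip() for m in _SENTENCE.findall(text) if m.strip()]


if __name__ == "__main__":
    for sentence in split_chinese_sentences("你好。今天天气很好！我们走吧？"):
        print(sentence)
```

Keeping the punctuation attached to each sentence lets a downstream synthesizer still see the cue for a pause or intonation change.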
New Contributors
- @FotieMConstant made their first contribution in #232
- @blakkd made their first contribution in #199
- @kimnzl made their first contribution in #241
- @rampadc made their first contribution in #233
- @mpnsk made their first contribution in #244
- @RigleGit made their first contribution in #291
- @jiaohuix made their first contribution in #321
- @mylukin made their first contribution in #322
- @mbailey made their first contribution in #311
- @JCallicoat made their first contribution in #316
Full Changelog: v0.2.3...v0.2.4
v0.2.3
What's Changed
- Disable --reload on uvicorn/fastapi to avoid pegging a CPU core by @randombk in #171
- Add a .gitattributes by @fireblade2534 in #186
- Normalization changes by @fireblade2534 in #179
- Streaming word timestamps by @fireblade2534 in #173
- Fix low quality because audio was being encoded at a lower bitrate by @fireblade2534 in #207
Full Changelog: v0.2.2...v0.2.3pre
v0.2.2
Fixes
- espeak not engaging reliably on the CPU image as a fallback
- audio quality bumped up by adjusting compression settings, bug with webui format selection
- advanced normalization settings added by @fireblade2534
What's Changed
- Add Helm chart by @zucher in #157 #162
- fixed a bunch of stuff by @fireblade2534 in #152
- added settings based override of default lang_code by @Krurst in #155
- docs update by @eltociear in #156
New Contributors
- @zucher made their first contribution in #157
- @Krurst made their first contribution in #155
- @eltociear made their first contribution in #156
Full Changelog: v0.2.1...v0.2.2
v0.2.1
What's Changed
- adjustment to improve compatibility with espeak-loader dependency on misaki #127
- added v1/models dummy endpoint for compatibility #144
- fixed issue with duplicate captions, swapping to a stream on audio + tempfile download at completion for caption files #139
- fixed some problems in the build system and model download system by @fireblade2534 in #131
Full Changelog: v0.2.0...v0.2.1
v0.2.0
- Complete Model Overhaul:
  - Upgraded to Kokoro v1.0 model architecture, deprecated V0.19 support
  - Integration with hexgrad/kokoro and hexgrad/misaki packages
  - Pre-installed all multi-language support from Misaki:
    - English (en), Japanese (ja), Korean (ko), Chinese (zh), Vietnamese (vi)
    - Note: This will likely be controlled via env variable in upcoming versions
  - All voice packs included for supported languages, along with the original versions
- Enhanced Audio Generation Features:
  - Per-word timestamped caption generation
  - Phoneme generation, Phoneme-Based Audio Generation (510 token cap)
- Web UI Improvements:
  - Weighted voice mixing
  - Text file upload support
  - Improved text editor, user interface changes
What's Changed
- Combine Voices endpoint now returns a .pt file, with generation combinations generated on the fly otherwise
- Bumping PyTorch version to 2.6.0, CUDA 12.4
- Adjustments to Docker workflows + Incorporating Docker Bake
Full Changelog: v0.1.4...v0.2.0
v0.1.4
- Changes to simplify streaming/async inference pathways still somewhat in progress.
- WebUI added as a lighter-weight alternative to the Gradio UI
- More of the configuration variables are exposed, temporary file management settings
- Added new debug endpoints for system and storage information (threads, sessions, etc)
- Significant restructuring towards concurrency, decoupling inference workflows, more flexibility
What's Changed
- Update README.md with new local endpoint usage example by @jteijema in #50
- Update UI access with environment URL and PORT by @jteijema in #51
- Fixed python tests by @fireblade2534 in #69
- Try to add AAC audio format w/ updated test by @richardr1126 in #74
- Fixed thread leak because of excessive E-speak backends by @fireblade2534 in #87
- Fix truncated playback issue in streaming WAV responses by @JoshRosen in #94
- Fixes auto downloading models by @fireblade2534 in #99
- V0.1.4 by @remsky in #102
- V0.1.4 - CI updates by @remsky in #104
New Contributors
- @jteijema made their first contribution in #50
- @richardr1126 made their first contribution in #74
- @JoshRosen made their first contribution in #94
Full Changelog: v0.1.0...v0.1.4
v0.1.0
What's Changed
- Potentially Breaking Changes
  - Swapped to uv dependency management from pip
  - Baked model files and voicepacks directly into gpu + cpu images
  - latest-slim tags could use some community testing, but will be optimizing and checking on deployability
  - Location of dockerfiles + docker compose has been moved into the docker directory. Be sure to check the paths when launching
- UI Changes:
  - Multi-select and merging of voices has been enabled.
  - An environment flag was set to disable local saving/filepath operations. By default it should still be saving locally
  - Made the waveform a dynamic blue color
- API Changes
  - Simplified audio normalization, more stable (likely won't notice a difference as the end user)
  - Streaming now respects broken connections, will stop processing on the next chunk
  - Minor/Moderate GPU memory handling cleanup and safeties added (clearing intermediate tensors, adding pressure warning, etc)
- CI/CD live on GitHub Actions
  - Pytest will run through all API tests on any pull requests now. You can modify them to align with new functionality, and add as needed but try not to lose any coverage, makes my life a bit easier
  - PyTorch mocks mostly removed, run on CPU version for automated testing.
This has been a great model to work with. Looking forward to when the new 0.24 version is released by https://huggingface.co/hexgrad/Kokoro-82M.
Be sure to check their page out for updates on model development, and keep in mind they're always looking for more data.
Full Changelog: v0.0.5...v0.1.0
v0.0.5post1
What's Changed
- fix: Add missing healthcheck dependency (curl) by @Galunid in #32
- Minor docker tagging and configuration changes
- Gradio & gpu memory management bug fix
- Bonus Voice Pack attached
  - af_irulan: drag and drop into your api/voices folder
Full Changelog: v0.0.5...v0.0.5post1
v0.1.0-pre
- Initial swap of dependency management to uv to simplify testing and deployments
- Dropping model-fetcher container & baking models directly into docker images
- Standardizing tagging to allow for consistent usage of latest tag across architectures
- Minor structural changes towards accommodating incoming custom Voice Mixer module
Full Changelog: v0.0.5...v0.1.0-pre
v0.0.5
- Stabilized issues with image tagging and structures from v0.0.4
- Added automatic master to develop branch synchronization
- Improved release tagging and structures
- Initial CI/CD setup
Full Changelog: v0.0.4...v0.0.5