Releases · remsky/Kokoro-FastAPI

18 Jun 22:27

github-actions

v0.2.4

6b1e9d9

Release v0.2.4 (Latest)

What's Changed

docs: added a note for Apple Silicon users regarding GPU build by @FotieMConstant in #232
converted CRLF ending lines to LF ones in api/src/structures/custom_responses.py by @blakkd in #199
Fixes and Features by @fireblade2534 in #235
Add Direct Windows support scripts by @kimnzl in #241
Added support for MPS on Apple silicon by @rampadc in #233
use local js file instead of the unpkg cdn by @mpnsk in #244
Fixes not returning a download link if streaming is off and return_download_link is true by @fireblade2534 in #240
Segfault fixes by @fireblade2534 in #253
Fixes relating to parsing money and tests. Also readme stuff by @fireblade2534 in #256
Fix Helm charts health check, ingress, and values by @richardr1126 in #257
start-gpu_mac.sh: removed duplicated env and align with other shell scripts by @rampadc in #266
Maintenance/automations by @remsky in #276
Update Dockerfile to install Rust by @RigleGit in #291
Fixing normalization by @fireblade2534 in #303
Fixed phonemes by @fireblade2534 in #304
Added some better safety checks to captioned speech by @fireblade2534 in #310
feat(text): add Chinese punctuation-based sentence splitting for bett… by @jiaohuix in #321
Improve Audio Pause Handling, MP3 Encoding, and Robust Text Normalization/Splitting by @mylukin in #322
Update paths.py by @mbailey in #311
Add a volume multiplier setting by @JCallicoat in #316
Release by @fireblade2534 in #339
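
As a rough illustration of the punctuation-based sentence splitting mentioned in #321, the sketch below breaks text at full-width Chinese sentence terminators. The function name `split_zh_sentences` and the chosen punctuation set are assumptions made for this example; they are not taken from the project's code.

```python
import re
from typing import List

# Assumed set of full-width sentence terminators (illustrative only).
_ZH_TERMINATORS = "。！？；"

# A run of non-terminator characters followed by any trailing terminators.
_SENTENCE_RE = re.compile(f"[^{_ZH_TERMINATORS}]+[{_ZH_TERMINATORS}]*")


def split_zh_sentences(text: str) -> List[str]:
    """Split text into sentences, keeping each terminator with its sentence."""
    pieces = (match.strip() for match in _SENTENCE_RE.findall(text))
    return [piece for piece in pieces if piece]


if __name__ == "__main__":
    for sentence in split_zh_sentences("你好。今天天气很好！最后一句"):
        print(sentence)
```

Keeping the terminator attached to each sentence lets a downstream synthesizer still see the punctuation when it decides on pauses.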

New Contributors

@FotieMConstant made their first contribution in #232
@blakkd made their first contribution in #199
@kimnzl made their first contribution in #241
@rampadc made their first contribution in #233
@mpnsk made their first contribution in #244
@RigleGit made their first contribution in #291
@jiaohuix made their first contribution in #321
@mylukin made their first contribution in #322
@mbailey made their first contribution in #311
@JCallicoat made their first contribution in #316

Full Changelog: v0.2.3...v0.2.4

Contributors

mbailey, JCallicoat, and 11 other contributors


07 Mar 05:26

github-actions

v0.2.3

a578d22

v0.2.3

What's Changed

Disable --reload on uvicorn/fastapi to avoid pegging a CPU core by @randombk in #171
Add a .gitattributes by @fireblade2534 in #186
Normalization changes by @fireblade2534 in #179
Streaming word timestamps by @fireblade2534 in #173
Fix low quality because audio was being encoded at a lower bitrate by @fireblade2534 in #207

New Contributors

@randombk made their first contribution in #171

Full Changelog: v0.2.2...v0.2.3pre

Contributors

randombk and fireblade2534


13 Feb 09:49

github-actions

v0.2.2

cfae7db

v0.2.2

Fixes

espeak not engaging reliably on the CPU image as a fallback
audio quality bumped up by adjusting compression settings, bug with webui format selection
advanced normalization settings added @fireblade2534

What's Changed

Add Helm chart by @zucher in #157 #162
fixed a bunch of stuff by @fireblade2534 in #152
added settings based override of default lang_code by @Krurst in #155
docs update by @eltociear in #156

New Contributors

@zucher made their first contribution in #157
@Krurst made their first contribution in #155
@eltociear made their first contribution in #156

Full Changelog: v0.2.1...v0.2.2

Contributors

zucher, eltociear, and 2 other contributors


10 Feb 05:59

github-actions

v0.2.1

cc4d5ac

v0.2.1

What's Changed

adjustment to improve compatibility with espeak-loader dependency on misaki #127
added v1/models dummy endpoint for compatibility #144
fixed issue with duplicate captions, swapping to a stream on audio + tempfile download at completion for caption files #139
fixed some problems in the build system and model download system by @fireblade2534 in #131

Full Changelog: v0.2.0...v0.2.1

Contributors

fireblade2534


07 Feb 11:23

remsky

v0.2.0

bfdb5c0

v0.2.0

Complete Model Overhaul:
- Upgraded to Kokoro v1.0 model architecture, deprecated V0.19 support
- Integration with hexgrad/kokoro and hexgrad/misaki packages
- Pre-installed all multi-language support from Misaki:
  - English (en), Japanese (ja), Korean (ko), Chinese (zh), Vietnamese (vi)
  - Note: This will likely be controlled via an env variable in upcoming versions
- All voice packs included for supported languages, along with the original versions
Enhanced Audio Generation Features:
- Per-word timestamped caption generation
- Phoneme generation, Phoneme-Based Audio Generation (510 token cap)
Web UI Improvements:
- Weighted voice mixing
- Text file upload support
- Improved text editor, user interface changes
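
To make the weighted voice mixing idea concrete, here is a minimal sketch that blends voice embeddings as a normalized weighted average. It uses plain Python lists for portability, and the function name `mix_voices`, the voice names, and the normalization step are all assumptions for illustration rather than the project's actual implementation.

```python
from typing import Dict, List


def mix_voices(voices: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Return the weighted average of the selected voice vectors.

    Weights are normalized to sum to 1 (an assumption of this sketch).
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    dim = len(voices[next(iter(weights))])
    mixed = [0.0] * dim
    for name, weight in weights.items():
        vector = voices[name]
        if len(vector) != dim:
            raise ValueError(f"voice {name!r} has a mismatched dimension")
        for i, value in enumerate(vector):
            mixed[i] += (weight / total) * value
    return mixed


if __name__ == "__main__":
    # Hypothetical two-dimensional "voices" for demonstration only.
    demo = {"voice_a": [1.0, 0.0], "voice_b": [0.0, 1.0]}
    print(mix_voices(demo, {"voice_a": 2.0, "voice_b": 1.0}))
```

Normalizing the weights keeps the blended vector on the same scale as the inputs, so a 2:1 mix behaves the same as a 0.67:0.33 mix.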

What's Changed

The Combine Voices endpoint now returns a .pt file; otherwise, voice combinations are generated on the fly
Bumping PyTorch version to 2.6.0, CUDA 12.4
Adjustments to Docker workflows + Incorporating Docker Bake

Full Changelog: v0.1.4...v0.2.0

Contributors

JoshRosen, eschmidbauer, and 5 other contributors


31 Jan 09:06

github-actions

v0.1.4

8156b29

v0.1.4

Changes to simplify the streaming/async inference pathways are still somewhat in progress.
WebUI added as a lighter-weight alternative to the Gradio UI
More configuration variables are exposed, along with temporary file management settings
Added new debug endpoints for system and storage information (threads, sessions, etc.)
Significant restructuring towards concurrency, decoupled inference workflows, and more flexibility

What's Changed

Update README.md with new local endpoint usage example by @jteijema in #50
Update UI access with environment URL and PORT by @jteijema in #51
Fixed python tests by @fireblade2534 in #69
Try to add AAC audio format w/ updated test by @richardr1126 in #74
Fixed thread leak because of excessive E-speak backends by @fireblade2534 in #87
Fix truncated playback issue in streaming WAV responses by @JoshRosen in #94
Fixes auto downloading models by @fireblade2534 in #99
V0.1.4 by @remsky in #102
V0.1.4 - CI updates by @remsky in #104

New Contributors

@jteijema made their first contribution in #50
@richardr1126 made their first contribution in #74
@JoshRosen made their first contribution in #94

Full Changelog: v0.1.0...v0.1.4

Contributors

JoshRosen, remsky, and 3 other contributors


14 Jan 15:25

github-actions

v0.1.0

880fa7a

v0.1.0

What's Changed

Potentially Breaking Changes
- Swapped to uv dependency management from pip
- Baked model files and voicepacks directly into gpu + cpu images
- latest-slim tags could use some community testing, but will be optimizing and checking on deployability
- Location of dockerfiles + docker compose has been moved into the docker directory. Be sure to check the paths when launching

UI Changes:
- Multi-select and merging of voices has been enabled.
- An environment flag was set to disable local saving/filepath operations. By default it should still be saving locally
- Made the waveform a dynamic blue color
API Changes
- Simplified audio normalization, more stable (likely won't notice a difference as the end user)
- Streaming now respects broken connections, will stop processing on the next chunk
- Minor/Moderate GPU memory handling cleanup and safeties added (clearing intermediate tensors, adding pressure warning, etc)
CI/CD live on GitHub Actions
- Pytest now runs all API tests on every pull request. You can modify them to align with new functionality and add more as needed, but try not to lose any coverage; it makes my life a bit easier
- PyTorch mocks mostly removed; automated tests now run on the CPU version.
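
The API change above about streaming respecting broken connections can be pictured with a small generator that checks for a disconnect before emitting each chunk. This is only a sketch under assumptions: `stream_until_disconnect` and the `is_disconnected` callback are hypothetical names, and the real service's mechanism may differ.

```python
from typing import Callable, Iterable, Iterator


def stream_until_disconnect(
    chunks: Iterable[bytes],
    is_disconnected: Callable[[], bool],
) -> Iterator[bytes]:
    """Yield audio chunks until the client is reported as disconnected.

    The check happens before each chunk, so no further chunks are
    pulled from the source once the connection is known to be gone.
    """
    for chunk in chunks:
        if is_disconnected():
            break
        yield chunk


if __name__ == "__main__":
    # Simulate a client that disappears after two chunks.
    state = {"checks": 0}

    def client_gone() -> bool:
        state["checks"] += 1
        return state["checks"] > 2

    print(list(stream_until_disconnect([b"a", b"b", b"c", b"d"], client_gone)))
```

If `chunks` is itself a lazy generator, breaking out of the loop also means later chunks are never computed.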

This has been a great model to work with. Looking forward to when the new 0.24 version is released by https://huggingface.co/hexgrad/Kokoro-82M.

Be sure to check their page out for updates on model development, and keep in mind they're always looking for more data.

New Contributors

@Galunid made their first contribution in #32

Full Changelog: v0.0.5...v0.1.0

Contributors

Galunid


13 Jan 06:46

github-actions

v0.0.5post1

1e45a31

v0.0.5post1

What's Changed

fix: Add missing healthcheck dependency (curl) by @Galunid in #32
Minor docker tagging and configuration changes
Gradio & gpu memory management bug fix
Bonus voice pack attached (af_irulan): drag and drop it into your api/voices folder

New Contributors

@Galunid made their first contribution in #32

Full Changelog: v0.0.5...v0.0.5post1

Contributors

Galunid


12 Jan 13:47

github-actions

v0.1.0-pre

d2522bc

v0.1.0-pre (Pre-release)

Initial swap of dependency management to uv to simplify testing and deployments
Dropping model-fetcher container & baking models directly into docker images
Standardizing tagging to allow for consistent usage of latest tag across architectures
Minor structural changes towards accommodating incoming custom Voice Mixer module

Full Changelog: v0.0.5...v0.1.0-pre


11 Jan 05:20

github-actions

v0.0.5

22c52fd

v0.0.5

Stabilized issues with image tagging and structures from v0.0.4
Added automatic master to develop branch synchronization
Improved release tagging and structures
Initial CI/CD setup

Full Changelog: v0.0.4...v0.0.5

