
feat: Add vLLM V1 support w/Unsloth model service #185


Draft · bradhilton wants to merge 1 commit into main

Conversation

@bradhilton (Collaborator) commented Jul 1, 2025

Migrate the Unsloth model service to also support vLLM V1, which offers performance improvements and is the future of vLLM development.
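
For context, a minimal sketch of how a service might opt into the V1 engine, assuming selection via the `VLLM_USE_V1` environment variable before vLLM is imported; the model name is only an example, not necessarily what this service uses:

```python
import os

# Assumption: the engine version is chosen with VLLM_USE_V1 ("1" selects V1,
# "0" falls back to V0). It must be set before vllm is imported.
os.environ.setdefault("VLLM_USE_V1", "1")

from vllm import LLM, SamplingParams  # noqa: E402

# Example model; the actual service would pass its own checkpoint.
llm = LLM(model="unsloth/Llama-3.2-1B-Instruct")
outputs = llm.generate(["Hello"], SamplingParams(max_tokens=8))
print(outputs[0].outputs[0].text)
```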

@bradhilton (Collaborator, Author)

There are a few current limitations in Unsloth Zoo that prevent V1 support. In general, Unsloth Zoo does not yet support V1's collective RPC pattern. The collective RPC call to fetch the weight IPC handles fails with `CUDA error: invalid argument`. Also, the collective RPC calls do not check whether their results are coroutines, so they fail when invoked from AsyncLLM instances.
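
For illustration, a minimal sketch of the kind of coroutine check that seems to be missing; `resolve_rpc_results` and the simulated worker responses are hypothetical helpers, not Unsloth Zoo or vLLM APIs:

```python
import asyncio
import inspect


async def resolve_rpc_results(results):
    """Await any coroutine results returned by a collective RPC call.

    Hypothetical helper: async engine workers may return awaitables rather
    than plain values, so callers need to await them before use.
    """
    resolved = []
    for result in results:
        if inspect.isawaitable(result):
            result = await result
        resolved.append(result)
    return resolved


async def main():
    # Simulated worker responses: one plain value and one coroutine.
    async def async_worker():
        return "ipc-handle-from-async-worker"

    raw_results = ["ipc-handle-from-sync-worker", async_worker()]
    print(await resolve_rpc_results(raw_results))


if __name__ == "__main__":
    asyncio.run(main())
```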

@corbt (Contributor) commented Jul 2, 2025

I'm not seeing any chatter on the Unsloth side about working towards this. How hard would it be to do it ourselves?

@bradhilton (Collaborator, Author)

Hard to say; it could take a while.

@bradhilton (Collaborator, Author)

Probably will end up closing this if decoupling vLLM and Unsloth works out.
