FEAT: Kimi-VL-A3B #3372
Conversation
Just found this PR after failing to run Kimi-VL with Xinference. Is there any plan to get it merged?
I just tried it in the latest Xinference Docker image with the command below, and it seems to work (on a single H20 96G):
python3 -m vllm.entrypoints.openai.api_server --port 8888 --served-model-name kimi-vl --trust-remote-code --model /workspace/modelscope/hub/models/moonshotai/Kimi-VL-A3B-Thinking/ --tensor-parallel-size 1 --max-num-batched-tokens 131072 --max-model-len 131072 --max-num-seqs 512 --limit-mm-per-prompt image=256
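For reference, a minimal sketch of querying that server through its OpenAI-compatible API. The port (8888) and served model name (kimi-vl) come from the command above; the image URL and prompt are placeholders:

```python
# Minimal sketch: call the vLLM OpenAI-compatible server started above.
# Port and model name match the command; the image URL is a placeholder.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8888/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="kimi-vl",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/demo.png"}},
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ],
    max_tokens=512,
)
print(response.choices[0].message.content)
```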
80% WIP; waiting for a refactor of the multimodal LLM engine.
Hi, @Minamiyama. Please rebase the code onto the latest main branch. The key point is: you don't need to focus on adapting to Xinference's output format; just concentrate on implementing stream generation correctly.
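For illustration, a rough, framework-agnostic sketch of what a correct streaming loop looks like, assuming a Hugging Face-style model with `TextIteratorStreamer`. The Xinference-specific wrapper types are omitted, and the chunk format shown is an assumption:

```python
# Illustrative only: generic streaming generation with TextIteratorStreamer.
# Xinference's actual chat/stream interfaces are not shown; model, processor,
# and the chunk dict below are assumptions for this sketch.
from threading import Thread
from transformers import TextIteratorStreamer

def stream_chat(model, processor, inputs):
    streamer = TextIteratorStreamer(
        processor.tokenizer, skip_prompt=True, skip_special_tokens=True
    )
    generation_kwargs = dict(**inputs, streamer=streamer, max_new_tokens=1024)
    # Run generate() in a background thread so tokens can be yielded as they arrive.
    thread = Thread(target=model.generate, kwargs=generation_kwargs)
    thread.start()
    for new_text in streamer:
        # Yield OpenAI-style delta chunks so the caller can forward them as SSE.
        yield {"choices": [{"index": 0, "delta": {"content": new_text}, "finish_reason": None}]}
    yield {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}
```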
Going further, if you're interested, you could explore integrating this model into a continuous-batching-capable framework in the future. This would require implementing additional interfaces; refer to
got it, thx |
# Conflicts:
#   xinference/model/llm/llm_family.json
#   xinference/model/llm/llm_family_modelscope.json
Add support for Kimi-VL-A3B-Instruct and Kimi-VL-A3B-Thinking-2506 vision-language models with multimodal reasoning capabilities
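Once merged, launching one of these models through Xinference might look roughly like the sketch below. The model name and engine values are guesses; the real registered names are defined in the llm_family JSON files touched by this PR, and the chat signature can differ across Xinference versions:

```python
# Hypothetical usage after this PR is merged; "kimi-vl-a3b" is an assumed
# registered name, the real one comes from llm_family.json in this branch.
from xinference.client import Client

client = Client("http://localhost:9997")
model_uid = client.launch_model(
    model_name="kimi-vl-a3b",       # assumed registered name
    model_type="LLM",
    model_engine="transformers",    # or vllm, depending on what the PR wires up
)
model = client.get_model(model_uid)
# The exact chat signature may vary by Xinference version; this follows the
# OpenAI-style messages format used by recent releases.
completion = model.chat(
    messages=[{"role": "user", "content": "Describe the attached image."}],
)
print(completion)
```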