[Feature] VLM model support tp #206

KerwinKai · 2025-09-01T12:34:15Z

Motivation

Support tensor parallelism (TP) for Qwen2.5-VL. GPU memory usage is 78.62 GB with tp1 and 43.56 GB with tp4.

Modifications

Add qwen2_5_vl.py for the target model.
Add QKVParallelLinear to linear.py, because the Qwen2_5_VLVisionAttention class needs it.
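
For reviewers unfamiliar with fused QKV sharding, here is a minimal sketch of how a QKV-parallel linear layer might size each rank's slice of the fused output. The helper `qkv_shard_sizes` and the head counts below are hypothetical and chosen for illustration only; they are not taken from this PR's linear.py or from the model's config.json.

```python
def qkv_shard_sizes(num_heads, num_kv_heads, head_dim, tp_size):
    """Return the per-rank output widths (q, k, v) of a fused QKV projection.

    Hypothetical helper: heads are split evenly across ranks, so both head
    counts must be divisible by tp_size.
    """
    if num_heads % tp_size or num_kv_heads % tp_size:
        raise ValueError("head counts must be divisible by tp_size")
    q_width = (num_heads // tp_size) * head_dim
    kv_width = (num_kv_heads // tp_size) * head_dim
    return q_width, kv_width, kv_width


# Illustrative values only (assumed, not read from the real config).
sizes = qkv_shard_sizes(num_heads=16, num_kv_heads=16, head_dim=80, tp_size=4)
print(sizes)
```

Splitting by whole heads keeps each rank's attention computation local, so no communication is needed until the output projection.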

Related Issues

#166

Pending TODO

Accuracy test.
Support tp8. This does not work yet because num_attention_heads in config.json is not divisible by 8.
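
The tp8 limitation comes down to a divisibility check on the head count. A minimal sketch, assuming heads are split evenly across ranks; the helper name `supported_tp_sizes` is hypothetical and the value 28 is an illustrative assumption, not read from this PR's config.json:

```python
def supported_tp_sizes(num_attention_heads, candidates=(1, 2, 4, 8)):
    """Return the candidate TP sizes that divide the head count evenly."""
    return [tp for tp in candidates if num_attention_heads % tp == 0]


# 28 is an assumed head count for illustration only.
print(supported_tp_sizes(28))
```

Supporting a TP size that does not divide the head count would need padding or replicating heads, which this sketch does not attempt.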

Checklist

Format your code according to the Code Formatting with Pre-Commit.
Add unit tests as outlined in the Running Unit Tests.
Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.
Please feel free to join our Slack channel at https://sgl-fru7574.slack.com/archives/C09784E3EN6 to discuss your PR.

oswen · 2025-09-11T13:03:56Z

The lm_head is column parallel, but you did not perform a gather operation here.
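
To illustrate the point, here is a single-process sketch of why a column-parallel lm_head needs a gather. It uses plain Python lists instead of tensors, and all names (`matvec`, `shard_rows`, `lm_head_with_gather`) are hypothetical, not functions from this PR. The weight is stored as [vocab, hidden], so sharding the output dimension means each rank holds a slice of the rows and produces logits for only part of the vocabulary.

```python
def matvec(weight, x):
    """Multiply a [rows, hidden] weight by a hidden-state vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def shard_rows(weight, tp_size):
    """Split the vocab (output) dimension evenly across ranks."""
    if len(weight) % tp_size:
        raise ValueError("vocab size must be divisible by tp_size")
    step = len(weight) // tp_size
    return [weight[r * step:(r + 1) * step] for r in range(tp_size)]


def lm_head_with_gather(weight, hidden, tp_size):
    # Each simulated rank computes logits for its own vocab slice only.
    partial = [matvec(shard, hidden) for shard in shard_rows(weight, tp_size)]
    # Stand-in for an all-gather: concatenate the slices in rank order.
    return [logit for part in partial for logit in part]


weight = [[1, 0], [0, 1], [1, 1], [2, -1]]  # toy [vocab=4, hidden=2] weight
hidden = [3, 5]

full = matvec(weight, hidden)
gathered = lm_head_with_gather(weight, hidden, tp_size=2)
print(full, gathered)
```

Without the concatenation step, each rank would see logits for only its slice of the vocabulary, so an argmax or softmax over them would be computed on an incomplete distribution.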

init commit

768f21e
