
Conversation

KerwinKai
Contributor

Motivation

Support TP for Qwen2.5-VL. GPU memory usage is 78.62 GB with tp=1 and 43.56 GB with tp=4.

Modifications

  • add qwen2_5_vl.py for the target model

  • add QKVParallelLinear to linear.py, because the Qwen2_5_VLVisionAttention class needs it
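A minimal sketch of the idea behind a QKVParallelLinear-style layer, using NumPy to simulate the ranks in one process: the fused QKV projection weight is split column-wise so each TP rank projects only its own subset of attention heads, with Q, K, and V sharded separately so every rank gets whole heads. All shapes and names here are illustrative, not the PR's actual implementation.

```python
import numpy as np

# Illustrative sizes (not from Qwen2.5-VL's real config).
hidden, num_heads, head_dim, tp = 64, 8, 8, 4
rng = np.random.default_rng(0)
w_qkv = rng.standard_normal((hidden, 3 * num_heads * head_dim))  # fused [Q|K|V]
x = rng.standard_normal((2, hidden))                             # tiny batch

def shard(w, rank):
    """Take rank's head slice from each of Q, K, V (column-parallel split)."""
    per_rank = num_heads // tp * head_dim
    q, k, v = np.split(w, 3, axis=1)
    sl = slice(rank * per_rank, (rank + 1) * per_rank)
    return np.concatenate([q[:, sl], k[:, sl], v[:, sl]], axis=1)

# Recombining all ranks' partial projections head-wise matches the full matmul.
full = x @ w_qkv
q_parts, k_parts, v_parts = [], [], []
for r in range(tp):
    q, k, v = np.split(x @ shard(w_qkv, r), 3, axis=1)
    q_parts.append(q); k_parts.append(k); v_parts.append(v)
recombined = np.concatenate(
    [np.concatenate(p, axis=1) for p in (q_parts, k_parts, v_parts)], axis=1)
assert np.allclose(full, recombined)
```

The separate sharding of Q, K, and V (rather than slicing the fused weight naively) is what keeps each rank's output usable for attention without cross-rank communication inside the attention block.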

Related Issues

#166

Pending TODO

  • accuracy test

  • support tp8, because num_attention_heads in config.json is not divisible by 8
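The tp8 limitation above comes from the head-sharding constraint: each rank must hold a whole number of attention heads. A small hedged check (the head count 28 is illustrative, not read from the actual config.json):

```python
def valid_tp_sizes(num_attention_heads):
    """Return the TP degrees that evenly divide the attention heads."""
    return [tp for tp in (1, 2, 4, 8) if num_attention_heads % tp == 0]

# e.g. with 28 heads (hypothetical), tp8 fails because 28 % 8 != 0:
print(valid_tp_sizes(28))  # -> [1, 2, 4]
```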


@oswen

oswen commented Sep 11, 2025

[Screenshot attached.] The lm_head is column-parallel, but you did not perform a gather operation here.
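To illustrate the reviewer's point: with a column-parallel lm_head, each rank computes only a vocab shard of the logits, so the shards must be gathered before the full logits can be used (e.g. for sampling). A minimal single-process simulation in NumPy; names and shapes are illustrative, not taken from the PR:

```python
import numpy as np

hidden, vocab, tp = 16, 32, 4
rng = np.random.default_rng(1)
w_lm_head = rng.standard_normal((hidden, vocab))
h = rng.standard_normal((1, hidden))          # final hidden state

shards = np.split(w_lm_head, tp, axis=1)      # column-wise vocab shards
partial = [h @ w for w in shards]             # each rank's partial logits
logits = np.concatenate(partial, axis=1)      # the missing gather step
assert np.allclose(logits, h @ w_lm_head)
```

In a real TP setup the concatenation would be a collective (e.g. an all-gather across the TP group) rather than a local `np.concatenate`.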

