[AutoDeploy] Weight Fusion Revisited #4674


Open · lucaslie opened this issue on May 27, 2025 · 1 comment

Labels: AutoDeploy, bug (Something isn't working)

Comments

@lucaslie (Member)

The fuse_gemm transformation currently consumes a lot of memory and causes out-of-memory (OOM) errors. fuse_gemm is disabled for now, and we should fix it eventually.
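For context, here is a minimal sketch of what GEMM weight fusion typically does, assuming fuse_gemm follows the common pattern of merging several linear layers that read the same input (for example Q/K/V projections) into one GEMM by concatenating their weights. The helper names (`matmul_t`, `fuse_weights`, `fused_linear`) are hypothetical, and plain Python lists stand in for GPU tensors. The copy inside `fuse_weights` is the extra allocation that drives memory use.

```python
from typing import List

# Hypothetical illustration: plain Python lists stand in for GPU weight tensors.
Matrix = List[List[float]]


def matmul_t(x: Matrix, w: Matrix) -> Matrix:
    """Compute x @ w^T, the way a linear layer applies an [out, in] weight."""
    return [[sum(a * b for a, b in zip(row, w_row)) for w_row in w] for row in x]


def fuse_weights(weights: List[Matrix]) -> Matrix:
    """Concatenate [out_i, in] weights along the output dimension.

    This builds a brand-new buffer; while it is being filled, the original
    weights and the fused copy are both alive.
    """
    fused: Matrix = []
    for w in weights:
        fused.extend(list(row) for row in w)  # copy each output row
    return fused


def fused_linear(x: Matrix, weights: List[Matrix]) -> List[Matrix]:
    """Run one GEMM with the fused weight, then split the output per layer."""
    y = matmul_t(x, fuse_weights(weights))
    outputs, start = [], 0
    for w in weights:
        n = len(w)  # number of output features of this layer
        outputs.append([row[start:start + n] for row in y])
        start += n
    return outputs


x = [[1.0, 2.0]]
w_q = [[1.0, 0.0]]
w_k = [[0.0, 1.0], [1.0, 1.0]]
# The single fused GEMM reproduces the separate per-layer GEMMs.
assert fused_linear(x, [w_q, w_k]) == [matmul_t(x, w_q), matmul_t(x, w_k)]
```

Fusion trades one extra weight allocation per group for fewer, larger GEMM launches at inference time; the cost of that allocation is what this issue is about.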

lucaslie added the bug (Something isn't working) and AutoDeploy labels on May 27, 2025
@lucaslie (Member, Author)

This affects both fuse_gemm and fuse_moe. It may be hard to fix if everything is done on the GPU, because re-allocating the fused weights fragments GPU memory.

Related discussion: https://nvidia.slack.com/archives/C08T55LHSG4/p1748491530809259
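A simplified model of why fragmentation matters, assuming weight groups are fused one at a time on the device and each original weight is freed only after it has been copied into its fused buffer. `peak_reserved` and its parameters are hypothetical, and sizes are in arbitrary units (e.g. MiB). If freed blocks can always be reused, the peak overhead is the largest single fused buffer; if they can never be reused (worst-case fragmentation), every fused buffer adds to the peak.

```python
from typing import List


def peak_reserved(unfused: int, groups: List[List[int]], reuse_freed: bool) -> int:
    """Hypothetical model of peak device memory while fusing weights group by group.

    unfused: size of the weights that fusion does not touch.
    groups: for each fused weight, the sizes of the originals it concatenates.
    reuse_freed: True assumes freed blocks can always be handed out again
        (no fragmentation); False assumes they never can (worst case).
    """
    reserved = unfused + sum(sum(g) for g in groups)  # all originals resident
    cached = 0  # freed memory still held by the allocator
    peak = reserved
    for g in groups:
        fused = sum(g)
        # The fused buffer is allocated while its originals are still alive.
        reused = min(cached, fused) if reuse_freed else 0
        cached -= reused
        reserved += fused - reused
        peak = max(peak, reserved)
        # Originals are freed after the copy; the allocator keeps that memory cached.
        cached += fused
    return peak


print(peak_reserved(100, [[64, 16, 16], [128, 128]], reuse_freed=True))   # 708
print(peak_reserved(100, [[64, 16, 16], [128, 128]], reuse_freed=False))  # 804
```

In this example, 100 units of unfused weights plus a QKV-like group `[64, 16, 16]` and a gate/up-like group `[128, 128]` give a baseline of 452. Perfect reuse adds only the largest fused buffer (256), while no reuse adds both fused buffers (96 + 256). Real allocators fall somewhere in between.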

lucaslie self-assigned this on May 29, 2025
lucaslie changed the title from "[AutoDeploy] fix fuse_gemm OOM issue" to "[AutoDeploy] Weight Fusion Revisited" on May 29, 2025