[AutoDeploy] Weight Fusion Revisited #4674

lucaslie · 2025-05-27T00:12:34Z

The fuse_gemm transformation currently consumes a lot of memory, causing OOM errors. fuse_gemm is disabled for now, and we should fix it eventually.

lucaslie · 2025-05-29T05:26:21Z

This affects both fuse_gemm and fuse_moe. It may be hard to fix if everything is done on the GPU, because re-allocating the fused weights fragments memory...

Related discussion: https://nvidia.slack.com/archives/C08T55LHSG4/p1748491530809259
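
To make the memory concern above concrete, here is a minimal bookkeeping sketch in plain Python. It is not AutoDeploy code: `fused_size`, `peak_during_fusion`, and the weight sizes are hypothetical, and the model assumes a fused weight is simply the concatenation of its parts. It only counts live bytes, so it ignores fragmentation, which can push the real peak higher because freed blocks of the original sizes may not be reusable for the larger fused allocation.

```python
# Illustrative only: sizes are made-up units, not measurements from TensorRT-LLM.
# None of these helpers exist in AutoDeploy.

def fused_size(weight_sizes):
    """A fused weight is modeled as the concatenation of its parts, so sizes add up."""
    return sum(weight_sizes)


def peak_during_fusion(groups, free_originals_per_group):
    """Return the peak amount of live memory while fusing `groups`.

    groups: list of lists; each inner list holds the sizes of the weights
        that are fused into one tensor.
    free_originals_per_group: if True, a group's originals are released right
        after its fused tensor is built; if False, all originals stay alive
        until every group has been fused.
    """
    live = sum(sum(g) for g in groups)  # all original weights are already loaded
    peak = live
    for g in groups:
        live += fused_size(g)           # allocate the fused tensor
        peak = max(peak, live)
        if free_originals_per_group:
            live -= sum(g)              # release the now-redundant originals
    return peak


# Three hypothetical layers, each fusing weights of size 4, 1, and 1.
groups = [[4, 1, 1]] * 3
print(peak_during_fusion(groups, free_originals_per_group=False))  # 36
print(peak_during_fusion(groups, free_originals_per_group=True))   # 24
```

Under these assumptions, keeping every original alive until the end doubles the footprint (36 vs. 18 units of weights), while releasing each group right away caps the overhead at one fused tensor (24 units). Whether that lower bound is reachable on the GPU depends on the allocator being able to reuse the freed blocks, which is the fragmentation problem raised in this comment.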

lucaslie added the bug and AutoDeploy labels May 27, 2025

lucaslie assigned suyoggupta May 27, 2025

lucaslie self-assigned this May 29, 2025

lucaslie changed the title ~~[AutoDeploy] fix fuse_gemm OOM issue~~ [AutoDeploy] Weight Fusion Revisited May 29, 2025

Copilot AI mentioned this issue May 30, 2025

[AutoDeploy] transformation logging and disabled fuse_moe nv-auto-deploy/TensorRT-LLM#49

Merged
