Feature support: eagle multimodal inputs #4787

liyi-xia · 2025-05-30T03:15:46Z

Hi,

Currently, lookahead decoding already supports multimodal inputs, while EAGLE does not. The input to EAGLE's embedding layer is not 2-D, so when the batch size is greater than 1, the expansion in the embedding layer does not work correctly:

TensorRT-LLM/tensorrt_llm/layers/embedding.py

Line 182 in c6f7d42

tasks = expand(tasks, shape(prompt_tokens))
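To make the shape problem concrete, here is a minimal NumPy sketch. It is not TensorRT-LLM code: `np.broadcast_to` is only a stand-in for `expand`, and the assumption that task ids arrive with shape `[batch_size]` while the tokens are packed into one 1-D tensor is made for illustration. `per_token_task_ids` is a hypothetical workaround, not the fix that was actually applied.

```python
import numpy as np

def expand_task_ids(tasks, token_shape):
    """Toy stand-in for expand: broadcast task ids to the token tensor's shape."""
    return np.broadcast_to(tasks, token_shape)

def per_token_task_ids(tasks, seq_lens):
    """Hypothetical workaround: repeat each request's task id once per token."""
    return np.repeat(tasks, seq_lens)

# Packed (1-D) token ids for a batch of two requests with 3 and 2 tokens.
seq_lens = np.array([3, 2])
tokens = np.arange(seq_lens.sum())  # shape (5,)

# Batch size 1: a (1,) task tensor broadcasts to (5,) without trouble.
single = expand_task_ids(np.array([0]), tokens.shape)

# Batch size 2: a (2,) task tensor cannot be broadcast to (5,).
try:
    expand_task_ids(np.array([0, 1]), tokens.shape)
    batch_expand_ok = True
except ValueError:
    batch_expand_ok = False

# Building the per-token ids explicitly avoids relying on broadcasting.
per_token = per_token_task_ids(np.array([0, 1]), seq_lens)
```

With 2-D `[batch_size, seq_len]` tokens, a `[batch_size, 1]` task tensor would broadcast cleanly, which may be why the problem only shows up once the input is no longer 2-D.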

May I request a new feature: multimodal input support for EAGLE?

liyi-xia · 2025-06-02T04:11:28Z

Hello, I have already hacked this feature together with the help of TRT-LLM team member Ruoqian. We modified the implementation of the expand function and changed the base model's embedding layer to a prompt embedding when building the engine.
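For readers unfamiliar with prompt embeddings, the sketch below shows the general idea in NumPy. It assumes the common prompt-tuning convention in which token ids at or above the vocabulary size index a separate table of precomputed features (for example, image embeddings). The function name and table shapes are hypothetical and do not reproduce the actual TRT-LLM layer.

```python
import numpy as np

def lookup_with_prompt_table(token_ids, vocab_table, prompt_table):
    """Embed ids < vocab_size from vocab_table and the rest from prompt_table."""
    vocab_size = vocab_table.shape[0]
    is_prompt = token_ids >= vocab_size
    # Clamp each id into the valid range of the table it does NOT belong to,
    # so both gathers are safe; np.where then picks the right row per token.
    vocab_ids = np.where(is_prompt, 0, token_ids)
    prompt_ids = np.where(is_prompt, token_ids - vocab_size, 0)
    return np.where(is_prompt[..., None],
                    prompt_table[prompt_ids],
                    vocab_table[vocab_ids])

# Toy tables: 4 vocabulary rows and 3 prompt rows, hidden size 2.
vocab_table = np.arange(8, dtype=np.float32).reshape(4, 2)
prompt_table = -np.ones((3, 2), dtype=np.float32)

# Ids 4 and 6 fall outside the vocabulary and hit the prompt table.
ids = np.array([1, 4, 6, 3])
out = lookup_with_prompt_table(ids, vocab_table, prompt_table)
```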

However, I found that in TRT-LLM 0.19.0 the behaviour and generation quality are much worse than in TRT-LLM 0.17.0, for both EAGLE-1 and EAGLE-2. Have any other users reported this?

hchings assigned laikhtewari May 30, 2025

hchings added the "feature request" label (New feature or request. This includes new model, dtype, functionality support) May 30, 2025
