Feature support: eagle multimodal inputs #4787

Open
liyi-xia opened this issue May 30, 2025 · 1 comment
Labels
feature request New feature or request. This includes new model, dtype, functionality support

Comments

@liyi-xia

Hi,

Currently, lookahead decoding already supports multimodal inputs, but EAGLE does not. EAGLE's input to the embedding layer is not 2-D, so when the batch size is greater than 1 the following expand does not produce the correct shape:

tasks = expand(tasks, shape(prompt_tokens))

Could multimodal input support be added for EAGLE?
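
To make the shape problem concrete, here is a minimal pure-Python sketch. It assumes that `tasks` holds one task id per request and that the non-2-D input is a packed 1-D token list with per-request lengths. Both are assumptions for illustration, and the helper names `expand_2d` and `expand_packed` are hypothetical, not TensorRT-LLM APIs.

```python
from typing import List


def expand_2d(tasks: List[int], seq_len: int) -> List[List[int]]:
    """Padded [batch, seq_len] layout: each request's task id is simply
    repeated across its row, so a plain broadcast is enough."""
    return [[task] * seq_len for task in tasks]


def expand_packed(tasks: List[int], lengths: List[int]) -> List[int]:
    """Packed 1-D layout (assumed): requests are concatenated back to back,
    so each task id has to be repeated by that request's own token count."""
    if len(tasks) != len(lengths):
        raise ValueError("need exactly one task id per request")
    per_token: List[int] = []
    for task, length in zip(tasks, lengths):
        per_token.extend([task] * length)
    return per_token


# Two requests with 3 and 2 tokens respectively.
print(expand_2d([0, 1], seq_len=3))      # [[0, 0, 0], [1, 1, 1]]
print(expand_packed([0, 1], [3, 2]))     # [0, 0, 0, 1, 1]
```

With a single request the two layouts coincide, which may be why the problem only shows up when the batch size is greater than 1.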

@hchings hchings added the feature request New feature or request. This includes new model, dtype, functionality support label May 30, 2025
liyi-xia commented Jun 2, 2025

Hello, I have already hacked this feature together with the help of TRT-LLM team member Ruoqian. We modified the implementation of the expand function and changed the base model's embedding layer to promptembedding when building the engine.

However, I found that in TRT-LLM 0.19.0 the behaviour and generation quality are much worse than in TRT-LLM 0.17.0, for both EAGLE 1 and EAGLE 2. Have any other users reported this?

3 participants