Issues: NVIDIA/TensorRT-LLM
[RFC] Feedback collection about TensorRT-LLM 1.0 Release Plann...
#3148 · opened Mar 29, 2025 by juney-nvidia · Open · 2 comments

[RFC] Topics you want to discuss with TensorRT-LLM team in the...
#3124 · opened Mar 27, 2025 by juney-nvidia · Open · 9 comments
Issues list
[AutoDeploy] Expose logit_softcap in torch attention reference ops
Labels: AutoDeploy
#4881 · opened Jun 3, 2025 by lucaslie

[AutoDeploy] Expose logit_softcap parameter in flashinfer_attention
Labels: AutoDeploy
#4880 · opened Jun 3, 2025 by lucaslie

[Profiling] Is there any good way to profile the CPU runtime of trtllm?
#4866 · opened Jun 3, 2025 by JuniperHugh

[AutoDeploy] Investigate DemoLLM Token Generation
Labels: AutoDeploy, bug (Something isn't working)
#4841 · opened Jun 2, 2025 by lucaslie
KeyError: 'gemma3' in GemmaConfig.from_hugging_face when converting Gemma 3 model
Labels: bug (Something isn't working), triaged (Issue has been triaged by maintainers)
#4825 · opened Jun 2, 2025 by bebilli · 2 of 4 tasks

Driver crash during warmup of DeepSeek-R1-FP4
Labels: bug (Something isn't working)
#4816 · opened May 31, 2025 by pathorn · 1 of 4 tasks

The output of Gemma 3 4B for TensorRT and Transformers is not the same, even when using float32
Labels: bug (Something isn't working), triaged (Issue has been triaged by maintainers)
#4815 · opened May 31, 2025 by Alireza3242 · 1 of 4 tasks
[Bug] Users need to add cuda_graph_max_batch_size=0 to avoid crash when configuring from extra-llm-api-config.yml
Labels: bug (Something isn't working)
#4811 · opened May 30, 2025 by chang-l · 4 tasks

Inconsistent output_log_probs with concurrent requests at beam_width and max_batch_size ≥ 2
Labels: bug (Something isn't working)
#4793 · opened May 30, 2025 by wonjkim · 4 tasks
Gemma-2 Style Attention Pattern Matching with logit softcap
Labels: AutoDeploy
#4789 · opened May 30, 2025 by lucaslie

llmapi usage: how to add a callback after each step and an embedding table in LLM.generate_async
#4788 · opened May 30, 2025 by bnuzhanyu

Feature support: eagle multimodal inputs
Labels: feature request (New feature or request. This includes new model, dtype, functionality support)
#4787 · opened May 30, 2025 by liyi-xia

Patch for create_causal_mask() function in transformers masking_utils.py
Labels: AutoDeploy
#4783 · opened May 30, 2025 by sugunav14
Retouch cpp executor example CMake to enable or disable multi-device building
#4770 · opened May 29, 2025 by WilliamTambellini

How is the performance of the model with PyTorch as the backend?
Labels: Investigating, Performance (TRTLLM model inference speed, throughput, efficiency. Latency, benchmarks, regressions, opts.), triaged (Issue has been triaged by maintainers)
#4745 · opened May 29, 2025 by oppolll

Test gemma models after upgrade to latest transformers
Labels: AutoDeploy
#4740 · opened May 28, 2025 by sugunav14
DeepSeek-R1-FP4 crashes when MTP is enabled
Labels: bug (Something isn't working)
#4708 · opened May 27, 2025 by Shang-Pin · 1 of 4 tasks

[AutoDeploy] Investigate torch.export as a preprocessing step to InferenceOptimizer
Labels: AutoDeploy
#4704 · opened May 27, 2025 by lucaslie

PluginConfig object has no attribute _paged_kv_cache
Labels: question
#4701 · opened May 27, 2025 by Yoloex · 4 tasks
[AutoDeploy] Weight Fusion Revisited
Labels: AutoDeploy, bug (Something isn't working)
#4674 · opened May 27, 2025 by lucaslie

Error in dispatch_kv_cache_events_thread: 'NoneType' object has no attribute 'new_value'
Labels: bug (Something isn't working)
#4666 · opened May 26, 2025 by Rb-Ach · 4 tasks

Support for Devstral with pytorch backend
Labels: triaged (Issue has been triaged by maintainers)
#4653 · opened May 26, 2025 by ankitmaurya001