Pull requests: NVIDIA/TensorRT-LLM
Fix Llama-3_3-Nemotron-Super-49B-v1 FP8 accuracy threshold configs (#4961, opened Jun 5, 2025 by moraxu)
Doc: Add info about stop words appearing in output (#4956, opened Jun 5, 2025 by Linda-Stadter)
Add customized renormalized moe routing kernel for moe cutlass backend (#4955, opened Jun 5, 2025 by ChristinaZ)
[nvbug 5325284][fix] Increase Nemotron-H warmup request robustness (#4954, opened Jun 5, 2025 by tomeras91)
[fix] Add MPI barrier for proper NVLS multicast team initialization (#4952, opened Jun 5, 2025 by ikryukov)
[fix] Fix illegal mem access and possible accuracy lose (#4943, opened Jun 5, 2025 by liji-nv)
[TRTLLM-5512] - Move part of tests from A100X to A100_80GB_PCIE multi… (#4942, opened Jun 5, 2025 by yiqingy0)
[TRTLLM-5692][tests] Add speculative decoding test cases on torch flow (#4940, opened Jun 5, 2025 by crazydemo)
User/zhanruis/0605 test use build stage wheels for release (#4939, opened Jun 5, 2025 by ZhanruiSunCh)
Perf: cache tokens in Python side to reduce reading overhead of pybind (#4938, opened Jun 5, 2025 by QiJune)
[TRTLLM-4932] Add Llama-3.1-Nemotron-Nano-8B-v1-FP8 accuracy tests (#4933, opened Jun 5, 2025 by moraxu)
fix: [nvbugs/5324229] Fix broken WInt4AFP8FusedMoEMethod since FusedMoE refactor. (#4930, opened Jun 5, 2025 by yuxianq)
fix: [nvbug 5321627] handle cases when TRT backend return more logits than output tokens (#4921, opened Jun 4, 2025 by hchings)