Pull requests: NVIDIA/TensorRT-LLM
Fix Llama-3_3-Nemotron-Super-49B-v1 FP8 accuracy threshold configs (#4961, opened Jun 5, 2025 by moraxu)
Doc: Add info about stop words appearing in output (#4956, opened Jun 5, 2025 by Linda-Stadter)
Add customized renormalized moe routing kernel for moe cutlass backend (#4955, opened Jun 5, 2025 by ChristinaZ)
[nvbug 5325284][fix] Increase Nemotron-H warmup request robustness (#4954, opened Jun 5, 2025 by tomeras91)
[fix] Add MPI barrier for proper NVLS multicast team initialization (#4952, opened Jun 5, 2025 by ikryukov)
[fix] Fix illegal mem access and possible accuracy lose (#4943, opened Jun 5, 2025 by liji-nv)
[TRTLLM-5512] - Move part of tests from A100X to A100_80GB_PCIE multi… (#4942, opened Jun 5, 2025 by yiqingy0)
[TRTLLM-5692][tests] Add speculative decoding test cases on torch flow (#4940, opened Jun 5, 2025 by crazydemo)
User/zhanruis/0605 test use build stage wheels for release (#4939, opened Jun 5, 2025 by ZhanruiSunCh)
Perf: cache tokens in Python side to reduce reading overhead of pybind (#4938, opened Jun 5, 2025 by QiJune)
[TRTLLM-4932] Add Llama-3.1-Nemotron-Nano-8B-v1-FP8 accuracy tests (#4933, opened Jun 5, 2025 by moraxu)
fix: [nvbugs/5324229] Fix broken WInt4AFP8FusedMoEMethod since FusedMoE refactor. (#4930, opened Jun 5, 2025 by yuxianq)
fix: [nvbug 5321627] handle cases when TRT backend return more logits than output tokens (#4921, opened Jun 4, 2025 by hchings)