Issues: NVIDIA/TensorRT-LLM
[RFC] Feedback collection about TensorRT-LLM 1.0 Release Plann...
#3148 · opened Mar 29, 2025 by juney-nvidia · Open · 2 comments

[RFC] Topics you want to discuss with TensorRT-LLM team in the...
#3124 · opened Mar 27, 2025 by juney-nvidia · Open · 9 comments
Issues list
[AutoDeploy] Expose logit_softcap in torch attention reference ops
Labels: AutoDeploy
#4881 · opened Jun 3, 2025 by lucaslie

[AutoDeploy] Expose logit_softcap parameter in flashinfer_attention
Labels: AutoDeploy
#4880 · opened Jun 3, 2025 by lucaslie

[Profiling] Is there any good way to profile the CPU runtime of trtllm?
#4866 · opened Jun 3, 2025 by JuniperHugh

[AutoDeploy] Investigate DemoLLM Token Generation
Labels: AutoDeploy, bug (Something isn't working)
#4841 · opened Jun 2, 2025 by lucaslie
KeyError: 'gemma3' in GemmaConfig.from_hugging_face when converting Gemma 3 model
Labels: bug (Something isn't working), triaged (Issue has been triaged by maintainers)
#4825 · opened Jun 2, 2025 by bebilli · 2 of 4 tasks

Driver crash during warmup of DeepSeek-R1-FP4
Labels: bug (Something isn't working)
#4816 · opened May 31, 2025 by pathorn · 1 of 4 tasks

The output of Gemma 3 4B for TensorRT and Transformers is not the same, even when using float32
Labels: bug (Something isn't working), triaged (Issue has been triaged by maintainers)
#4815 · opened May 31, 2025 by Alireza3242 · 1 of 4 tasks
[Bug] Users need to add cuda_graph_max_batch_size=0 to avoid crash when configuring from extra-llm-api-config.yml
Labels: bug (Something isn't working)
#4811 · opened May 30, 2025 by chang-l · 4 tasks

Inconsistent output_log_probs with concurrent requests at beam_width and max_batch_size ≥ 2
Labels: bug (Something isn't working)
#4793 · opened May 30, 2025 by wonjkim · 4 tasks
Gemma-2 Style Attention Pattern Matching with logit softcap
Labels: AutoDeploy
#4789 · opened May 30, 2025 by lucaslie

llmapi usage: how to add a callback after each step and an embedding table in LLM.generate_async
#4788 · opened May 30, 2025 by bnuzhanyu

Feature support: eagle multimodal inputs
Labels: feature request (New feature or request. This includes new model, dtype, functionality support)
#4787 · opened May 30, 2025 by liyi-xia

Patch for create_causal_mask() function in transformers masking_utils.py
Labels: AutoDeploy
#4783 · opened May 30, 2025 by sugunav14
Retouch cpp executor example CMake to enable or disable multi-device building
#4770 · opened May 29, 2025 by WilliamTambellini

How is the performance of the model with PyTorch as the backend?
Labels: Investigating, Performance (TRTLLM model inference speed, throughput, efficiency. Latency, benchmarks, regressions, opts.), triaged (Issue has been triaged by maintainers)
#4745 · opened May 29, 2025 by oppolll

Test gemma models after upgrade to latest transformers
Labels: AutoDeploy
#4740 · opened May 28, 2025 by sugunav14
DeepSeek-R1-FP4 crashes when MTP is enabled
Labels: bug (Something isn't working)
#4708 · opened May 27, 2025 by Shang-Pin · 1 of 4 tasks

[AutoDeploy] Investigate torch.export as a preprocessing step to InferenceOptimizer
Labels: AutoDeploy
#4704 · opened May 27, 2025 by lucaslie

PluginConfig object has no attribute _paged_kv_cache
Labels: question
#4701 · opened May 27, 2025 by Yoloex · 4 tasks
[AutoDeploy] Weight Fusion Revisited
Labels: AutoDeploy, bug (Something isn't working)
#4674 · opened May 27, 2025 by lucaslie

Error in dispatch_kv_cache_events_thread: 'NoneType' object has no attribute 'new_value'
Labels: bug (Something isn't working)
#4666 · opened May 26, 2025 by Rb-Ach · 4 tasks

Support for Devstral with pytorch backend
Labels: triaged (Issue has been triaged by maintainers)
#4653 · opened May 26, 2025 by ankitmaurya001