Commit Graph

3664 Commits

Author SHA1 Message Date
TensorRT LLM
fd9916424f
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-11-18 03:23:16 +00:00
Tri Dao
fc088e642c
[None][feat] Support Glm4MoeForCausalLM (#8256)
Signed-off-by: Tri Dao <daominhtri0503@gmail.com>
Co-authored-by: Xuanyu Chen <xuanyuc@nvidia.com>
2025-11-18 09:43:21 +08:00
QI JUN
c3376fa114
[None][ci] split speculative test case into several small cases (#9209)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-17 17:02:25 -08:00
Lucas Liebenwein
6d0a8edbbb
[None][chore] local imports for AutoDeploy in serve and bench (#9199)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-11-18 08:14:32 +08:00
zackyoray
e3c9a97075
[None][feat] Add TRTLLM_NIXL_KVCACHE_BACKEND environment variable for NIXL backend selection (#9075)
Signed-off-by: Yoray Zack <62789610+zackyoray@users.noreply.github.com>
2025-11-17 15:39:55 -08:00
TensorRT LLM
2d6289b4b4
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-11-17 22:26:06 +00:00
yuanjingx87
ec36a3af7e
[None][infra] Fix lock file generation script (#9180)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-11-17 11:53:56 -08:00
Matt Lefebvre
470d777744
[TRTINFRA-7280][infra] Support enroot/pyxis clusters in multi-node SLURM and enable oci-hsg GB200 in post-merge (#9117)
Signed-off-by: Matt Lefebvre <mlefebvre@nvidia.com>
2025-11-17 10:59:30 -08:00
Robin Kobus
df41f220a2
[TRTLLM-8831][feat] Enable early exit with overlap scheduler (#8587)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-11-17 18:07:13 +01:00
Mike Iovine
6151a4c9d6
[None][feat] Add simple optimizations for MTP 2-model (#9176)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-17 10:05:39 -05:00
Yiqing Yan
24f5cd7493
[TRTLLM-8000][infra] Catch error in merge waive list stage (#7289)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-17 13:28:50 +08:00
Kaiyu Xie
04be5a704e
[None] [fix] Fix missing ActivationType issue (#9171)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
2025-11-17 10:43:25 +08:00
Anthony Chang
86cfb3ea7e
[None][feat] Update TRTLLM MoE cubins; reduce mxfp4 weight padding requirement; tighten TMA bound (#9025)
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-11-17 10:04:29 +08:00
Jinyang Yuan
6dc70aa0e5
[https://nvbugs/5613089][fix] Fix the rank to access all_rank_chunk_size_list when chunked MoE is used (#8723)
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
2025-11-17 10:01:08 +08:00
Emma Qiao
d16b1a84c5
[None][infra] Waive a failed case in pre-merge stage 11/16 (#9192)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-17 09:36:56 +08:00
sunnyqgg
7862b15a65
[TRTLLM-8778][feat] Add tree attention support for blackwell arch (#8975)
Signed-off-by: qgai <qgai@nvidia.com>
2025-11-17 09:01:53 +08:00
Guoming Zhang
e0f69657c7
[None][fix] Update the attention layers counting for Qwen3-next. (#9072)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-11-16 11:52:56 -08:00
Emma Qiao
2854f0cf3d
[None][infra] Waive failed tests for main branch 11/15 (#9187)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
2025-11-16 01:48:25 -08:00
brb-nv
63237494db
[None][chore] Waive failing tests blocking pre-merge (#9189)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-11-16 01:06:03 -08:00
JadoTu
3cde84581d
[None][fix] Make the sliced nvfp4 output contiguous (#9123)
Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>
2025-11-15 20:00:54 +08:00
Thor Johnsen
64cd91ae0a
[None][infra] Add trt-llm-kv-cache-manager-devs as code owner for appropriate files (#9182)
Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
2025-11-15 16:46:14 +08:00
Erin
fe69243157
[None][chore] Add placement test for ray executor (#9122)
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-11-14 23:10:59 -08:00
Zhanrui Sun
bdcf837784
[TRTLLM-9079][infra] upgrade tritonserver DLFW 25.10 (#8929)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-11-14 20:22:10 -08:00
yuanjingx87
83122bfd64
[None][infra] Update allowlist 2025.11.14 (#9183)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-11-14 16:29:26 -08:00
yuanjingx87
73b8783903
[None][infra] Fix medata.json generated by lock file genreation pipeline (#9179)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-11-14 12:28:20 -08:00
TensorRT LLM
cbabdae57d
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-11-14 18:54:51 +00:00
yuanjingx87
05b5336ab6
[None][infra] Lock generation pipeline update (#9084)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-11-14 10:12:25 -08:00
Chang Liu
bed4e95e9f
[https://nvbugs/5629887][fix] Add missing device count guard for DSv32 multiGPU tests (#9159)
2025-11-14 07:52:23 -08:00
xinhe-nv
49b7e6301a
[None][chore] Add failed cases into waives.txt (#9156)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-11-14 06:28:22 -08:00
mpikulski
80bf840e69
[TRTLLM-9295][fix] unflake test_overlap_scheduler.py::test_overlap_scheduler_consis… (#9146)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-14 11:36:22 +01:00
yuanjingx87
d72321a32e
[None][ci] Waive unittest/_torch/sampler/test_torch_sampler.py::TestBatchedSampling (#9161)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-11-14 01:49:26 -08:00
Chenghao Zhang
f6f6e1f25d
[#9102][feat] AutoDeploy: Support fp8 kv cache (#9107)
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-11-13 23:55:45 -08:00
Zero Zeng
c6cce398f5
[TRTLLM-9053][feat] Support accuracy test and install from wheel (#9038)
Signed-off-by: Zero Zeng <38289304+zerollzeng@users.noreply.github.com>
2025-11-13 23:34:47 -08:00
dongxuy04
84483a238a
[None][doc] update docs for EPLB (#9166)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-11-13 22:24:29 -08:00
Fanrong Li
25bd2e6917
[None][doc] Add DeepSeek-V3.2-Exp document (#9141)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-13 22:01:58 -08:00
Lizhi Zhou
8bd779171e
[https://nvbugs/5631254][fix] avoid torch.compile for multiple times (#9135)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-11-13 21:49:52 -08:00
TensorRT LLM
e90dbaf572
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-11-14 03:40:28 +00:00
Suyog Gupta
d12cb9436d
[None][feat] Autodeploy add triton configs and optimize mamba prefill (#9083)
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-11-13 19:15:43 -08:00
QI JUN
3c950910a0
[None][ci] waive test_disaggregated.py::test_disaggregated_mixed[TinyLlama-1.1B-Chat-v1.0] (#9162)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-13 18:56:37 -08:00
heyuhhh
f07e9977c6
[None] [feat] Use triton kernels for RocketKV prediction module (#8682)
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
2025-11-13 18:51:09 -08:00
Tailing Yuan
cc4c980e03
[None][feat] Add Qwen3-Next to layer-wise benchmarks (#9065)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-11-14 10:03:00 +08:00
JunyiXu-nv
fdb0787e85
[None][chore] Support json_schema in response_format (#8934)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-11-14 09:43:13 +08:00
Erin
44d1c75701
[TRTLLM-8988][feat] Unify MPI & Ray's req/response handling with RPC Client/Server (#8765)
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-11-13 17:21:24 -08:00
Neta Zmora
34dc6869f3
[#8732][feat] Update TRTLLM Cutlass MoE kernels with ReLU2 (#9011)
Update TRTLLM Cutlass MoE kernels with ReLU2 activation.

Nemotron-6 requires the ReLU2 (i.e. squared ReLU) MoE activation function.
The PR adds it, along with a general API for setting the activation function.
The ReLU2 changes are based on this FlashInfer PR: https://github.com/flashinfer-ai/flashinfer/pull/1954.

The PR also updates the Auto Deploy MoE backend for 16-bit and FP8 from Triton (`torch.ops.auto_deploy.triton_moe_fused`, `torch.ops.auto_deploy.triton_quant_fp8_moe`) to TRTLLM/Cutlass (`torch.ops.auto_deploy.trtllm_moe_fused`, `torch.ops.auto_deploy.trtllm_quant_fp8_moe_fused`).

Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-11-13 16:54:45 -08:00
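
For reference, the squared ReLU (ReLU2) activation named in commit 34dc6869f3 can be sketched in plain Python as below. This illustrates only the mathematical definition, relu2(x) = max(x, 0)^2; the function name `relu2` is hypothetical and is not the Cutlass kernel or the activation-setting API that the PR adds.

```python
def relu2(x: float) -> float:
    """Squared ReLU: clamp negatives to zero, then square the result."""
    r = max(x, 0.0)  # standard ReLU
    return r * r     # squaring gives ReLU2


# Hypothetical sample inputs, chosen only to show both branches.
values = [-2.0, -0.5, 0.0, 0.5, 3.0]
activated = [relu2(v) for v in values]
print(activated)  # [0.0, 0.0, 0.0, 0.25, 9.0]
```

Negative inputs map to zero exactly as with ReLU, while positive inputs grow quadratically rather than linearly.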
dongxuy04
a370643b26
[None][fix] support topk autotuner input for expert slot per group larger than 32 (#9087)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-11-14 08:37:20 +08:00
Leslie Fang
daa31d78f4
[https://nvbugs/5652552][fix] Log the llm args for main branch (#9120)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-11-14 07:43:21 +08:00
Frida Hou
b51258acdd
[None][autodeploy] fix weight extraction for graph based quantized checkpoints (#9109)
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2025-11-13 13:14:24 -08:00
Frida Hou
e96a3d294d
[None][autodeploy] minor refactor to rmsnorm transforms (#8657)
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2025-11-13 13:13:58 -08:00
Jinyang Yuan
12f339f3bf
[None][fix] Fix the aux_stream in Llama4MinLatencyFusedMoE (#9035)
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
2025-11-13 09:09:52 -08:00
Iman Tabrizian
9ef7eb70e0
[None][fix] Fix KV cache manager test warnings (#9103)
2025-11-13 07:23:04 -08:00