Commit Graph

3693 Commits

Author SHA1 Message Date
TensorRT LLM
9135d580bf [None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-11-19 03:25:00 +00:00
jellysnack
99ba723e20
[None][fix] logits device and shape issues in dynamic draft path (#9079)
Signed-off-by: jellysnack <oleg.jellysnack@gmail.com>
2025-11-18 19:22:47 -08:00
Ivy Zhang
782dfca7e8
[TRTLLM-9050][test] add llama4 disagg case to cover kv cache overflow error (#9172)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-11-18 18:26:32 -08:00
Grzegorz Kwasniewski
7905d6c0da
[#9098][feat] Simple sharding latent experts (#9099)
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2025-11-18 21:14:22 -05:00
ChristinaZ
fbf6c16cd2
[None][fix] Update the default invalid value for deepseek mode of routing (#9222)
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-11-19 10:14:06 +08:00
Grzegorz Kwasniewski
92f86a50d4
[#9137][feat] Factory sharding as default (#9144)
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2025-11-18 21:12:03 -05:00
Patrice Castonguay
9b0f45298f
[None][feat] Have ability to cancel disagg request if KV cache resource are exhausted (#9155)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-11-18 20:59:17 -05:00
xinhe-nv
35658eab55
[None][chore] Add failed cases into waives.txt (#9193)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-11-18 17:47:55 -08:00
Enwei Zhu
7c4777a571
[TRTLLM-9286][feat] Integration of CuteDSL NVFP4 grouped GEMM (#8880)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-11-18 17:40:12 -08:00
Lizhi Zhou
c789000a62
[https://nvbugs/5649010][fix] increase status-checking interval to avoid instability (#9203)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-11-19 08:55:42 +08:00
Bo Deng
34f845bf69
[TRTLLM-9287][infra] Use NIXL backend for accuracy tests (#9247)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-11-18 14:46:20 -08:00
Ajinkya Rasane
8d7cda2318
[None][chore] Update the Flux autodeploy example (#8434)
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Co-authored-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
2025-11-18 14:16:04 -08:00
Ziyi Xiong
7c4344b92e
[https://nvbugs/5590408][fix] Exclude num of draft tokens from mMaxSeqLenKv (#9210)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-11-18 15:41:56 -05:00
Eran Geva
3ac11a6180
[#9152][fix] AutoDeploy fused_allreduce_residual_rmsnorm to support demollm mode (#9197)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-11-18 22:15:29 +02:00
Chenghao Zhang
f0b68e4c66
[None][feat] AutoDeploy: Perf improvement for small batch size (#9163)
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-11-18 12:11:12 -08:00
Nikita Korobov
fe569f0594
[None][feat] bias for FP4 TRT-LLM Gen MoE (#9220)
Signed-off-by: Nikita Korobov <14355239+nekorobov@users.noreply.github.com>
2025-11-18 09:59:47 -08:00
mpikulski
04fb481da3
[TRTLLM-9295][fix] restore greedy sampling in _test_openai_chat_guided_decoding (#9178)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-18 09:41:59 -08:00
Gal Hubara-Agam
36d3d8f608
[None][chore] Print device info in trtllm-bench report (#8584)
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
2025-11-18 09:00:10 -08:00
Kaiyu Xie
d076aa44d3
[None] [tests] Unwaive wide ep related tests (#9204)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-11-18 08:54:46 -08:00
Zheyu Fu
c4e02d7f04
[TRTLLM-8136][feat] Dynamic draft length in spec decode (stage 1). (#8194)
Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>
2025-11-18 11:13:39 -05:00
Ivy Zhang
160b361588
[TRTLLM-8949][test] Add rcca test case for eagle3 consistency check (#9088)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-11-18 05:55:00 -08:00
Robin Kobus
9913dc25ae
[None][refactor] decoding inputs, part 2 (#5799)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-11-18 14:38:51 +01:00
Ivy Zhang
ca41a71f92
[TRTLLM-8948][test] Add long bench case (#9165)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-11-18 04:41:48 -08:00
Chang Liu
8e001dd195
[None][fix] DeepSeek V3.2 indexer RoPE fix (#9232)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-11-18 20:35:27 +08:00
Lizhi Zhou
07343bb11c
[None][chore] fix a deepseekv3 error when debug mode is on (#9217)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-11-18 01:14:32 -08:00
ruodil
82480346aa
[https://nvbugs/5652552][fix] add printing for llm args (#9205)
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
2025-11-17 23:58:36 -08:00
Zero Zeng
43896af1b1
[None][chore] benchmark refactor (#9207)
Signed-off-by: Zero Zeng <38289304+zerollzeng@users.noreply.github.com>
2025-11-17 23:29:28 -08:00
Stanley Sun
96cfdd8a72
[None][chore] Change trt-server to trtlllm-server in opentelemetry readme (#9173)
Signed-off-by: Stanley Sun <stsun@nvidia.com>
Co-authored-by: Larry Xu <197874197+LarryXFly@users.noreply.github.com>
2025-11-17 22:02:24 -08:00
Gal Hubara-Agam
5e5300898b
[#8732][feat] Add ReLU2 to TRTLLM Cutlass MoE BF16 kernels (#9191)
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
2025-11-17 20:30:00 -08:00
TensorRT LLM
fd9916424f [None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-11-18 03:23:16 +00:00
Tri Dao
fc088e642c
[None][feat] Support Glm4MoeForCausalLM (#8256)
Signed-off-by: Tri Dao <daominhtri0503@gmail.com>
Co-authored-by: Xuanyu Chen <xuanyuc@nvidia.com>
2025-11-18 09:43:21 +08:00
QI JUN
c3376fa114
[None][ci] split speculative test case into several small cases (#9209)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-17 17:02:25 -08:00
Lucas Liebenwein
6d0a8edbbb
[None][chore] local imports for AutoDeploy in serve and bench (#9199)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-11-18 08:14:32 +08:00
zackyoray
e3c9a97075
[None][feat] Add TRTLLM_NIXL_KVCACHE_BACKEND environment variable for NIXL backend selection (#9075)
Signed-off-by: Yoray Zack <62789610+zackyoray@users.noreply.github.com>
2025-11-17 15:39:55 -08:00
TensorRT LLM
2d6289b4b4 [None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-11-17 22:26:06 +00:00
yuanjingx87
ec36a3af7e
[None][infra] Fix lock file generation script (#9180)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-11-17 11:53:56 -08:00
Matt Lefebvre
470d777744
[TRTINFRA-7280][infra] Support enroot/pyxis clusters in multi-node SLURM and enable oci-hsg GB200 in post-merge (#9117)
Signed-off-by: Matt Lefebvre <mlefebvre@nvidia.com>
2025-11-17 10:59:30 -08:00
Robin Kobus
df41f220a2
[TRTLLM-8831][feat] Enable early exit with overlap scheduler (#8587)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-11-17 18:07:13 +01:00
Mike Iovine
6151a4c9d6
[None][feat] Add simple optimizations for MTP 2-model (#9176)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-17 10:05:39 -05:00
Yiqing Yan
24f5cd7493
[TRTLLM-8000][infra] Catch error in merge waive list stage (#7289)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-17 13:28:50 +08:00
Kaiyu Xie
04be5a704e
[None] [fix] Fix missing ActivationType issue (#9171)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
2025-11-17 10:43:25 +08:00
Anthony Chang
86cfb3ea7e
[None][feat] Update TRTLLM MoE cubins; reduce mxfp4 weight padding requirement; tighten TMA bound (#9025)
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-11-17 10:04:29 +08:00
Jinyang Yuan
6dc70aa0e5
[https://nvbugs/5613089][fix] Fix the rank to access all_rank_chunk_size_list when chunked MoE is used (#8723)
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
2025-11-17 10:01:08 +08:00
Emma Qiao
d16b1a84c5
[None][infra] Waive a failed case in pre-merge stage 11/16 (#9192)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-17 09:36:56 +08:00
sunnyqgg
7862b15a65
[TRTLLM-8778][feat] Add tree attention support for blackwell arch (#8975)
Signed-off-by: qgai <qgai@nvidia.com>
2025-11-17 09:01:53 +08:00
Guoming Zhang
e0f69657c7
[None][fix] Update the attention layers counting for Qwen3-next. (#9072)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-11-16 11:52:56 -08:00
Emma Qiao
2854f0cf3d
[None][infra] Waive failed tests for main branch 11/15 (#9187)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
2025-11-16 01:48:25 -08:00
brb-nv
63237494db
[None][chore] Waive failing tests blocking pre-merge (#9189)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-11-16 01:06:03 -08:00
JadoTu
3cde84581d
[None][fix] Make the sliced nvfp4 output contiguous (#9123)
Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>
2025-11-15 20:00:54 +08:00
Thor Johnsen
64cd91ae0a
[None][infra] Add trt-llm-kv-cache-manager-devs as code owner for appropriate files (#9182)
Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
2025-11-15 16:46:14 +08:00