Enwei Zhu
|
7c4777a571
|
[TRTLLM-9286][feat] Integration of CuteDSL NVFP4 grouped GEMM (#8880)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-11-18 17:40:12 -08:00 |
|
Lizhi Zhou
|
c789000a62
|
[https://nvbugs/5649010][fix] increase status-checking interval to avoid instability (#9203)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
|
2025-11-19 08:55:42 +08:00 |
|
Bo Deng
|
34f845bf69
|
[TRTLLM-9287][infra] Use NIXL backend for accuracy tests (#9247)
Signed-off-by: Bo Deng <deemod@nvidia.com>
|
2025-11-18 14:46:20 -08:00 |
|
Ajinkya Rasane
|
8d7cda2318
|
[None][chore] Update the Flux autodeploy example (#8434)
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Co-authored-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
|
2025-11-18 14:16:04 -08:00 |
|
Ziyi Xiong
|
7c4344b92e
|
[https://nvbugs/5590408][fix] Exclude num of draft tokens from mMaxSeqLenKv (#9210)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
|
2025-11-18 15:41:56 -05:00 |
|
Eran Geva
|
3ac11a6180
|
[#9152][fix] AutoDeploy fused_allreduce_residual_rmsnorm to support demollm mode (#9197)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
|
2025-11-18 22:15:29 +02:00 |
|
Chenghao Zhang
|
f0b68e4c66
|
[None][feat] AutoDeploy: Perf improvement for small batch size (#9163)
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
|
2025-11-18 12:11:12 -08:00 |
|
Nikita Korobov
|
fe569f0594
|
[None][feat] bias for FP4 TRT-LLM Gen MoE (#9220)
Signed-off-by: Nikita Korobov <14355239+nekorobov@users.noreply.github.com>
|
2025-11-18 09:59:47 -08:00 |
|
mpikulski
|
04fb481da3
|
[TRTLLM-9295][fix] restore greedy sampling in _test_openai_chat_guided_decoding (#9178)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
|
2025-11-18 09:41:59 -08:00 |
|
Gal Hubara-Agam
|
36d3d8f608
|
[None][chore] Print device info in trtllm-bench report (#8584)
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
|
2025-11-18 09:00:10 -08:00 |
|
Kaiyu Xie
|
d076aa44d3
|
[None] [tests] Unwaive wide ep related tests (#9204)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
|
2025-11-18 08:54:46 -08:00 |
|
Zheyu Fu
|
c4e02d7f04
|
[TRTLLM-8136][feat] Dynamic draft length in spec decode (stage 1). (#8194)
Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>
|
2025-11-18 11:13:39 -05:00 |
|
Ivy Zhang
|
160b361588
|
[TRTLLM-8949][test] Add rcca test case for eagle3 consistency check (#9088)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
|
2025-11-18 05:55:00 -08:00 |
|
Robin Kobus
|
9913dc25ae
|
[None][refactor] decoding inputs, part 2 (#5799)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
|
2025-11-18 14:38:51 +01:00 |
|
Ivy Zhang
|
ca41a71f92
|
[TRTLLM-8948][test] Add long bench case (#9165)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
|
2025-11-18 04:41:48 -08:00 |
|
Chang Liu
|
8e001dd195
|
[None][fix] DeepSeek V3.2 indexer RoPE fix (#9232)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
|
2025-11-18 20:35:27 +08:00 |
|
Lizhi Zhou
|
07343bb11c
|
[None][chore] fix a deepseekv3 error when debug mode is on (#9217)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
|
2025-11-18 01:14:32 -08:00 |
|
ruodil
|
82480346aa
|
[https://nvbugs/5652552][fix] add printing for llm args (#9205)
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
|
2025-11-17 23:58:36 -08:00 |
|
Zero Zeng
|
43896af1b1
|
[None][chore] benchmark refactor (#9207)
Signed-off-by: Zero Zeng <38289304+zerollzeng@users.noreply.github.com>
|
2025-11-17 23:29:28 -08:00 |
|
Stanley Sun
|
96cfdd8a72
|
[None][chore] Change trt-server to trtlllm-server in opentelemetry readme (#9173)
Signed-off-by: Stanley Sun <stsun@nvidia.com>
Co-authored-by: Larry Xu <197874197+LarryXFly@users.noreply.github.com>
|
2025-11-17 22:02:24 -08:00 |
|
Gal Hubara-Agam
|
5e5300898b
|
[#8732][feat] Add ReLU2 to TRTLLM Cutlass MoE BF16 kernels (#9191)
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
|
2025-11-17 20:30:00 -08:00 |
|
TensorRT LLM
|
fd9916424f
|
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
|
2025-11-18 03:23:16 +00:00 |
|
Tri Dao
|
fc088e642c
|
[None][feat] Support Glm4MoeForCausalLM (#8256)
Signed-off-by: Tri Dao <daominhtri0503@gmail.com>
Co-authored-by: Xuanyu Chen <xuanyuc@nvidia.com>
|
2025-11-18 09:43:21 +08:00 |
|
QI JUN
|
c3376fa114
|
[None][ci] split speculative test case into several small cases (#9209)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
|
2025-11-17 17:02:25 -08:00 |
|
Lucas Liebenwein
|
6d0a8edbbb
|
[None][chore] local imports for AutoDeploy in serve and bench (#9199)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
|
2025-11-18 08:14:32 +08:00 |
|
zackyoray
|
e3c9a97075
|
[None][feat] Add TRTLLM_NIXL_KVCACHE_BACKEND environment variable for NIXL backend selection (#9075)
Signed-off-by: Yoray Zack <62789610+zackyoray@users.noreply.github.com>
|
2025-11-17 15:39:55 -08:00 |
|
TensorRT LLM
|
2d6289b4b4
|
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
|
2025-11-17 22:26:06 +00:00 |
|
yuanjingx87
|
ec36a3af7e
|
[None][infra] Fix lock file generation script (#9180)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
|
2025-11-17 11:53:56 -08:00 |
|
Matt Lefebvre
|
470d777744
|
[TRTINFRA-7280][infra] Support enroot/pyxis clusters in multi-node SLURM and enable oci-hsg GB200 in post-merge (#9117)
Signed-off-by: Matt Lefebvre <mlefebvre@nvidia.com>
|
2025-11-17 10:59:30 -08:00 |
|
Robin Kobus
|
df41f220a2
|
[TRTLLM-8831][feat] Enable early exit with overlap scheduler (#8587)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
|
2025-11-17 18:07:13 +01:00 |
|
Mike Iovine
|
6151a4c9d6
|
[None][feat] Add simple optimizations for MTP 2-model (#9176)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
|
2025-11-17 10:05:39 -05:00 |
|
Yiqing Yan
|
24f5cd7493
|
[TRTLLM-8000][infra] Catch error in merge waive list stage (#7289)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-11-17 13:28:50 +08:00 |
|
Kaiyu Xie
|
04be5a704e
|
[None] [fix] Fix missing ActivationType issue (#9171)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
|
2025-11-17 10:43:25 +08:00 |
|
Anthony Chang
|
86cfb3ea7e
|
[None][feat] Update TRTLLM MoE cubins; reduce mxfp4 weight padding requirement; tighten TMA bound (#9025)
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
|
2025-11-17 10:04:29 +08:00 |
|
Jinyang Yuan
|
6dc70aa0e5
|
[https://nvbugs/5613089][fix] Fix the rank to access all_rank_chunk_size_list when chunked MoE is used (#8723)
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
|
2025-11-17 10:01:08 +08:00 |
|
Emma Qiao
|
d16b1a84c5
|
[None][infra] Waive a failed case in pre-merge stage 11/16 (#9192)
Signed-off-by: qqiao <qqiao@nvidia.com>
|
2025-11-17 09:36:56 +08:00 |
|
sunnyqgg
|
7862b15a65
|
[TRTLLM-8778][feat] Add tree attention support for blackwell arch (#8975)
Signed-off-by: qgai <qgai@nvidia.com>
|
2025-11-17 09:01:53 +08:00 |
|
Guoming Zhang
|
e0f69657c7
|
[None][fix] Update the attention layers counting for Qwen3-next. (#9072)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-11-16 11:52:56 -08:00 |
|
Emma Qiao
|
2854f0cf3d
|
[None][infra] Waive failed tests for main branch 11/15 (#9187)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
|
2025-11-16 01:48:25 -08:00 |
|
brb-nv
|
63237494db
|
[None][chore] Waive failing tests blocking pre-merge (#9189)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
|
2025-11-16 01:06:03 -08:00 |
|
JadoTu
|
3cde84581d
|
[None][fix] Make the sliced nvfp4 output contiguous (#9123)
Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>
|
2025-11-15 20:00:54 +08:00 |
|
Thor Johnsen
|
64cd91ae0a
|
[None][infra] Add trt-llm-kv-cache-manager-devs as code owner for appropriate files (#9182)
Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
|
2025-11-15 16:46:14 +08:00 |
|
Erin
|
fe69243157
|
[None][chore] Add placement test for ray executor (#9122)
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
|
2025-11-14 23:10:59 -08:00 |
|
Zhanrui Sun
|
bdcf837784
|
[TRTLLM-9079][infra] upgrade tritonserver DLFW 25.10 (#8929)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
|
2025-11-14 20:22:10 -08:00 |
|
yuanjingx87
|
83122bfd64
|
[None][infra] Update allowlist 2025.11.14 (#9183)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
|
2025-11-14 16:29:26 -08:00 |
|
yuanjingx87
|
73b8783903
|
[None][infra] Fix medata.json generated by lock file generation pipeline (#9179)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
|
2025-11-14 12:28:20 -08:00 |
|
TensorRT LLM
|
cbabdae57d
|
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
|
2025-11-14 18:54:51 +00:00 |
|
yuanjingx87
|
05b5336ab6
|
[None][infra] Lock generation pipeline update (#9084)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
|
2025-11-14 10:12:25 -08:00 |
|
Chang Liu
|
bed4e95e9f
|
[https://nvbugs/5629887][fix] Add missing device count guard for DSv32 multiGPU tests (#9159)
|
2025-11-14 07:52:23 -08:00 |
|
xinhe-nv
|
49b7e6301a
|
[None][chore] Add failed cases into waives.txt (#9156)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
|
2025-11-14 06:28:22 -08:00 |
|