4918 Commits

Author SHA1 Message Date
Lizhi Zhou
b00e8338ec
[https://nvbugs/5834212][fix] prevent routing ctx and gen requests to the same worker; update doc for unique disagg ID (#11095)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-02-02 09:54:33 +08:00
Dmitry Barsukoff
ea49afdf0b
[None][fix] AttributeError with return_perf_metrics on tensorrt backend (#10662)
Signed-off-by: Dmitry Barsukoff <riZZZhik@gmail.com>
Co-authored-by: Kanghwan <861393+karljang@users.noreply.github.com>
2026-02-02 08:41:15 +08:00
Emma Qiao
1c8f8bed00
[None][infra] Waive failed cases for main on 1/30 (#11142)
Signed-off-by: qqiao <qqiao@nvidia.com>
2026-02-01 22:38:24 +08:00
TensorRT LLM
0350922c5f
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-02-01 03:17:07 +00:00
Yanchao Lu
2e757e8151
[None][ci] Waive a flaky test on A10 (#11163)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2026-02-01 00:07:23 +08:00
shuyixiong
278ced972b
[TRTLLM-9771][feat] Allow overriding quantization configs (#11062)
Signed-off-by: shuyixiong <219646547+shuyixiong@users.noreply.github.com>
2026-01-31 10:48:51 -05:00
bhsueh_NV
d1e4527c06
[https://nvbugs/5804683][infra] unwaive Mistral Large3 test (#10680)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2026-01-31 17:50:34 +08:00
Frida Hou
7910d4d2a9
[#8242][feat] Add int4 GPTQ support for AutoDeploy (#8248)
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2026-01-30 23:07:24 -08:00
Guoming Zhang
6bace84167
[TRTLLM-10398][feat] Enable TRTLLM moe backend for Nemotron Super (#10791)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2026-01-31 13:48:25 +08:00
Balaram Buddharaju
531f85dc9b
[None][feat] Perfect routing for Deepseek models (#11127)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-01-30 23:46:35 -05:00
TensorRT LLM
baf9f7b4dc
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-01-31 03:36:14 +00:00
Venky
492ed27cdf
[None][doc] Add Glm4MoeForCausalLM to model support matrix (#11156)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2026-01-31 10:20:53 +08:00
Matt Lefebvre
97ab014bdb
[TRTINFRA-7548][infra] Update GB200 test configs to use frontend SLURM platforms (#11085)
Signed-off-by: Matt Lefebvre <mlefebvre@nvidia.com>
2026-01-30 14:07:47 -08:00
Karthik
5a97374f3c
[#9525][feat] add L2 norm pattern matcher and fusion transform (#10767)
Signed-off-by: Karthik Vetrivel <kvetrivel@nvidia.com>
2026-01-30 16:05:53 -05:00
nvyocox
4af47208d8
[None][feat] Export ONNX for DriveOS LLM (#10117)
Signed-off-by: yocox <yocox@nvidia.com>
2026-01-30 15:43:11 -05:00
yuanjingx87
f42a6cbae0
[None][infra] Add source code pulse scan to PLC nightly pipeline (#10961)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2026-01-30 11:06:48 -08:00
dominicshanshan
5d7411e131
[https://nvbugs/5853997][chore] Waive test (#11132)
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-30 23:39:27 +08:00
Yechan Kim
a669a163ff
[None][doc] Update Qwen2/3-VL's model on supported_models.md (#10797)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2026-01-30 19:40:23 +09:00
Yao Yao
53cb762ee5
[None][feat] New KVCacheManagerV2 APIs for Transceiver (#11003)
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
2026-01-30 18:09:53 +08:00
Enwei Zhu
5ff244ce54
[https://nvbugs/5837281][fix] Fix trtllm-serve guided decoding test (#11101)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2026-01-30 16:59:55 +08:00
Tailing Yuan
9959a5c78e
[None][fix] Remove -ccache from build_wheel.py args (#11064)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2026-01-30 09:43:54 +01:00
Liao Lanyu
f2dd0ee128
[None][chore] Correct sorting order for attention DP scheduling to prioritize non-relaxed requests (#11106)
Signed-off-by: Lance Liao <108499334+lancelly@users.noreply.github.com>
2026-01-30 16:06:48 +08:00
Yibin Li
322471cdd7
[https://nvbugs/5825514][fix] Add null pointer check to parseNpyHeader (#10944)
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
This PR addresses known security issues. For the latest NVIDIA Vulnerability Disclosure Information visit https://www.nvidia.com/en-us/security/, for acknowledgement please reach out to the NVIDIA PSIRT team at PSIRT@nvidia.com
2026-01-30 03:01:33 -05:00
dongfengy
4f0c1b2489
[TRTLLM-10733][feat] Make TRTLLM MOE the default one for GPTOSS on Blackwell (#11074)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2026-01-29 23:59:19 -08:00
Jin Li
ef268e2062
[TRTLLM-9904][feat] Changes for future KVCacheV2 MTP support (#11029)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2026-01-30 01:49:17 -05:00
JennyLiu
6506d63466
[None][test] Add DGX-Spark VLM gemm3-12b bfp16/fp4/fp8 accuracy and perf cases (#11096)
Signed-off-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>
Co-authored-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>
2026-01-30 00:38:19 -05:00
TensorRT LLM
29a203aedb
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-01-30 03:22:39 +00:00
Yueh-Ting (eop) Chen
e1e3bb8592
[https://nvbugs/5775544][fix] Unwaive test (#11023)
Signed-off-by: eopXD <yuehtingc@nvidia.com>
2026-01-30 09:39:08 +08:00
Necofish
144b61715f
[None][fix] Add missing absolute pe in Qwen3-VL Vision Encoder (#11065)
Signed-off-by: Necofish <liuxiangyang@mail.ustc.edu.cn>
2026-01-30 09:59:36 +09:00
yuanjingx87
54ba056924
[None][infra] Remove invalid account for blossom CI (#11126)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2026-01-29 16:17:44 -08:00
Chang Su
dbad94715b
[None][feat] Add gRPC server for high-performance external router integration (#11037)
Signed-off-by: Chang Su <chang.s.su@oracle.com>
2026-01-30 07:48:27 +08:00
Chenghao Zhang
e033929221
[None][feat] AutoDeploy: Flashinfer kernels bringup (#10867)
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
2026-01-29 14:59:29 -08:00
Mike Iovine
0ad87895f5
[https://nvbugs/5836592][fix] Fix qwen3 eagle test (#11030)
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2026-01-29 14:49:08 -08:00
Lucas Liebenwein
a4880ffdbb
[None][fix] AutoDeploy: remove mem check for a log unit test (#11120)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-29 15:41:51 -05:00
Tailing Yuan
4345636b04
[None][chore] Clean up layer-wise benchmarks code (#11092)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2026-01-29 14:29:37 -05:00
Harris Nover
ab7dd34bbe
[None][chore] Consolidate duplicate kv cache reuse variables. (#10935)
Signed-off-by: Harris Nover <249353502+hnover-nv@users.noreply.github.com>
2026-01-29 11:03:27 -08:00
Stefan Niebler
7d31532850
[TRTLLM-10312][perf] Improve performance of _write_finish_reasons in TorchSampler (#10459)
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
2026-01-29 11:06:09 -05:00
WeiHaocheng
80dd6e70c6
[TRTLLM-10415][feat] Dump thread stacks for hanging tests before time… (#10708)
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
2026-01-29 20:43:34 +08:00
Balaram Buddharaju
c7a86f89de
[TRTLLM-10264][feat] Support attention DP + Helix CP (#10477)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-01-29 02:57:13 -05:00
Zhanrui Sun
21d475a391
[None][infra] Waived flaky tests (#11091)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2026-01-29 02:18:30 -05:00
Yi Sun
f6dab8388d
[https://nvbugs/5813452][fix] Fix "Assertion failed: isLeaf() in kvCacheManager.cpp:465" (#10922)
Signed-off-by: Yi Sun <yisun0618@gmail.com>
2026-01-29 14:38:11 +08:00
Tailing Yuan
91528365a9
[None][feat] Add performance alignment to layer-wise benchmarks (#11018)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2026-01-29 14:01:51 +08:00
Enwei Zhu
34a730aaf7
[None][fix] Fix enable_alltoall passed to CutlassFusedMoE (#11016)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2026-01-29 12:11:07 +08:00
Anish Shanbhag
24ac86c485
[https://nvbugs/5761391][fix] Include triton-kernels as a packaged dependency (#10471)
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2026-01-28 19:56:32 -08:00
TensorRT LLM
e20f9a9c72
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-01-29 03:20:49 +00:00
Yiqing Yan
6fcbf15fb8
[None][fix] No need to remove the original waive list (#11060)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2026-01-29 11:10:38 +08:00
Frida Hou
f03908cf9e
[None][fix] fix Qwen2/3 export for AutoDeploy (#11007)
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2026-01-28 16:53:21 -08:00
Ludwig Schneider
4e10bf8950
[None][fix] nccl symmetric with graceful fallbacks (#11042)
Signed-off-by: Ludwig Schneider <lschneider@nvidia.com>
2026-01-28 15:43:24 -08:00
Bala Marimuthu
393c3d259e
[#10245][feat] AutoDeploy: Add Minimax M2 support (#10525)
Signed-off-by: Balamurugan Marimuthu <246387390+bmarimuthu-nv@users.noreply.github.com>
2026-01-28 17:22:32 -05:00
gramnarayan
744a955cbb
[None][chore] AutoDeploy: Eagle One-Model [1/n]: PyTorch impl for Eagle3 Llama checkpoint (#10674)
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
2026-01-28 12:10:49 -08:00