Commit Graph

4945 Commits

Author SHA1 Message Date
Venky
897eb0df2b
[None][doc] Fix GLM4-MoE Eagle support documentation (#11198)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2026-02-02 13:36:09 -08:00
gramnarayan
585fbb2734
[#10826][feat] AutoDeploy: Eagle One-Model [2/n]: Prefill-Only Implementation (#11073)
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
2026-02-02 09:51:10 -08:00
Izzy Putterman
3ef8a4639b
[None][feat] Nemotron H: Eagle3 support (#11131)
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
2026-02-02 10:26:25 -05:00
Yanchao Lu
cd7762a2fa
[None][test] Fix an invalid test name (#11195)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2026-02-02 23:25:51 +08:00
Rundong Li
f1b85fea4c
[None][feat] Integrate cuda.tile RMS norm kernels (#9725)
Signed-off-by: Rundong (David) Li <davidli@nvidia.com>
Co-authored-by: Jinman Xie <jinmanx@nvidia.com>
Co-authored-by: Alexey Bylinkin <abylinkin@nvidia.com>
Co-authored-by: Qiqi Xiao <qiqix@nvidia.com>
Co-authored-by: Biao Wang <biaow@nvidia.com>
Co-authored-by: Thomas Schmid <thschmid@nvidia.com>
2026-02-02 19:44:27 +08:00
Mike Iovine
13b0ab9c0e
[None][fix] Fix MTP 1-model sampler (#10369)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-02 16:26:46 +08:00
Mike Iovine
d9aef94431
[https://nvbugs/5814914][fix] Fix llama sm120 spec dec (#10765)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-02 16:26:46 +08:00
Ivy Zhang
fa5c3ead05
[None][test] Update test list (#10883)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-02 16:26:46 +08:00
Yukun He
de465efc5f
[https://nvbugs/5814309][fix] Use NCCL as fallback to avoid crash due to insufficient memory (#10928)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-02 16:26:46 +08:00
Zheyu Fu
d31482686c
[https://nvbugs/5680911][fix] Remove @cache decorator to enhance CI stability for unit tests using single process mode (#10730)
Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-02 16:26:46 +08:00
Enwei Zhu
7e5e5b90b9
[https://nvbugs/5748600][ci] Update guided decoding waive list (#10904)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-02 16:26:46 +08:00
Yuxian Qiu
dd0a5491ba
[https://nvbugs/5701445][chore] unwaive tests. (#10913)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-02 16:26:46 +08:00
Yuxian Qiu
40d6f23dad
[https://nvbugs/5784543][chore] unwaive test. (#10906)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-02 16:26:46 +08:00
Lucas Liebenwein
68a18f7a3a
[https://nvbugs/5814247][fix] AutoDeploy: skip mxfp4_moe test unless on Hopper (#10729) (#10850)
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Co-authored-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-02 16:26:46 +08:00
Enwei Zhu
ccdd8461ac
[None][fix] Always reset drafting states for GuidedDecoder (#10899)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-02 16:26:46 +08:00
Michal Guzek
fafc22e3d4
[https://nvbugs/5691730][fix] Have LoRa bf16 ckpts work with Llama 3.3-70B-fp8 (#9808)
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
Signed-off-by: Michal Guzek <moraxu@users.noreply.github.com>
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-02 16:26:46 +08:00
William Zhang
bc2487bc2c
[https://nvbugs/5826962][fix] Fix PD disaggregation for VLMs that use mrope (#10865)
* Why?

Commit a6a8898 enabled EPD disaggregation for VLMs that use mrope (e.g.
qwen). However, this broke PD disaggregation for these same models.

* What?

This commit fixes this, and adds a unit test that guards against it.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-02 16:26:46 +08:00
Lizhi Zhou
4d282bd7c1
[https://nvbugs/5821433][fix] fix test_auto_scaling for 2 GPUs (#10866)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-02 16:26:46 +08:00
Zhenhuan Chen
6c2ecad2fe
[https://nvbugs/5769425][fix] add syncthreads for tinygemm to resolve intermittent accuracy problem (#10873)
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-02 16:26:46 +08:00
HuiGao-NV
8fd22ac72d
[https://nvbugs/5740377][fix] Prevent out-of-bounds read (#10868)
Signed-off-by: Hui Gao <huig@nvidia.com>
Co-authored-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-02 16:26:46 +08:00
JunyiXu-nv
2a5b8800e1
[https://nvbugs/5754977][fix] Use free port for serve test (#10878)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-02 16:26:46 +08:00
Yi Zhang
0306c0f12c
[TRTLLM-9766][feat] Integration of the KVCacheManager V2 to TRTLLM Runtime (#10659)
Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com>
2026-02-02 14:29:02 +08:00
Emma Qiao
d3df3f6feb
[None][infra] Waive failed cases and disable a stage on 02/02 (#11177)
Signed-off-by: qqiao <qqiao@nvidia.com>
2026-02-02 13:28:53 +08:00
Kaiyu Xie
9909dca6fa
[None] [feat] Add PDL support for moeAlltoAllKernels (#10591)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
Co-authored-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
2026-02-02 13:23:37 +08:00
Jin Li
77afcbddae
[https://nvbugs/5823284][fix] Unwaive no repro hang issue (#11138)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2026-02-01 23:02:27 -05:00
TensorRT LLM
3800abe26e
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-02-02 03:15:15 +00:00
Liao Lanyu
fef0e4b17d
[TRTLLM-10666][chore] Refactor request fetching logic for better separation of concerns (#10988)
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Signed-off-by: Lance Liao <108499334+lancelly@users.noreply.github.com>
Signed-off-by: Liao Lanyu <108499334+lancelly@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>
2026-02-02 10:36:08 +08:00
Lizhi Zhou
b00e8338ec
[https://nvbugs/5834212][fix] prevent routing ctx and gen requests to the same worker; update doc for unique disagg ID (#11095)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-02-02 09:54:33 +08:00
Dmitry Barsukoff
ea49afdf0b
[None][fix] AttributeError with return_perf_metrics on tensorrt backend (#10662)
Signed-off-by: Dmitry Barsukoff <riZZZhik@gmail.com>
Co-authored-by: Kanghwan <861393+karljang@users.noreply.github.com>
2026-02-02 08:41:15 +08:00
Emma Qiao
1c8f8bed00
[None][infra] Waive failed cases for main on 1/30 (#11142)
Signed-off-by: qqiao <qqiao@nvidia.com>
2026-02-01 22:38:24 +08:00
TensorRT LLM
0350922c5f
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-02-01 03:17:07 +00:00
Yanchao Lu
2e757e8151
[None][ci] Waive a flaky test on A10 (#11163)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2026-02-01 00:07:23 +08:00
shuyixiong
278ced972b
[TRTLLM-9771][feat] Allow overriding quantization configs (#11062)
Signed-off-by: shuyixiong <219646547+shuyixiong@users.noreply.github.com>
2026-01-31 10:48:51 -05:00
bhsueh_NV
d1e4527c06
[https://nvbugs/5804683][infra] unwaive Mistral Large3 test (#10680)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2026-01-31 17:50:34 +08:00
Frida Hou
7910d4d2a9
[#8242][feat] Add int4 GPTQ support for AutoDeploy (#8248)
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2026-01-30 23:07:24 -08:00
Guoming Zhang
6bace84167
[TRTLLM-10398][feat] Enable TRTLLM moe backend for Nemotron Super (#10791)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2026-01-31 13:48:25 +08:00
Balaram Buddharaju
531f85dc9b
[None][feat] Perfect routing for Deepseek models (#11127)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-01-30 23:46:35 -05:00
TensorRT LLM
baf9f7b4dc
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-01-31 03:36:14 +00:00
Venky
492ed27cdf
[None][doc] Add Glm4MoeForCausalLM to model support matrix (#11156)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2026-01-31 10:20:53 +08:00
Matt Lefebvre
97ab014bdb
[TRTINFRA-7548][infra] Update GB200 test configs to use frontend SLURM platforms (#11085)
Signed-off-by: Matt Lefebvre <mlefebvre@nvidia.com>
2026-01-30 14:07:47 -08:00
Karthik
5a97374f3c
[#9525][feat] add L2 norm pattern matcher and fusion transform (#10767)
Signed-off-by: Karthik Vetrivel <kvetrivel@nvidia.com>
2026-01-30 16:05:53 -05:00
nvyocox
4af47208d8
[None][feat] Export ONNX for DriveOS LLM (#10117)
Signed-off-by: yocox <yocox@nvidia.com>
2026-01-30 15:43:11 -05:00
yuanjingx87
f42a6cbae0
[None][infra] Add source code pulse scan to PLC nightly pipeline (#10961)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2026-01-30 11:06:48 -08:00
dominicshanshan
5d7411e131
[https://nvbugs/5853997][chore] Waive test (#11132)
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-30 23:39:27 +08:00
Yechan Kim
a669a163ff
[None][doc] Update Qwen2/3-VL's model on supported_models.md (#10797)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2026-01-30 19:40:23 +09:00
Yao Yao
53cb762ee5
[None][feat] New KVCacheManagerV2 APIs for Transceiver (#11003)
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
2026-01-30 18:09:53 +08:00
Enwei Zhu
5ff244ce54
[https://nvbugs/5837281][fix] Fix trtllm-serve guided decoding test (#11101)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2026-01-30 16:59:55 +08:00
Tailing Yuan
9959a5c78e
[None][fix] Remove -ccache from build_wheel.py args (#11064)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2026-01-30 09:43:54 +01:00
Liao Lanyu
f2dd0ee128
[None][chore] Correct sorting order for attention DP scheduling to prioritize non-relaxed requests (#11106)
Signed-off-by: Lance Liao <108499334+lancelly@users.noreply.github.com>
2026-01-30 16:06:48 +08:00
Yibin Li
322471cdd7
[https://nvbugs/5825514][fix] Add null pointer check to parseNpyHeader (#10944)
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
This PR addresses known security issues. For the latest NVIDIA Vulnerability Disclosure Information visit https://www.nvidia.com/en-us/security/, for acknowledgement please reach out to the NVIDIA PSIRT team at PSIRT@nvidia.com
2026-01-30 03:01:33 -05:00