4894 Commits

Author SHA1 Message Date
Jin Li
ef268e2062
[TRTLLM-9904][feat] Changes for future KVCacheV2 MTP support (#11029)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2026-01-30 01:49:17 -05:00
JennyLiu
6506d63466
[None][test] Add DGX-Spark VLM gemm3-12b bfp16/fp4/fp8 accuracy and perf cases (#11096)
Signed-off-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>
Co-authored-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>
2026-01-30 00:38:19 -05:00
TensorRT LLM
29a203aedb
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-01-30 03:22:39 +00:00
Yueh-Ting (eop) Chen
e1e3bb8592
[https://nvbugs/5775544][fix] Unwaive test (#11023)
Signed-off-by: eopXD <yuehtingc@nvidia.com>
2026-01-30 09:39:08 +08:00
Necofish
144b61715f
[None][fix] Add missing absolute pe in Qwen3-VL Vision Encoder (#11065)
Signed-off-by: Necofish <liuxiangyang@mail.ustc.edu.cn>
2026-01-30 09:59:36 +09:00
yuanjingx87
54ba056924
[None][infra] Remove invalid account for blossom CI (#11126)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2026-01-29 16:17:44 -08:00
Chang Su
dbad94715b
[None][feat] Add gRPC server for high-performance external router integration (#11037)
Signed-off-by: Chang Su <chang.s.su@oracle.com>
2026-01-30 07:48:27 +08:00
Chenghao Zhang
e033929221
[None][feat] AutoDeploy: Flashinfer kernels bringup (#10867)
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
2026-01-29 14:59:29 -08:00
Mike Iovine
0ad87895f5
[https://nvbugs/5836592][fix] Fix qwen3 eagle test (#11030)
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2026-01-29 14:49:08 -08:00
Lucas Liebenwein
a4880ffdbb
[None][fix] AutoDeploy: remove mem check for a log unit test (#11120)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-29 15:41:51 -05:00
Tailing Yuan
4345636b04
[None][chore] Clean up layer-wise benchmarks code (#11092)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2026-01-29 14:29:37 -05:00
Harris Nover
ab7dd34bbe
[None][chore] Consolidate duplicate kv cache reuse variables. (#10935)
Signed-off-by: Harris Nover <249353502+hnover-nv@users.noreply.github.com>
2026-01-29 11:03:27 -08:00
Stefan Niebler
7d31532850
[TRTLLM-10312][perf] Improve performance of _write_finish_reasons in TorchSampler (#10459)
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
2026-01-29 11:06:09 -05:00
WeiHaocheng
80dd6e70c6
[TRTLLM-10415][feat] Dump thread stacks for hanging tests before time… (#10708)
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
2026-01-29 20:43:34 +08:00
Balaram Buddharaju
c7a86f89de
[TRTLLM-10264][feat] Support attention DP + Helix CP (#10477)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-01-29 02:57:13 -05:00
Zhanrui Sun
21d475a391
[None][infra] Waived flaky tests (#11091)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2026-01-29 02:18:30 -05:00
Yi Sun
f6dab8388d
[https://nvbugs/5813452][fix] Fix "Assertion failed: isLeaf() in kvCacheManager.cpp:465" (#10922)
Signed-off-by: Yi Sun <yisun0618@gmail.com>
2026-01-29 14:38:11 +08:00
Tailing Yuan
91528365a9
[None][feat] Add performance alignment to layer-wise benchmarks (#11018)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2026-01-29 14:01:51 +08:00
Enwei Zhu
34a730aaf7
[None][fix] Fix enable_alltoall passed to CutlassFusedMoE (#11016)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2026-01-29 12:11:07 +08:00
Anish Shanbhag
24ac86c485
[https://nvbugs/5761391][fix] Include triton-kernels as a packaged dependency (#10471)
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2026-01-28 19:56:32 -08:00
TensorRT LLM
e20f9a9c72
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-01-29 03:20:49 +00:00
Yiqing Yan
6fcbf15fb8
[None][fix] No need to remove the original waive list (#11060)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2026-01-29 11:10:38 +08:00
Frida Hou
f03908cf9e
[None][fix] fix Qwen2/3 export for AutoDeploy (#11007)
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2026-01-28 16:53:21 -08:00
Ludwig Schneider
4e10bf8950
[None][fix] nccl symmetric with graceful fallbacks (#11042)
Signed-off-by: Ludwig Schneider <lschneider@nvidia.com>
2026-01-28 15:43:24 -08:00
Bala Marimuthu
393c3d259e
[#10245][feat] AutoDeploy: Add Minimax M2 support (#10525)
Signed-off-by: Balamurugan Marimuthu <246387390+bmarimuthu-nv@users.noreply.github.com>
2026-01-28 17:22:32 -05:00
gramnarayan
744a955cbb
[None][chore] AutoDeploy: Eagle One-Model [1/n]: PyTorch impl for Eagle3 Llama checkpoint (#10674)
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
2026-01-28 12:10:49 -08:00
Emma Qiao
0ffa77af51
[None][infra] Waive failed cases for main on 1/28 (#11053)
Signed-off-by: qqiao <qqiao@nvidia.com>
2026-01-28 06:11:06 -05:00
yingguo-trt
e70a55bd94
[None][feat] support multi_acc and Lyris GB200 test (#11024)
Signed-off-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com>
2026-01-28 06:01:48 -05:00
Linda
29647d9446
[None][chore] Removing cpp/tensorrt_llm/pybind (#11026)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2026-01-28 11:25:11 +01:00
Grzegorz Kwasniewski
38bcee189c
[TRTLLM-10362][feat] Added Mamba and MLA layers to the sharding tests (#10364)
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Signed-off-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>
2026-01-28 10:34:10 +01:00
yuanjingx87
3e17ee4e38
[None][infra] Update CI allowList (#11040)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2026-01-28 00:54:37 -08:00
Pengbo Wang
d008494232
[https://nvbugs/5779536][fix] Cherry-pick #10902: Unwaive DeepSeekR1 nvfp4 pp4 mtp test case (#10902) (#11000)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2026-01-28 14:18:53 +08:00
xinhe-nv
dc5eda546b
[None][fix] unwaive tests (#11047)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2026-01-27 23:49:07 -05:00
TensorRT LLM
a7748ceb57
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-01-28 03:36:53 +00:00
dongfengy
1c2e415b3a
[https://nvbugs/5756804][fix] Re-enable passing test (#10986)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>
2026-01-28 11:23:43 +08:00
Yuan Tong
30348b2753
[None][fix] Proper conditional compilation of sm10x cubins (#10839)
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2026-01-28 10:17:51 +08:00
Matt Lefebvre
c26a8f764c
[TRTINFRA-7379][infra] Change SLURM config access to use resolvePlatform (#11006)
Signed-off-by: Matt Lefebvre <mlefebvre@nvidia.com>
2026-01-27 12:33:16 -08:00
NVShreyas
6c1862fb33
[TRTLLM-10197][chore] Refactor to setup for RNN cache transceiver (#10957)
Signed-off-by: Shreyas Misra <shreyasm@nvidia.com>
2026-01-27 12:23:02 -08:00
Evgueni Petrov
f25a2c53bb
[#10877][fix] restore ipv6 support in serve.py (#10929)
Signed-off-by: Evgueni Petrov <evgueni.s.petrov@gmail.com>
2026-01-27 11:55:59 -08:00
Simeng Liu
bae2fac834
[https://nvbugs/5721661][chore] Unwaive fixed bug. (#11009)
Signed-off-by: SimengLiu-nv <simengl@nvidia.com>
2026-01-27 11:41:48 -08:00
Lucas Liebenwein
ff3a494f5c
[#10013][feat] AutoDeploy: native cache manager integration (#10635)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-27 11:23:22 -05:00
Gal Hubara-Agam
7f8c260601
[https://nvbugs/5843316][chore] waive overlap_scheduler test (#11025)
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
2026-01-27 09:07:52 -05:00
xinhe-nv
552aa32aa2
[None][chore] Add failed cases into waives.txt (#10993)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>
Co-authored-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>
2026-01-27 06:08:11 -05:00
Yukun He
b575184fca
[TRTLLM-10308][feat] AutoTuner Cache: reorganize cache file for distributed tuning (#10956)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-01-27 16:39:40 +08:00
Chuang Zhu
d6f76d2fae
[TRTLLM-9527][feat] change context params and disagg params (step3) (#10495)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2026-01-27 16:34:17 +08:00
ZhichenJiang
fae4985797
[TRTLLM-9831][perf] Use TMA.RED to improve effective memory bandwidth (#10987)
Signed-off-by: zhichen jiang <zhichenj@NVIDIA.com>
2026-01-27 16:15:32 +08:00
Bo Li
6b251cc7fa
[TRTLLM-9390][chore] Add Fake OPs for One-Sided AlltoAll. (#11002)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-01-27 15:55:07 +08:00
Lizhi Zhou
93ae8a14ab
[#10889][fix] fix pydantic deepcopy bug (#11004)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-01-27 02:40:13 -05:00
xinhe-nv
069ad30bdb
[None][chore] Remove closed bugs (#10982)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2026-01-27 15:35:44 +08:00
Yiqing Yan
ea5d811aec
[None][chore] Bump version to 1.3.0rc2 (#11021)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2026-01-27 15:26:03 +08:00