Commit Graph

4656 Commits

Author SHA1 Message Date
Pengbo Wang
683515b1bd
[None][feat] Use XQA JIT impl by default and mitigate perf loss with sliding window (#10335)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2026-01-15 15:47:00 +08:00
Perkz Zheng
71ccc07d2b
[None][feat] update trtllm-gen to support groupsTokensHeadsQ (#10261)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-01-15 02:24:25 -05:00
Ludwig Schneider
e12a7119cf
[https://nvbugs/5741392][fix] [chore] Remove test exemptions from waivers tile (#10517)
Signed-off-by: Ludwig Schneider <lschneider@nvidia.com>
2026-01-14 22:07:52 -08:00
Yiqing Yan
f4ace99218
[None][chore] Bump version to 1.3.0rc0 (#10681)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2026-01-15 13:55:44 +08:00
ruodil
22240e43eb
[None][test] store per user output and per gpu output metric in csv file (#10658)
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
2026-01-15 00:51:08 -05:00
Emma Qiao
7b3b6f1161
[None][infra] Waive failed tests on main 01/15 (#10683)
Signed-off-by: qqiao <qqiao@nvidia.com>
2026-01-15 13:40:37 +08:00
Anish Shanbhag
faa80e73fd
[None][feat] Auto download speculative models from HF for pytorch backend, add speculative_model field alias (#10099)
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2026-01-14 21:06:07 -08:00
Lucas Liebenwein
62050b2381
[None][infra] separate AutoDeploy tests into own stages (#10634)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-14 23:05:26 -05:00
Void
f7de285a82
[None][fix] add quantization check for DeepEP LL low precision combine in new moe comm api (#10072)
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
2026-01-14 22:15:29 -05:00
TensorRT LLM
482b7b8837
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-01-15 03:10:09 +00:00
Lucas Liebenwein
15b43e8a14
[https://nvbugs/5777041][fix] fix AutoDeploy ep sharding test (#10460)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-14 21:53:56 -05:00
Dom Brown
94c7b69048
[https://nvbugs/5630196] [fix] Prevent flaky failures in C++ test_e2e.py by using local cached datasets for benchmarking (#10638)
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2026-01-14 21:39:55 -05:00
Wanli Jiang
73d1840c12
[TRTLLM-10245][feat] Add accuracy tests for super v3 fp8 model (#10482)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2026-01-15 10:07:02 +08:00
dominicshanshan
0f2d61b8c6
[https://nvbugs/5766952][fix] Fix AIPerf issue. (#10666)
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-15 09:54:34 +08:00
bhsueh_NV
5f9fc50233
[https://nvbugs/5800725][infra] Update waives.txt (#10625)
2026-01-15 09:08:07 +08:00
彭晋韬(jtao peng)
211c44b951
[None][feat] Adding torch ext API for FusedAddRMSNormQuant kernel (#9905)
Signed-off-by: jintaop <jintaop@nvidia.com>
2026-01-15 07:29:15 +08:00
TensorRT LLM
968db53194
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-01-14 22:18:53 +00:00
Tzu-Ling Kan
c99faaed06
[#9760][fix] Use RequestError for validation errors to prevent engine shutdown (#9761)
Signed-off-by: tzulingk@nvidia.com <tzulingk@nvidia.com>
2026-01-14 10:22:36 -05:00
Emma Qiao
01083b56bf
[TRTLLM-9849][infra] Update dependencies to 25.12 (#9818)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Signed-off-by: xxi <xxi@nvidia.com>
Signed-off-by: xxi <95731198+xxi-nv@users.noreply.github.com>
Co-authored-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: xxi <xxi@nvidia.com>
Co-authored-by: xxi <95731198+xxi-nv@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2026-01-14 21:54:04 +08:00
Emma Qiao
35c24424f6
[None][infra] Waive failed cases in post-merge on 01/14 (#10668)
Signed-off-by: qqiao <qqiao@nvidia.com>
2026-01-14 21:39:32 +08:00
HuiGao-NV
b10704428d
[https://nvbugs/5787566][fix] Only keep a limited number of performance statistic data (#10569)
Signed-off-by: Hui Gao <huig@nvidia.com>
2026-01-14 07:53:01 -05:00
Bo Li
582dec5bb5
[https://nvbugs/5774869][infra] Use 2 GPUs to test skip softmax attention on H100. (#10420)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-01-14 07:03:01 -05:00
shuyixiong
babd5ecacc
[https://nvbugs/5760740][fix] Enable ray tests (#10272)
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
2026-01-14 19:25:46 +08:00
Kyungmin Lee
25148d3fee
[None][feat] Support new Transformers RoPE configuration format (#10636)
Signed-off-by: lkm2835 <lkm2835@gmail.com>
2026-01-14 19:41:27 +09:00
xxi
e9817461ba
[None][chore] improve the readability of log for cutlass can only sup… (#10630)
Signed-off-by: xxi <xxi@nvidia.com>
2026-01-14 05:33:45 -05:00
xxi
d8862505b9
[None][chore] enable EPLB for DEEPGEMM (#10617)
Signed-off-by: xxi <xxi@nvidia.com>
2026-01-14 05:28:08 -05:00
xinhe-nv
272688c663
[None][fix] fix L0 issues (#10670)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2026-01-14 18:09:40 +08:00
jmydurant
e7882d5c74
[None][feat] MiniMax M2 support (#10532)
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2026-01-14 17:38:58 +08:00
mpikulski
052c36ddd2
[TRTLLM-9522][feat] support image_embeds in OpenAI API (#9715)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-01-14 10:31:03 +01:00
Bo Li
487287a412
[None][chore] Update test name MNNVL->NVLinkTwoSided. (#9672)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-01-14 04:29:57 -05:00
Zhenhuan Chen
287f6c2e0f
[None][test] add log_samples and output_path for trtllm_eval (#10629)
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
2026-01-14 16:01:38 +08:00
QI JUN
c4da4fd462
[https://nvbugs/5637220][ci] unwaive TestQwen3_235B_A22B::test_nvfp4[latency_moe_trtllm_attention_dp] (#9870)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2026-01-14 15:41:14 +08:00
Yukun He
15281de799
[None][fix] Reduce host overhead for unified nvfp4 gemm tuning path. (#10503)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-01-14 14:26:18 +08:00
Yuxian Qiu
39cefd6125
[None][refactor] Unify the usage of MPIDist and TorchDist. (#10380)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-14 14:05:47 +08:00
xxi
f841b43cde
[None][chore] waive the CI failure (#10655)
Signed-off-by: xxi <xxi@nvidia.com>
2026-01-14 13:59:15 +08:00
JennyLiu
92ae490410
[None][test] Spark - Change testlist name and perf yml format (#10626)
Signed-off-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>
Co-authored-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>
2026-01-13 23:07:11 -05:00
xinhe-nv
07d9390e9b
[None][test] add test into qa test list (#10627)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2026-01-13 22:43:00 -05:00
tburt-nv
b65c515314
[None][chore] update allowlist 2026-01-13 (#10645)
Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>
2026-01-13 22:23:03 -05:00
TensorRT LLM
dd22324675
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-01-14 03:07:57 +00:00
xinhe-nv
7305c61fc9
[TRTLLM-8638][fix] Add failed cases into waives.txt (#10589)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2026-01-13 22:00:20 -05:00
Leslie Fang
795e690bca
[https://nvbugs/5753788][chore] Padding empty chunk for configurable moe (#10451)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2026-01-14 10:42:17 +08:00
Yuxian Qiu
d3f4fbb742
[None][fix] Avoid write-write race for async pp send. (#10488)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-14 09:39:36 +08:00
Yuxian Qiu
2acd03030a
[https://nvbugs/5781589][fix] Implement pp skip forward for all spec workers. (#10578)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-14 09:36:35 +08:00
Leslie Fang
bc119f5644
[None][chore] Add test configurable moe module (#10575)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2026-01-14 07:25:57 +08:00
Balaram Buddharaju
ccdfa43a6e
[https://nvbugs/5791900][fix] Fix HelixCpMnnvlMemory init with PP (#10533)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-01-13 15:48:42 -05:00
Frida Hou
bf16fbd86c
[#9283][feat] AutoDeploy: separate rms pattern detection from fusion (#9969)
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2026-01-13 14:57:27 -05:00
Neta Zmora
7b7f1e2ba1
[None][feat] AutoDeploy: refactor memory usage logging (#8505)
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Signed-off-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com>
Co-authored-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com>
2026-01-13 21:03:09 +02:00
dongfengy
6ee8dbfe0b
[https://nvbugs/5772396][fix] WAR: Disable TinyGEMM PDL due to accuracy issues (#10619)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2026-01-13 12:40:11 -05:00
Yiteng Niu
7a47e29dcb
[None][infra] support overriding nspect version (#10402)
Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>
2026-01-13 23:39:45 +08:00
benzh-2025
6df2c8a074
[None][feat] add fp4 gemm + allreduce (#9729)
Signed-off-by: benzh 
Signed-off-by: benzh-2025
2026-01-13 21:11:13 +08:00