Commit Graph

4234 Commits

Author SHA1 Message Date
Lucas Liebenwein
76ec820465
[#7532][feat] AutoDeploy: gather logits before lm head (#9962)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Co-authored-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-12-17 19:50:13 -08:00
TensorRT LLM
cfe53e7425
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-12-18 03:23:35 +00:00
xinhe-nv
4a98f190a8
[None][chore] Add failed cases into waives.txt (#10025)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-12-17 19:13:52 -08:00
xinhe-nv
c1cfb61b1b
[TRTLLM-9381][feat] Add kimi k2 fp4 tests (#9906)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-12-17 18:15:27 -08:00
TensorRT LLM
50c2b82f24
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-12-17 23:45:35 +00:00
tburt-nv
27064f95c7
[None][chore] Clarify copyright header guidance (#9882)
Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>
2025-12-18 06:38:10 +08:00
tburt-nv
5da7879b38
[None][fix] Revert GHA upgrade for blossom-ci workflow (#10095)
Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>
2025-12-17 15:57:04 -05:00
Chenghao Zhang
22c6e8a424
[None][fix] Autodeploy: fix some legacy flashinfer attention test errors (#9928)
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-12-17 12:27:22 -08:00
Salman Chishti
cb5cd4376e
[None][chore] Upgrade GitHub Actions for Node 24 compatibility (#10045)
Signed-off-by: Salman Muin Kayser Chishti <13schishti@gmail.com>
2025-12-17 09:44:09 -08:00
Yuan Tong
f7e245668b
[TRTLLM-9680][perf] Optimize TRTLLMSampler log_probs performance (Core fix has been merged via #9353) (#9655)
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-12-17 17:56:01 +08:00
Yukun He
00c0564334
[None][chore] Remove unnecessary warning log for tuning. (#10077)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-12-17 01:51:17 -08:00
Yukun He
18b335d584
[TRTLLM-9989][fix] Disable tvm_ffi for CuteDSL nvFP4 dense GEMM. (#10040)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-12-17 00:41:26 -08:00
Yukun He
2fd1a23e4c
[TRTLLM-9998][fix] Change trtllm-gen MoE distributed tuning strategy back to INDEPENDENT (#10036)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-12-17 00:35:22 -08:00
yufeiwu-nv
5d71f662c3
[https://nvbugs/5698434][test] Add Qwen3-4B-Eagle3 One-model perf test (#10041)
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
2025-12-17 13:37:25 +08:00
Void
47404196fa
[None][fix] Enabled simultaneous support for low-precision combine and MTP. (#9091)
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
2025-12-17 13:37:08 +08:00
Emma Qiao
0dbf3948cc
[None][infra] Waive failed tests due to llm model files (#10068)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-16 20:12:57 -08:00
Kaiyu Xie
02fd13448b
[None] [feat] Enhancements to slurm scripts (#10031)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-12-16 19:31:27 -08:00
JunyiXu-nv
6649c3743c
[https://nvbugs/5635153][chore] Remove responses tests from waive list (#10026)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-17 11:22:02 +08:00
shuyixiong
26fb063076
[https://nvbugs/5741060][fix] Fix pg op test (#9989)
Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>
2025-12-17 09:44:25 +08:00
Aurelien Chartier
7175d89b48
[None][fix] Fix iteration stats for spec-dec (#9855)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-12-16 14:11:38 -08:00
QI JUN
dba9036072
[None][doc] remove nano-vl-v2 model support in release notes (#9887)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-16 13:33:20 -05:00
QI JUN
3daca4fea3
[https://nvbugs/5729847][doc] fix broken links to modelopt (#9868)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-16 13:33:20 -05:00
QI JUN
e6ab864066
[None][doc] Update release notes (#9739)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
Co-authored-by: Laikh Tewari <laikhtewari1@gmail.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-16 13:33:20 -05:00
Zac Patel
1ffa2c8937
[IB-1920][doc] Update Perf_Overview.md with Benchmarking Results for Release 1.1 (#9723)
Signed-off-by: Zachary Patel <22306219+zbpatel@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-16 13:33:20 -05:00
xiweny
2756a0da60
[TRTLLM-4629][doc] Add B300 & GB300 in documents (#9663)
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-16 13:33:20 -05:00
ruodil
07f307d131
[https://nvbugs/5652552][fix] cherry-pick add printing for llm args (#9206)
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-16 13:33:20 -05:00
Iman Tabrizian
1fc8bd3cd8
[TRTLLM-9082][doc] Address Dynamo Example feedback (#9619)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-16 13:33:20 -05:00
Kaiyu Xie
e41b060fe6
[TRTLLM-9090] [doc] Update online benchmarking docs (#9611)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-16 13:33:20 -05:00
Lizhi Zhou
bd13957e70
[TRTLLM-9181][feat] improve disagg-server prometheus metrics; synchronize workers' clocks when workers are dynamic (#9726)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-12-16 05:16:32 -08:00
Enwei Zhu
609d1d0383
[None][fix] Fix Illegal Memory Access for CuteDSL Grouped GEMM (#10008)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-16 04:06:49 -08:00
Enwei Zhu
6a238ca8ad
[None][doc] Update CONTRIBUTING.md (#10023)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-16 18:58:43 +08:00
Emma Qiao
12727ebd7f
[None][infra] Waive failed test for main branch on 12/16 (#10029)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-16 02:54:32 -08:00
Perkz Zheng
064b67e40c
[https://nvbugs/5727952][fix] a pdl bug in trtllm-gen fmha kernels (#9913)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-12-16 00:34:37 -08:00
yuanjingx87
0a4c59136a
[None][infra] Fixing credential loading in lockfile generation pipeline (#10020)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-12-16 15:38:29 +08:00
William Zhang
28b02b4f5a
[None][docs] Add README for Nemotron Nano v3 (#10017)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Co-authored-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-12-15 22:17:24 -08:00
Yihan Wang
6b5ebaae3e
[None][chore] Update internal_cutlass_kernels artifacts (#9992)
Signed-off-by: Yihan Wang <yihwang@nvidia.com>
2025-12-15 21:15:25 -08:00
Wanli Jiang
8af51211c1
[FMDL-1222][feat] Support weight and weight_scale padding for NVFP4 MoE cutlass (#9358)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-12-16 12:41:17 +08:00
Eran Geva
ce7a42f4cf
[https://nvbugs/5731717][fix] fixed flashinfer build race condition during test (#9983)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-12-15 20:30:24 -08:00
Yechan Kim
8ba8699f66
[TRTLLM-8310][feat] Add Qwen3-VL-MoE (#9689)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-12-15 20:05:20 -08:00
ChristinaZ
dff77efa2a
[None][feat] Add routing support for the new model for both cutlass and trtllm moe backend (#9792)
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-12-15 19:59:08 -08:00
QI JUN
4ce35eacf1
[TRTLLM-9794][ci] move more test cases to gb200 (#9994)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-12-15 19:50:41 -08:00
xinhe-nv
cdf56c278f
[TRTLLM-8638][fix] Add failed cases into waives.txt New activity. (#9979)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-12-15 18:59:13 -08:00
Zhanrui Sun
b757ea73ba
[TRTLLM-9641][infra] Use public triton 3.5.0 in SBSA (#9652)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-12-15 18:58:59 -08:00
Michal Guzek
e6187d8109
[https://nvbugs/5708810][fix] Fix TRTLLMSampler (#9710)
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
2025-12-15 23:26:52 +01:00
Patrice Castonguay
9ba14263db
[https://nvbugs/5673559][fix] Unwaiving disagg test for nvbug 5673559 (#9957)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-12-15 12:32:15 -05:00
Emma Qiao
d5d15c06df
[None][infra] Waive failed tests for main branch on 12/15 (#10001)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-12-16 01:29:43 +08:00
Faraz
0c31502fbc
[None][feat] disable fused gemm for sm121 (#9916)
Signed-off-by: list <58580514+farazkh80@users.noreply.github.com>
2025-12-15 12:07:06 -05:00
Kaiyu Xie
44b0f8c3ed
[None] [fix] Revert "[None] [feat] add eos_token_id in generation_config to sampling params" (#10002)
2025-12-15 08:52:52 -08:00
zackyoray
63e7a2fa70
[None][infra] Update ucx to 1.20.x (#9977)
Signed-off-by: Yoray Zack <yorayz@nvidia.com>
Signed-off-by: Yoray Zack <62789610+zackyoray@users.noreply.github.com>
2025-12-16 00:31:48 +08:00
arekay-nv
4f75a31a45
[https://nvbugs/5540979][fix] Potential fix for 5540979 (#9716)
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
2025-12-15 10:49:31 -05:00