gramnarayan
a9eb5afc9f
[ #9241 ][feat] AutoDeploy: Support Eagle3 Speculative Decoding ( #9869 )
...
Support the two-model flow without the overlap scheduler or chain drafter. The drafting model runs in the PyTorch backend.
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
2025-12-24 23:30:42 -05:00
shuyixiong
f4f0fe85e9
[TRTLLM-9737][chore] Add rl perf reproduce script and enhance the robustness of Ray tests ( #9939 )
...
Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>
2025-12-24 15:27:01 +08:00
Balaram Buddharaju
8c1cfc872b
[TRTLLM-9493][feat] Custom AllToAll for helix parallelism ( #9986 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-23 18:14:30 -08:00
Jhao-Ting Chen
92d90fa29a
[None][feat] Expose enable_trt_overlap in Triton_backend, bringing 1.05x OTPS ( #10018 )
...
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
2025-12-23 11:41:31 -06:00
chenfeiz0326
48c875f8ea
[None][fix] Add OpenSearch URL in slurm_launch.sh for Multinode Perf Sanity Test ( #9990 )
...
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-12-23 16:02:38 +08:00
Harshini Komali
d691371eaf
[TRTLLM-9091][feat] Replace GenAI-Perf with AIPerf ( #9310 )
...
Signed-off-by: lkomali <lkomali@nvidia.com>
Signed-off-by: Harshini Komali <157742537+lkomali@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-12-23 13:25:55 +08:00
fredricz-20070104
621156ad44
[None][chore] Fix GB300 support issues ( #10196 )
...
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
Signed-off-by: fredricz-20070104 <226039983+fredricz-20070104@users.noreply.github.com>
2025-12-23 10:42:41 +08:00
Perkz Zheng
c87f1a6b39
[ https://nvbugs/5503479 ][fix] update trtllm-gen kernels to address few bugs ( #10089 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-12-22 04:45:33 -05:00
Chuang Zhu
914dd39127
[None][fix] disable cuda ipc on device without nvlink (L40s) for disagg test ( #9735 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-12-22 09:29:24 +08:00
Balaram Buddharaju
5266475014
[None][feat] Cudagraph updates for helix parallelism ( #10141 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-21 15:21:52 -05:00
bhsueh_NV
cd4b4f43fa
[None][feat] Support Eagle3 on Mistral Large3 ( #9971 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-12-21 10:25:45 -05:00
Bo Li
a66eeab537
[TRTLLM-9805][feat] Skip Softmax Attention. ( #9821 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2025-12-21 02:52:42 -05:00
Balaram Buddharaju
dcd3f7b5ea
[ https://nvbugs/5744427 ][fix] Fix accuracy test OOM ( #10173 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-21 02:03:38 -05:00
Yuxian Qiu
3b3069b390
[ https://nvbugs/5747930 ][fix] Use offline tokenizer for whisper models. ( #10121 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-12-20 09:42:07 +08:00
Gal Hubara-Agam
20b69a982a
[ #10056 ][test] AutoDeploy: Add accuracy test for Nemotron SuperV3 ( #10131 )
...
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-12-19 13:28:42 -08:00
Chang Liu
5489d188a4
[None][fix] Revert the change and remove device count guard for DSv32 ( #9631 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-12-19 15:00:55 -05:00
Venky
dfa11d810e
[TRTC-102][docs] --extra_llm_api_options->--config in docs/examples/tests ( #10005 )
2025-12-19 13:48:43 -05:00
yufeiwu-nv
52cee573ad
[TRTLLM-8830][test] Overlap scheduler enhancement perf test: Add qwen3_0,8b and llama3.1 test cases ( #10114 )
...
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
2025-12-19 17:01:52 +08:00
JunyiXu-nv
356ad4fe3a
[ https://nvbugs/5722653 ][fix] Address port conflict by assigning different port section in the same node. ( #10035 )
...
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-19 15:34:04 +08:00
Wangjue Yao
9f283f330b
[None][feat] Support Mooncake transfer engine as a cache transceiver backend ( #8309 )
...
Signed-off-by: wjueyao <wyao123@terpmail.umd.edu>
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-12-19 10:09:51 +08:00
Anish Shanbhag
91a9ae42d2
[TRTC-71][feat] Add regression testing for config database ( #9832 )
...
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2025-12-18 16:15:38 -08:00
Balaram Buddharaju
799a2ae311
[ https://nvbugs/5741331 ][fix] Fix helix accuracy test ( #10021 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-18 15:27:53 -08:00
Lizhi Zhou
f02782a6f2
[ https://nvbugs/5726066 ][fix] fix auto-scaling related failures ( #9845 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Co-authored-by: Emma Qiao <qqiao@nvidia.com>
2025-12-18 16:37:48 -05:00
Yuxian Qiu
bec864a78c
[None][fix] avoid ID conversion for non enable_configurable_moe cases. ( #10003 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-12-18 13:29:52 +08:00
Wanli Jiang
601c29ca73
[ https://nvbugs/5721644 ][fix] Update tests for nemotron_h ( #9993 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-12-18 12:38:02 +08:00
xinhe-nv
c1cfb61b1b
[TRTLLM-9381][feat] Add kimi k2 fp4 tests ( #9906 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-12-17 18:15:27 -08:00
yufeiwu-nv
5d71f662c3
[ https://nvbugs/5698434 ][test] Add Qwen3-4B-Eagle3 One-model perf test ( #10041 )
...
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
2025-12-17 13:37:25 +08:00
Aurelien Chartier
7175d89b48
[None][fix] Fix iteration stats for spec-dec ( #9855 )
...
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-12-16 14:11:38 -08:00
Lizhi Zhou
bd13957e70
[TRTLLM-9181][feat] improve disagg-server prometheus metrics; synchronize workers' clocks when workers are dynamic ( #9726 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-12-16 05:16:32 -08:00
Enwei Zhu
609d1d0383
[None][fix] Fix Illegal Memory Access for CuteDSL Grouped GEMM ( #10008 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-16 04:06:49 -08:00
Eran Geva
ce7a42f4cf
[ https://nvbugs/5731717 ][fix] fixed flashinfer build race condition during test ( #9983 )
...
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-12-15 20:30:24 -08:00
Yechan Kim
8ba8699f66
[TRTLLM-8310][feat] Add Qwen3-VL-MoE ( #9689 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-12-15 20:05:20 -08:00
Balaram Buddharaju
dfc8799352
[ https://nvbugs/5669114 ][fix] Switch to MMMU benchmark for Gemma3 27B ( #9966 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-14 21:23:59 -08:00
Fanrong Li
8f144d9282
[TRTLLM-9416][feat] Skip DS-v3.2 indexer MQA and Top-K for short sequences. ( #9524 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-12-15 12:42:25 +08:00
xxi
f5696df285
[TRTLLM-8961][feat] ConfigurableMoE support DeepGemm ( #9858 )
2025-12-15 10:47:15 +08:00
nvxuanyuc
a5a37227d6
[None][feat] Fused kernels (qknormrope + moe routing) and two-model MTP support for glm4moe ( #9852 )
...
Signed-off-by: Xuanyu Chen <xuanyuc@nvidia.com>
2025-12-14 10:47:24 +08:00
Mike Iovine
383b13e0e5
[None][feat] Implement sampling on 1-model EAGLE3 ( #9885 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-13 07:38:22 -08:00
Balaram Buddharaju
6a6e41f802
[TRTLLM-9468][chore] Update disagg benchmarking scripts to support context parallelism ( #9720 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-12 22:29:41 -08:00
bhsueh_NV
e49c70f6df
[None][feat] Support Mistral Large3 LLM part ( #9820 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-12-13 11:44:27 +08:00
tburt-nv
6147452158
[ https://nvbugs/4141427 ][chore] Add more details to LICENSE file ( #9881 )
...
Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>
2025-12-13 08:35:31 +08:00
ruodil
9b3e5e90ee
[None][test] fix a typo in model name in script ( #9867 )
...
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
2025-12-12 17:35:55 +08:00
chenfeiz0326
61745f034a
[ https://nvbugs/5727481 ][ci] Fix Port Conflict in Perf-Sanity CI Test ( #9896 )
...
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-12-12 17:16:50 +08:00
Ivy Zhang
fded6c393d
[TRTLLM-9262][test] add groupgemm ada case for rcca ( #9833 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-12-12 13:23:33 +08:00
xinhe-nv
e8efeb765d
[TRTLLM-9717][fix] fix multi nodes tests cases ( #9736 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-12-12 10:14:23 +08:00
xxi
488d38f88d
[TRTLLM-8959][feat] ConfigurableMoE support CUTLASS ( #9772 )
2025-12-12 00:22:13 +08:00
fredricz-20070104
341cb1a12c
[None][chore] Add GB300 support since it does not support segment ( #9731 )
...
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2025-12-10 18:36:55 -08:00
Patrice Castonguay
2c0293c612
[ https://nvbugs/5601682 ][fix] Unwaiving disagg test ( #9627 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-12-10 13:42:26 -05:00
cheshirekow
2f030312a8
[TRTLLM-9228][infra] Verify thirdparty C++ process ( #9367 )
...
Signed-off-by: Josh Bialkowski <1309820+cheshirekow@users.noreply.github.com>
Co-authored-by: Josh Bialkowski <1309820+cheshirekow@users.noreply.github.com>
2025-12-10 21:01:19 +08:00
dhansen-nvidia
2d33ae94d5
[ https://nvbugs/5508301 ][feat] Move D->H copies to a worker thread whe… ( #8463 )
...
Signed-off-by: Dan Hansen <1+dhansen-nvidia@users.noreply.github.com>
Signed-off-by: dhansen-nvidia <218031328+dhansen-nvidia@users.noreply.github.com>
Co-authored-by: Dan Hansen <1+dhansen-nvidia@users.noreply.github.com>
2025-12-09 18:51:31 -05:00
QI JUN
252769c930
[TRTLLM-9794][ci] remove duplicated test cases in DGX B200 ( #9817 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-12-08 21:51:30 -08:00