TensorRT-LLMs

Mirror of https://github.com/NVIDIA/TensorRT-LLM.git, last synced 2026-02-12 14:03:48 +08:00

Author	SHA1	Message	Date
bhsueh_NV	e49c70f6df	[None][feat] Support Mistral Large3 LLM part (#9820 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-12-13 11:44:27 +08:00
Balaram Buddharaju	461446045e	[TRTLLM-9493][feat] Add helixPostProcessNative kernel for cp_dim=2 (#9924 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-12-12 16:49:25 -08:00
tburt-nv	6147452158	[https://nvbugs/4141427 ][chore] Add more details to LICENSE file (#9881 ) Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>	2025-12-13 08:35:31 +08:00
Chuang Zhu	4cc4cbe926	[https://nvbugs/5716787 ][fix] terminate nixl running when exiting (#9785 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> Co-authored-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-12-12 11:15:02 -05:00
Chuang Zhu	9c59c9f920	[https://nvbugs/5643787 ][fix] remove the war path for notify to itself (#9834 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-12-12 11:10:05 -05:00
JunyiXu-nv	2fec53dfa5	[TRTLLM-9637][feat] Support tool parser for Kimi K2 (#9830 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2025-12-12 23:32:39 +08:00
Yihan Wang	9df4dad3b6	[None][fix] Introduce inline namespace to avoid symbol collision (#9541 ) Signed-off-by: Yihan Wang <yihwang@nvidia.com>	2025-12-12 23:32:15 +08:00
Balaram Buddharaju	af315d8ef1	[TRTLLM-5972][chore] Load balance decode token KV cache with helix parallelism (#9757 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-12-12 22:29:05 +08:00
Lucas Liebenwein	e767fc649a	[None][feat] AutoDeploy: prepare_metadata revisited (#9764 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-12-12 20:14:14 +08:00
ruodil	9b3e5e90ee	[None][test] fix a typo in model name in script (#9867 ) Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>	2025-12-12 17:35:55 +08:00
chenfeiz0326	61745f034a	[https://nvbugs/5727481 ][ci] Fix Port Conflict in Perf-Sanity CI Test (#9896 ) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>	2025-12-12 17:16:50 +08:00
kris1025	2fc94e5dd7	[None][chore] unwaive qwen3 accuracy test (#9895 ) Signed-off-by: linquanh <linquanh@nvidia.com>	2025-12-12 16:30:09 +08:00
Yihan Wang	711016c799	[https://nvbugs/5736923 ][infra] Waive timeout disaggregated/test_auto_scaling[http-round_robin] test (#9942 ) Signed-off-by: Yihan Wang <yihwang@nvidia.com>	2025-12-12 15:15:13 +08:00
Ivy Zhang	fded6c393d	[TRTLLM-9262][test] add groupgemm ada case for rcca (#9833 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-12-12 13:23:33 +08:00
dominicshanshan	093465ed29	[https://nvbugs/5599176 ][fix] Unwaive fixed test for Ray (#9861 ) Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-12-12 11:24:05 +08:00
xinhe-nv	e8efeb765d	[TRTLLM-9717][fix] fix multi nodes tests cases (#9736 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-12-12 10:14:23 +08:00
Venky	fd1270b9ab	[TRTC-43] [feat] Add config db and docs (#9420 ) Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com> Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> Co-authored-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>	2025-12-12 04:00:03 +08:00
Simeng Liu	24f92721f2	[https://nvbugs/5597647 ][ci] Unwaive fixed tests. (#9812 ) Signed-off-by: SimengLiu-nv <simengl@nvidia.com>	2025-12-12 02:29:30 +08:00
Erin	89dabf5aa1	[TRTLLM-9736][feat] AsyncLLM and verl integ (#9353 ) Signed-off-by: Liwei Ma <liweim@nvidia.com> Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> Co-authored-by: Liwei Ma <liweim@nvidia.com> Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com> Co-authored-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-12-11 09:33:25 -08:00
JadoTu	02edb19f43	[None] [feat] add eos_token_id in generation_config to sampling params (#9514 ) Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>	2025-12-12 00:52:03 +08:00
xxi	488d38f88d	[TRTLLM-8959][feat] ConfigurableMoE support CUTLASS (#9772 )	2025-12-12 00:22:13 +08:00
Yan Chunwei	04a39a4e2b	[None][chore] enable test_ipc.py (#9865 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>	2025-12-11 17:47:14 +08:00
Zongfei Jing	c76b428e2e	[TRTLLM-9685] [feat] Add gather fc1 kernel by cuteDSL (#9618 ) Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>	2025-12-11 16:21:32 +08:00
JunyiXu-nv	454e7e59e5	[https://nvbugs/5718004 ][fix] Add warmup for cancellation test (#9860 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2025-12-11 12:20:33 +08:00
Bo Deng	c1d53ee43d	[https://nvbugs/5582258 ][fix] unwaive (#9650 ) Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-12-10 19:18:30 -08:00
fredricz-20070104	341cb1a12c	[None][chore] Add GB300 support since it does not support segment (#9731 ) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>	2025-12-10 18:36:55 -08:00
Patrice Castonguay	2c0293c612	[https://nvbugs/5601682 ][fix] Unwaiving disagg test (#9627 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-12-10 13:42:26 -05:00
cheshirekow	2f030312a8	[TRTLLM-9228][infra] Verify thirdparty C++ process (#9367 ) Signed-off-by: Josh Bialkowski <1309820+cheshirekow@users.noreply.github.com> Co-authored-by: Josh Bialkowski <1309820+cheshirekow@users.noreply.github.com>	2025-12-10 21:01:19 +08:00
Yukun He	072f236002	[None][fix] Fully resolve the tactic recovery issues in AutoTuner serialized cache (#9835 ) Restrict tactic types to those compatible with AutoTuner cache serialization and deserialization. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-12-10 20:41:04 +08:00
dominicshanshan	0e78a4b244	[https://nvbugs/5702791 ][fix] Unwaive fixed test (#9844 ) Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-12-10 14:01:44 +08:00
QI JUN	2c46126a93	[TRTLLM-9794][ci] move some deepseek test cases to gb200 (#9841 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-12-09 19:54:51 -08:00
zhanghaotong	36c9e7cfe6	[None][chore] Add unittest for otlp tracing (#8716 ) Signed-off-by: zhanghaotong <zhanghaotong.zht@antgroup.com> Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>	2025-12-09 18:34:08 -08:00
dhansen-nvidia	2d33ae94d5	[https://nvbugs/5508301 ][feat] Move D->H copies to a worker thread whe… (#8463 ) Signed-off-by: Dan Hansen <1+dhansen-nvidia@users.noreply.github.com> Signed-off-by: dhansen-nvidia <218031328+dhansen-nvidia@users.noreply.github.com> Co-authored-by: Dan Hansen <1+dhansen-nvidia@users.noreply.github.com>	2025-12-09 18:51:31 -05:00
Patrice Castonguay	414448bb37	[https://nvbugs/5719561 ][chore] Unwaive tests for nvbug 5719561 (#9801 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-12-09 18:21:50 -05:00
Patrice Castonguay	ff0ef19ee9	[https://nvbugs/5688388 ][chore] Unwaiving fixed disagg test (#9800 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-12-09 16:51:46 -05:00
Patrice Castonguay	7d7d05d8db	[None][chore] Adding flaky auto scaling test to waives (#9851 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-12-09 15:05:19 -05:00
Mike Iovine	07c76a5fac	[None][feat] Make 2-model spec dec use the 1-model kernels (Hopper) (#8810 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-12-09 11:06:31 -05:00
Dom Brown	3156f2e852	[https://nvbugs/5575841 ] [fix] Nvbug 5575841: Remove additional test waivers for TestMoEFP4 (#9788 ) Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2025-12-09 13:37:55 +00:00
Emma Qiao	75bc386b65	[None][infra] Waive failed cases for main branch on 12/09 (#9839 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-12-09 19:39:29 +08:00
QI JUN	58c29957d9	[TRTLLM-9794][ci] move qwen3-next test cases to gb200 (#9827 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-12-09 01:58:25 -08:00
Stefan Niebler	d600b9f851	[TRTLLM-6756][feat] Update BeamSearch for TorchSampler (#9660 ) Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>	2025-12-09 10:44:01 +01:00
Robin Kobus	76f49c903b	[None][fix] Additional model outputs for pipeline parallelism (#9794 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-12-09 10:41:22 +01:00
yufeiwu-nv	fbcf03040f	[None][test] Refactor qa/llm_perf_nim.yml test list (#9700 ) Signed-off-by: yufeiwu <230315618+yufeiwu-nv@users.noreply.github.com>	2025-12-08 22:00:43 -08:00
QI JUN	252769c930	[TRTLLM-9794][ci] remove duplicated test cases in DGX B200 (#9817 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-12-08 21:51:30 -08:00
Shi Xiaowei	b050804b63	[TRTLLM-6537][infra] extend multi-gpu tests related file list (#9614 ) Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2025-12-09 12:54:53 +08:00
JunyiXu-nv	90890785eb	[https://nvbugs/5722653 ][fix] Fix config file used by disagg_client (#9783 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com> Signed-off-by: JunyiXu-nv <219237550+JunyiXu-nv@users.noreply.github.com>	2025-12-08 20:34:55 -08:00
Balaram Buddharaju	bafb60c1bc	[None][chore] Fix tests failing on pre-merge 12/08 (#9819 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-12-08 20:08:52 -08:00
Bo Li	f2006a1f74	[https://nvbugs/5726066 ][infra] Waive timeout disaggregated/test_auto_scaling tests. (#9815 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-12-08 19:51:43 -08:00
JunyiXu-nv	f521f6d910	[None][fix] Fix unterminated process issue for RemoteOpenAIServer (#9490 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2025-12-09 11:15:40 +08:00
Jiagan Cheng	4a3a66b124	[https://nvbugs/5677746 ][fix] Use first PP rank's schedule result in other PP ranks to fix PP hang (#9659 ) Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>	2025-12-08 18:43:52 -08:00
yuanjingx87	390391ebf1	[None][infra] Correct the waived test names due to a merge conflict (#9803 ) Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>	2025-12-09 09:48:21 +08:00
Chenghao Zhang	75f5446d67	[#9753 ][feat] AutoDeploy: Implement add rms_norm fusion (#9754 ) Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>	2025-12-08 14:24:27 -08:00
Eran Geva	23cf72b0f8	[#8921 ][feat] Added symetric memory AllReduce strategy (#8919 ) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>	2025-12-08 13:12:56 -08:00
Yibin Li	faabc1a387	[TRTLLM-7967][chore] Add more tests (#9415 ) Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>	2025-12-08 11:57:32 -08:00
Jhao-Ting Chen	0a09465089	[https://nvbugs/5567586 ][feat] Ampere xqa swa specdec for GPT-OSS Eagle3-one-model (#8383 ) Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>	2025-12-08 11:16:05 -08:00
Frank	f6df9eb2a6	[TRTLLM-9089][chore] Port prepare_dataset into trtllm-bench (#9250 )	2025-12-08 10:37:40 -08:00
Lizhi Zhou	52f78e4000	[http://nvbugs/5649010 ][fix] fix test_auto_scaling.py::test_worker_restart timeout (#9775 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-12-08 03:26:01 -08:00
fredricz-20070104	96d9b67d65	[https://nvbugs/5527655 ][test] Add test case for RCCA 5527655 (#9511 ) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>	2025-12-08 01:27:13 -08:00
fredricz-20070104	ededeecb0f	[None][test] Add Kimi k2 WIDEEP perf and accuracy cases (#9686 ) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com> Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-12-08 01:25:07 -08:00
xinhe-nv	3f55c07223	[None][chore] Remove closed bugs (#9770 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-12-07 22:51:55 -08:00
Li Min	a422d70be6	[None][chore] Enable tvm_ffi for cute dsl nvfp4_gemm to reduce host overhead. (#9690 ) Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>	2025-12-08 13:28:11 +08:00
Fanrong Li	2f526583fb	[None][chore] Move the rocketkv e2e test to post-merge (#9768 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-12-08 13:22:16 +08:00
Emma Qiao	137713a869	[None][infra] Waive failed cases for main on 12/08 (#9773 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-12-07 20:18:29 -08:00
ruodil	d232709568	[https://nvbugs/5666804 ][test] only adding sampler config for limited models (#9512 ) Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com> Co-authored-by: Larry Xu <197874197+LarryXFly@users.noreply.github.com>	2025-12-07 19:40:29 -08:00
fredricz-20070104	9bfb6179ec	[https://nvbugs/5422621 ][test] Add GB 200 WIDEEP test case for RCCA 5422621 (#9506 ) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>	2025-12-08 10:41:40 +08:00
xxi	8e27ce7084	[TRTLLM-9603][feat] Enable ConfigurableMoE test in the CI (#9645 )	2025-12-08 10:19:40 +08:00
Zheng Duan	4da0e1473c	[None][test] add ntp tolerance in time metrics verification (#9741 ) Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>	2025-12-08 09:51:10 +08:00
chenfeiz0326	383178c00a	[TRTLLM-9000][feat] Add multi-node Perf Tests into CI (#8800 ) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>	2025-12-08 09:00:44 +08:00
Ludwig Schneider	41ce14ab04	[None][feat] Enable NCCL_SYMMETRIC as default fallback for AllReduce (#9314 ) Signed-off-by: Ludwig Schneider <lschneider@nvidia.com>	2025-12-07 09:43:26 -08:00
Emma Qiao	7c6c493993	[None][infra] Waive failed cases for main branch on 12/07 (#9769 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-12-07 06:26:47 -08:00
JunyiXu-nv	b210f22c7e	[https://nvbugs/5703953 ][fix] Preserving ip:port for trtllm-serve before initializing llm (#9646 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2025-12-06 20:13:48 -08:00
Yan Chunwei	e4c707845f	[None][fix] enable hmac in RPC (#9745 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-12-07 08:24:46 +08:00
Jonas Li	2645a78f34	[TRTLLM-9660][feat] Convert cuteDSL GEMM to opt-in feature (#9682 ) Signed-off-by: Jonas Li <6110159+longlee0622@users.noreply.github.com> Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-12-06 02:24:51 -08:00
Enwei Zhu	7cd5a67e25	[TRTLLM-9372][feat] Enable CuteDSL MoE with Large EP (#9592 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-12-05 22:08:52 -08:00
Mike Iovine	31ab367576	[None][chore] Waive flakey disagg tests (#9749 ) Signed-off-by: Mike Iovine <miovine@nvidia.com>	2025-12-05 13:07:05 -08:00
jthomson04	299601aebf	[https://nvbugs/5670672 ][fix] Fix flaky KV connector tests (#9676 ) Signed-off-by: jthomson04 <jwillthomson19@gmail.com>	2025-12-05 10:04:54 -08:00
Robin Kobus	eb0b426e5d	[None][refactor] Improve request processing function in sampler (#9671 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-12-05 16:41:49 +01:00
Robin Kobus	faf682b8bc	[TRTLLM-7136][feat] Update load_weights method to include mapping parameter in checkpoint loaders (#9583 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-12-05 16:07:20 +01:00
yufeiwu-nv	68253d9d29	[https://nvbugs/5518713 ][test] Refactor core test lists by merging with llm_perf_cluster.yml (#9714 ) Signed-off-by: yufeiwu <230315618+yufeiwu-nv@users.noreply.github.com>	2025-12-05 01:15:37 -08:00
Kaiyu Xie	e06c582648	[None] [tests] Unwaive EPLB tests (#9625 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-12-05 00:13:24 -08:00
gramnarayan	74df9b180b	[#9602 ][feat] AutoDeploy: Support TRTLLM Sampler (#9641 ) Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>	2025-12-04 19:24:11 -08:00
Lizhi Zhou	dc766fc126	[https://nvbugs/5633340 ][fix] start disagg workers and servers on free ports (#9694 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-12-05 10:51:29 +08:00
Lizhi Zhou	0d0a16fff4	[TRTLLM-8920][feat] decouple disagg service from fastapi (#8714 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-12-05 10:44:16 +08:00
xinhe-nv	530af1a98e	[None][chore] Add failed cases into waives.txt (#9662 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-12-04 22:33:22 +08:00
Anthony Chang	60cdca3740	[None][fix] Recover TRTLLM MoE Perf for DEP (#9562 ) Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>	2025-12-04 22:10:25 +08:00
Jin Li	e5d4305c04	[https://nvbugs/5467531 ][fix] Unwaive fused_moe all to all test with … (#9617 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-12-04 18:17:24 +08:00
ruodil	8a392af28f	[None][test] rename wide ep and disagg metric name in perf test (#9704 ) Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>	2025-12-04 18:16:06 +08:00
Yan Chunwei	05058f5e2a	[None][ci] unwaive tests (#9651 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>	2025-12-04 15:06:07 +08:00
tcherckez-nvidia	f9aa86dbdd	[#8733 ][feat] Add Llama4 MoE handling to AutoDeploy (#9556 ) Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com> Signed-off-by: tcherckez-nvidia <127761168+tcherckez-nvidia@users.noreply.github.com> Co-authored-by: Neta Zmora <nzmora@nvidia.com>	2025-12-04 08:03:33 +02:00
JunyiXu-nv	6d2daec5d0	[TRTLLM-8274][feat] Check if executor is shutdown in /health entrypoint (#9057 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2025-12-04 13:49:40 +08:00
Tailing Yuan	4eed648e22	[None][feat] Add weights initialization and context phase parser to layer-wise benchmarks (#9667 ) Signed-off-by: Tailing Yuan <yuantailing@gmail.com>	2025-12-04 13:41:15 +08:00
Jin Li	87e0c8a749	[TRTLLM-7073][feat] Support torch compile for PP for Llama and DeepSeekV3 (#7838 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-12-04 13:32:11 +08:00
mpikulski	744f0eff1b	[TRTLLM-9522][fix] restore `trtllm-serve mm_embedding_serve` (#9669 )	2025-12-03 19:27:11 -08:00
Yiqing Yan	e31142202e	[TRTLLM-7181][infra] Generate test results when pytest timeout happens (#9396 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-12-04 10:05:38 +08:00
Wanli Jiang	4485e516a2	[None][feat] Update Qwen3CodeToolParser to align tool-calling parameters (#9540 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-12-04 06:47:32 +08:00
gramnarayan	098b9ff226	[#9147 ][feat] AutoDeploy: Draft Target Speculative Decoding (#9275 ) Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>	2025-12-04 05:13:49 +08:00
Wei-Ming Chen	d9fba85396	[OMNIML-2932] [feat] nvfp4 awq support (#8698 ) Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>	2025-12-03 19:47:13 +02:00
Michal Guzek	4e5b10da48	[https://nvbugs/5552132 ][fix] Enable LoRa for GPT OSS Torch (#8253 ) Signed-off-by: Michal Guzek <mguzek@nvidia.com>	2025-12-03 15:42:15 +01:00
Patrice Castonguay	ae8d8a266a	[https://nvbugs/5705197 ][chore] Unwaive timeout disagg tests (#9637 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-12-03 22:18:36 +08:00
Guoming Zhang	79e872de31	[None][test] Update Qwen3-next accuracy testing by setting the cuda … (#9613 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-12-03 20:52:53 +08:00
JunyiXu-nv	743486b2ea	[TRTLLM-6842][feat] Support Response API for general purpose (#9392 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2025-12-03 16:49:26 +08:00
xinhe-nv	3a748b166b	[None][chore] Add failed cases into waives.txt (#9593 ) Signed-off-by: Jie Li <lijie@nvidia.com> Co-authored-by: Jie Li <lijie@nvidia.com>	2025-12-03 16:26:06 +08:00
fredricz-20070104	80ff9015ce	[https://nvbugs/5561153 ][test] Fix log error for perf test (#9622 ) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>	2025-12-03 15:27:13 +08:00
brb-nv	43f6ad7813	[https://nvbugs/5708475 ][fix] Fix e2e eval accuracy for helix parallelism (#9647 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-12-03 15:13:59 +08:00
Bo Li	8b5ededc83	[TRTLLM-9391][chore] Automatically estimate required workspace. (#9535 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-12-03 12:49:38 +08:00
Suyog Gupta	93871d52b2	[None][chore] AutoDeploy update cuda stream manager for multi-device (#9575 ) Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>	2025-12-02 20:43:14 -08:00
heyuhhh	a08eb81cce	[None][feat] Add RocketKV usage doc and e2e accuracy test on LongBenchV2 (#9572 ) Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>	2025-12-03 11:33:46 +08:00
yufeiwu-nv	21f2ba74e8	[None][test] Remove duplicate test cases (#9623 ) Signed-off-by: yufeiwu <230315618+yufeiwu-nv@users.noreply.github.com>	2025-12-03 10:35:26 +08:00
brb-nv	55c7023c92	[None][chore] Waive test failing on pre-merge (#9638 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-12-03 07:31:10 +08:00
Grzegorz Kwasniewski	0a7a88e74e	[TRTLLM-8946][feat] Improved heuristics to detect shardable regions (#9200 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com> Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com> Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-12-02 22:08:19 +01:00
Patrice Castonguay	3991aa9c72	[https://nvbugs/5688388 ][fix] fix: Reducing num request in disagg test to speed up (#9598 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-12-02 12:48:53 -05:00
Neta Zmora	a560ba5546	[#9550 ][feat] AutoDeploy: Add NVFP4 Cutlass MoE kernels (#9551 ) Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>	2025-12-03 01:39:38 +08:00
Shi Xiaowei	227d42e492	[https://nvbugs/5651854 ][fix] Fix dist-serving perf by clearing CPU affinity (#9549 ) Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2025-12-03 01:17:03 +08:00
William Zhang	2dd3ebf037	[#9150 ][feat] Add code for nano v3 to custom implementation in AD (#9465 ) * Why? We would like to show an alternative to monkey-patching in AutoDeploy. * What? This commit builds on the existing custom model implementation for NemotronH and adds the bits relevant for MoE layers. Part of #9150. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-12-02 08:56:44 -08:00
Mike Iovine	d5b7f0c8ad	[TRTLLM-8980][test] Clean up spec dec tests in test_llm_api_pytorch (#8889 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com> Signed-off-by: Mike Iovine <miovine@nvidia.com>	2025-12-02 10:32:02 -05:00
Yan Chunwei	b86256eb54	[TRTLLM-9144][fix] enhance RPC robustness (#8711 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>	2025-12-02 21:37:59 +08:00
brb-nv	be48cdf1d1	[TRTLLM-9466][test] Evaluate helix parallelism with DSV3 Lite (#9597 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-12-02 20:10:07 +08:00
Emma Qiao	4a8766c11d	[None][infra] Remove an invalid test name in waives.txt (#9620 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-12-02 18:05:17 +08:00
mpikulski	84a1531594	[TRTLLM-9488][feat] use FlashInfer.sampling by default (#9545 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-12-02 16:29:55 +08:00
Emma Qiao	3e4f2388a9	[None][infra] Waive failed cases for main branch (#9615 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-12-02 15:48:27 +08:00
shuyixiong	1a2118b8fe	[https://nvbugs/5702793 ][fix] Fix uncontiguous tensor view (#9576 ) Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>	2025-12-02 15:41:32 +08:00
xinhe-nv	ad46d19027	[None][chore] Add failed cases into waives.txt (#9588 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-12-02 14:24:11 +08:00
ruodil	4586b5f42f	[https://nvbugs/5582091 ][test] increase warmup times in testing for multi-gpu cases (#9578 ) Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>	2025-12-02 14:22:49 +08:00
Wanli Jiang	5657a00ec0	[FMDL-1328][feat] Add support for nano-v3 and super-v3 with pytorch backend (#9261 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-12-02 13:40:20 +08:00
xinhe-nv	3911d0496e	[None][fix] Waive gb200 (#9580 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-12-02 12:09:21 +08:00
JunyiXu-nv	9a6df980cd	[https://nvbugs/5703953 ][fix] Use random port for disagg tests (#9582 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2025-12-02 11:40:14 +08:00
Iman Tabrizian	356a52edf5	[None][feat] Add support for KVCache reuse for DSv32 (#9383 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-12-02 11:14:30 +08:00
Shijie	dcf5c86720	[None][feat] Unify nvfp4 gemm backend (#8963 ) Signed-off-by: Shijie Wang <jaywan@nvidia.com> Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> Signed-off-by: Shijie <jaywan@nvidia.com> Co-authored-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-12-02 11:03:51 +08:00
Eran Geva	c9771ebb99	[#9198 ][feat] Refactor dist ops in AutoDeploy (#9301 ) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>	2025-12-02 02:36:32 +08:00
Venky	639c939a4f	[TRTC-1943][feat] Env vars override support in LLM API (#9104 ) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2025-12-01 10:04:49 -08:00
Stefan Niebler	f155812eb0	[TRTLLM-6756][feat] Add Beam Search to TorchSampler (#8509 ) Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>	2025-12-01 18:48:04 +01:00
Yanchao Lu	7127c4407a	[None][test] [None][test] Waive main branch test failures 12/1 (#9566 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-12-01 21:54:53 +08:00
Shi Xiaowei	48b1d31895	[https://nvbugs/5651854 ][infra] Enable perf metrics during accuracy testing (#9140 )	2025-12-01 20:15:32 +08:00
alel	4107254c82	[TRTLLM-6222][feat] Several perf opt for cuteDSL nvf4 gemm (#9428 ) Signed-off-by: Yuhan Li <51736452+liyuhannnnn@users.noreply.github.com>	2025-12-01 18:10:45 +08:00
JadoTu	a92af27411	[None][chore] remove qwen3-next accuracy tests (#9534 ) Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>	2025-12-01 11:49:37 +08:00
Pengbo Wang	aa3310f64f	[https://nvbugs/5503479 ][fix] Temporarily lower reference accuracy to stabilize CI (#9398 ) Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>	2025-12-01 11:49:14 +08:00
Enwei Zhu	2e3ac3c48f	[https://nvbugs/5684703 ][fix] Unwaive disagg guided decoding test (#9466 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-12-01 11:39:40 +08:00
Li Min	1797e91dfd	[TRTLLM-6222][feat] Extend cute_dsl_nvfp4_gemm to sm103. (#9543 ) Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>	2025-12-01 10:19:36 +08:00
heyuhhh	6e470aab72	[None] [feat] Optimize the algorithm part of RocketKV (#9333 ) Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>	2025-12-01 09:04:09 +08:00
xxi	c12e67bb66	[TRTLLM-8958][feat] and [TRTLLM-8960]: create ConfigurableMoE and support TRTLLMGenFusedMoE as backend (#9486 )	2025-12-01 08:37:07 +08:00
JunyiXu-nv	3f588198dc	[None][fix] Fix port conflict in disagg tests (#9474 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2025-11-30 17:33:22 +08:00
Emma Qiao	c927ccf510	[None][infra] Wiave failed tests for main branch on 11/30 (#9555 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-11-30 16:13:20 +08:00
brb-nv	b77f4ffe54	[TRTLLM-5971][feat] Integrate helix parallelism (#9342 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-11-29 15:17:30 -08:00
dominicshanshan	6345074686	[None][chore] Weekly mass integration of release/1.1 -- rebase (#9522 ) Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com> Signed-off-by: Mike Iovine <miovine@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com> Signed-off-by: qgai <qgai@nvidia.com> Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com> Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com> Signed-off-by: Simeng Liu <simengl@nvidia.com> Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com> Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Vincent Zhang <vinczhang@nvidia.com> Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com> Signed-off-by: Michal Guzek <mguzek@nvidia.com> Signed-off-by: Michal Guzek <moraxu@users.noreply.github.com> Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com> Signed-off-by: leslie-fang25 <leslief@nvidia.com> Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co> Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> Co-authored-by: yunruis <205571022+yunruis@users.noreply.github.com> Co-authored-by: sunnyqgg <159101675+sunnyqgg@users.noreply.github.com> Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com> Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Co-authored-by: JunyiXu-nv <219237550+JunyiXu-nv@users.noreply.github.com> Co-authored-by: Simeng Liu <109828133+SimengLiu-nv@users.noreply.github.com> Co-authored-by: Guoming Zhang <137257613+nv-guomingz@users.noreply.github.com> Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com> Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Co-authored-by: Vincent Zhang <vcheungyi@163.com> Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com> Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com> Co-authored-by: Chang Liu <9713593+chang-l@users.noreply.github.com> Co-authored-by: Leslie Fang <leslief@nvidia.com> Co-authored-by: Shunkangz <182541032+Shunkangz@users.noreply.github.com> Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-11-29 21:48:48 +08:00
Grzegorz Kwasniewski	cff54fcae3	[#8948 ][feat] Support custom sharding config (#9143 ) Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>	2025-11-29 05:28:05 +08:00
mpikulski	bc355eadf5	[TRTLLM-9488][fix] llmapi references (#9547 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-11-28 08:54:05 -08:00
dominicshanshan	70efa3ac43	[None][infra] Waive failed case in pre-merge on 11/28 (#9537 ) Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-11-28 20:53:45 +08:00
mpikulski	e5f39ec7cf	[TRTLLM-9488][feat] add 'disable_flashinfer_sampling' config option (#9454 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-11-28 13:00:39 +01:00
Emma Qiao	2d7421b314	[None][infra] Waive failed cases for main branch on 11/28 (#9539 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-11-28 17:19:55 +08:00
Liao Lanyu	bf84d9cea1	[None][chore] add spec_decoding configs in perf benchmark scripts and fix typos (#9533 ) Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com> Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>	2025-11-28 14:52:05 +08:00
2359 Commits