TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
Yechan Kim	8ba8699f66	[TRTLLM-8310][feat] Add Qwen3-VL-MoE (#9689 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-12-15 20:05:20 -08:00
ChristinaZ	dff77efa2a	[None][feat] Add routing support for the new model for both cutlass and trtllm moe backend (#9792 ) Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>	2025-12-15 19:59:08 -08:00
xinhe-nv	cdf56c278f	[TRTLLM-8638][fix] Add failed cases into waives.txt New activity. (#9979 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-12-15 18:59:13 -08:00
Michal Guzek	e6187d8109	[https://nvbugs/5708810 ][fix] Fix TRTLLMSampler (#9710 ) Signed-off-by: Michal Guzek <mguzek@nvidia.com>	2025-12-15 23:26:52 +01:00
Patrice Castonguay	9ba14263db	[https://nvbugs/5673559 ][fix] Unwaiving disagg test for nvbug 5673559 (#9957 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-12-15 12:32:15 -05:00
Emma Qiao	d5d15c06df	[None][infra] Waive failed tests for main branch on 12/15 (#10001 ) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-12-16 01:29:43 +08:00
Kaiyu Xie	44b0f8c3ed	[None] [fix] Revert "[None] [feat] add eos_token_id in generation_config to sampling params" (#10002 )	2025-12-15 08:52:52 -08:00
Wanli Jiang	3230fbe79a	[None][feat] Update reasoning parser for nano-v3 (#9944 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-12-15 05:39:37 -08:00
Yukun He	9e7182b603	[TRTLLM-9615][feat] Implement a distributed tuning system (#9621 ) Four strategies are implemented to accommodate different distributed tuning scenarios: BROADCAST, INDEPENDENT, MERGE, and PARALLEL. Distributed tuning is disabled by default, with the INDEPENDENT strategy as the fallback; this conservative approach prevents unexpected behavior in standard use cases. Only operations with significant tuning time overhead have been assigned the PARALLEL strategy, which allows the same tensor parallelism (TP) rank to tune tactics concurrently across different ranks; this targeted approach balances performance gains with stability. Operations with nested tuning structures, such as NVFP4GemmUnifiedRunner, currently support only the INDEPENDENT strategy, because the synchronization mechanism is optimized only for leaf operations and doesn't yet handle nested hierarchies. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-12-15 21:08:53 +08:00
Bo Li	9eb5a229dd	[None][infra] Fully waive test_worker_restart test_disagg_server_restart. (#9988 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-12-15 01:26:18 -08:00
Grzegorz Kwasniewski	83885c69e7	[TRTLLM-9136][feat] 2D parallel EP TP support (#9459 ) Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>	2025-12-15 09:52:29 +01:00
xinhe-nv	3c98b25005	[None][chore] Add failed cases into waives.txt (#9941 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-12-14 23:14:24 -08:00
JunyiXu-nv	af899d2fe7	[TRTLLM-9860][doc] Add docs and examples for Responses API (#9946 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2025-12-14 21:46:13 -08:00
Ziyi Xiong	f2aee0db03	[TRTLLM-9854][feat] Optimize the host overhead of _sample_async (#9935 ) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-12-15 13:28:54 +08:00
shuyixiong	25db9e7b3e	[https://nvbugs/5741060 ][chore] Waive all pg operator tests (#9991 ) Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>	2025-12-14 21:24:43 -08:00
Balaram Buddharaju	dfc8799352	[https://nvbugs/5669114 ][fix] Switch to MMMU benchmark for Gemma3 27B (#9966 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-12-14 21:23:59 -08:00
Fanrong Li	8f144d9282	[TRTLLM-9416][feat] Skip DS-v3.2 indexer MQA and Top-K for short sequences. (#9524 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-12-15 12:42:25 +08:00
QI JUN	b57650f1e6	[TRTLLM-9794][ci] move test cases of gpt-oss to gb200 (#9934 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-12-14 19:21:54 -08:00
xxi	f5696df285	[TRTLLM-8961][feat] ConfigurableMoE support DeepGemm (#9858 )	2025-12-15 10:47:15 +08:00
dominicshanshan	4bf42f8fa8	[https://nvbugs/5580297 ][fix] Skip capture request error test from Ray stage (#9947 ) Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-12-15 10:03:16 +08:00
Simeng Liu	f21e2b3329	[TRTLLM-9601][feat] Expose mmKeys for multimodal to integrate with dynamo. (#9604 ) Signed-off-by: SimengLiu-nv <simengl@nvidia.com>	2025-12-15 08:42:30 +08:00
Emma Qiao	e0a4b72279	[None][infra] Waive failed tests for main branch on 12/14 (#9982 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-12-14 22:48:34 +08:00
Mike Iovine	96d654029d	[https://nvbugs/5666816 ][fix] Unwaive llama3 eagle3 test (#9964 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-12-14 15:07:35 +08:00
nvxuanyuc	a5a37227d6	[None][feat] Fused kernels (qknormrope + moe routing) and two-model MTP support for glm4moe (#9852 ) Signed-off-by: Xuanyu Chen <xuanyuc@nvidia.com>	2025-12-14 10:47:24 +08:00
Mike Iovine	383b13e0e5	[None][feat] Implement sampling on 1-model EAGLE3 (#9885 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com> Signed-off-by: Mike Iovine <miovine@nvidia.com>	2025-12-13 07:38:22 -08:00
Yan Chunwei	85406f9dda	[https://nvbugs/5720482 ][fix] Fix test rpc streaming (#9902 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>	2025-12-13 01:14:43 -08:00
shuyixiong	8cbf2d958c	[TRTLLM-9738][chore] Guard accuracy with nccl allreduce strategy (#9793 ) Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>	2025-12-13 01:02:11 -08:00
Balaram Buddharaju	6a6e41f802	[TRTLLM-9468][chore] Update disagg benchmarking scripts to support context parallelism (#9720 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-12-12 22:29:41 -08:00
bhsueh_NV	e49c70f6df	[None][feat] Support Mistral Large3 LLM part (#9820 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-12-13 11:44:27 +08:00
Balaram Buddharaju	461446045e	[TRTLLM-9493][feat] Add helixPostProcessNative kernel for cp_dim=2 (#9924 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-12-12 16:49:25 -08:00
tburt-nv	6147452158	[https://nvbugs/4141427 ][chore] Add more details to LICENSE file (#9881 ) Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>	2025-12-13 08:35:31 +08:00
Chuang Zhu	4cc4cbe926	[https://nvbugs/5716787 ][fix] terminate nixl running when exiting (#9785 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> Co-authored-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-12-12 11:15:02 -05:00
Chuang Zhu	9c59c9f920	[https://nvbugs/5643787 ][fix] remove the war path for notify to itself (#9834 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-12-12 11:10:05 -05:00
JunyiXu-nv	2fec53dfa5	[TRTLLM-9637][feat] Support tool parser for Kimi K2 (#9830 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2025-12-12 23:32:39 +08:00
Yihan Wang	9df4dad3b6	[None][fix] Introduce inline namespace to avoid symbol collision (#9541 ) Signed-off-by: Yihan Wang <yihwang@nvidia.com>	2025-12-12 23:32:15 +08:00
Balaram Buddharaju	af315d8ef1	[TRTLLM-5972][chore] Load balance decode token KV cache with helix parallelism (#9757 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-12-12 22:29:05 +08:00
Lucas Liebenwein	e767fc649a	[None][feat] AutoDeploy: prepare_metadata revisited (#9764 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-12-12 20:14:14 +08:00
ruodil	9b3e5e90ee	[None][test] fix a typo in model name in script (#9867 ) Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>	2025-12-12 17:35:55 +08:00
chenfeiz0326	61745f034a	[https://nvbugs/5727481 ][ci] Fix Port Conflict in Perf-Sanity CI Test (#9896 ) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>	2025-12-12 17:16:50 +08:00
kris1025	2fc94e5dd7	[None][chore] unwaive qwen3 accuracy test (#9895 ) Signed-off-by: linquanh <linquanh@nvidia.com>	2025-12-12 16:30:09 +08:00
Yihan Wang	711016c799	[https://nvbugs/5736923 ][infra] Waive timeout disaggregated/test_auto_scaling[http-round_robin] test (#9942 ) Signed-off-by: Yihan Wang <yihwang@nvidia.com>	2025-12-12 15:15:13 +08:00
Ivy Zhang	fded6c393d	[TRTLLM-9262][test] add groupgemm ada case for rcca (#9833 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-12-12 13:23:33 +08:00
dominicshanshan	093465ed29	[https://nvbugs/5599176 ][fix] Unwaive fixed test for Ray (#9861 ) Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-12-12 11:24:05 +08:00
xinhe-nv	e8efeb765d	[TRTLLM-9717][fix] fix multi nodes tests cases (#9736 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-12-12 10:14:23 +08:00
Venky	fd1270b9ab	[TRTC-43] [feat] Add config db and docs (#9420 ) Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com> Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> Co-authored-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>	2025-12-12 04:00:03 +08:00
Simeng Liu	24f92721f2	[https://nvbugs/5597647 ][ci] Unwaive fixed tests. (#9812 ) Signed-off-by: SimengLiu-nv <simengl@nvidia.com>	2025-12-12 02:29:30 +08:00
Erin	89dabf5aa1	[TRTLLM-9736][feat] AsyncLLM and verl integ (#9353 ) Signed-off-by: Liwei Ma <liweim@nvidia.com> Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> Co-authored-by: Liwei Ma <liweim@nvidia.com> Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com> Co-authored-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-12-11 09:33:25 -08:00
JadoTu	02edb19f43	[None] [feat] add eos_token_id in generation_config to sampling params (#9514 ) Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>	2025-12-12 00:52:03 +08:00
xxi	488d38f88d	[TRTLLM-8959][feat] ConfigurableMoE support CUTLASS (#9772 )	2025-12-12 00:22:13 +08:00
Yan Chunwei	04a39a4e2b	[None][chore] enable test_ipc.py (#9865 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>	2025-12-11 17:47:14 +08:00
Zongfei Jing	c76b428e2e	[TRTLLM-9685] [feat] Add gather fc1 kernel by cuteDSL (#9618 ) Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>	2025-12-11 16:21:32 +08:00
JunyiXu-nv	454e7e59e5	[https://nvbugs/5718004 ][fix] Add warmup for cancellation test (#9860 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2025-12-11 12:20:33 +08:00
Bo Deng	c1d53ee43d	[https://nvbugs/5582258 ][fix] unwaive (#9650 ) Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-12-10 19:18:30 -08:00
fredricz-20070104	341cb1a12c	[None][chore] Add GB300 support since it does not support segment (#9731 ) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>	2025-12-10 18:36:55 -08:00
Patrice Castonguay	2c0293c612	[https://nvbugs/5601682 ][fix] Unwaiving disagg test (#9627 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-12-10 13:42:26 -05:00
cheshirekow	2f030312a8	[TRTLLM-9228][infra] Verify thirdparty C++ process (#9367 ) Signed-off-by: Josh Bialkowski <1309820+cheshirekow@users.noreply.github.com> Co-authored-by: Josh Bialkowski <1309820+cheshirekow@users.noreply.github.com>	2025-12-10 21:01:19 +08:00
Yukun He	072f236002	[None][fix] Fully resolve the tactic recovery issues in AutoTuner serialized cache (#9835 ) Restrict tactic types to those compatible with AutoTuner cache serialization and deserialization. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-12-10 20:41:04 +08:00
dominicshanshan	0e78a4b244	[https://nvbugs/5702791 ][fix] Unwaive fixed test (#9844 ) Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-12-10 14:01:44 +08:00
QI JUN	2c46126a93	[TRTLLM-9794][ci] move some deepseek test cases to gb200 (#9841 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-12-09 19:54:51 -08:00
zhanghaotong	36c9e7cfe6	[None][chore] Add unittest for otlp tracing (#8716 ) Signed-off-by: zhanghaotong <zhanghaotong.zht@antgroup.com> Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>	2025-12-09 18:34:08 -08:00
dhansen-nvidia	2d33ae94d5	[https://nvbugs/5508301 ][feat] Move D->H copies to a worker thread whe… (#8463 ) Signed-off-by: Dan Hansen <1+dhansen-nvidia@users.noreply.github.com> Signed-off-by: dhansen-nvidia <218031328+dhansen-nvidia@users.noreply.github.com> Co-authored-by: Dan Hansen <1+dhansen-nvidia@users.noreply.github.com>	2025-12-09 18:51:31 -05:00
Patrice Castonguay	414448bb37	[https://nvbugs/5719561 ][chore] Unwaive tests for nvbug 5719561 (#9801 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-12-09 18:21:50 -05:00
Patrice Castonguay	ff0ef19ee9	[https://nvbugs/5688388 ][chore] Unwaiving fixed disagg test (#9800 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-12-09 16:51:46 -05:00
Patrice Castonguay	7d7d05d8db	[None][chore] Adding flaky auto scaling test to waives (#9851 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-12-09 15:05:19 -05:00
Mike Iovine	07c76a5fac	[None][feat] Make 2-model spec dec use the 1-model kernels (Hopper) (#8810 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-12-09 11:06:31 -05:00
Dom Brown	3156f2e852	[https://nvbugs/5575841 ] [fix] Nvbug 5575841: Remove additional test waivers for TestMoEFP4 (#9788 ) Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2025-12-09 13:37:55 +00:00
Emma Qiao	75bc386b65	[None][infra] Waive failed cases for main branch on 12/09 (#9839 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-12-09 19:39:29 +08:00
QI JUN	58c29957d9	[TRTLLM-9794][ci] move qwen3-next test cases to gb200 (#9827 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-12-09 01:58:25 -08:00
Stefan Niebler	d600b9f851	[TRTLLM-6756][feat] Update BeamSearch for TorchSampler (#9660 ) Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>	2025-12-09 10:44:01 +01:00
Robin Kobus	76f49c903b	[None][fix] Additional model outputs for pipeline parallelism (#9794 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-12-09 10:41:22 +01:00
yufeiwu-nv	fbcf03040f	[None][test] Refactor qa/llm_perf_nim.yml test list (#9700 ) Signed-off-by: yufeiwu <230315618+yufeiwu-nv@users.noreply.github.com>	2025-12-08 22:00:43 -08:00
QI JUN	252769c930	[TRTLLM-9794][ci] remove duplicated test cases in DGX B200 (#9817 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-12-08 21:51:30 -08:00
Shi Xiaowei	b050804b63	[TRTLLM-6537][infra] extend multi-gpu tests related file list (#9614 ) Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2025-12-09 12:54:53 +08:00
JunyiXu-nv	90890785eb	[https://nvbugs/5722653 ][fix] Fix config file used by disagg_client (#9783 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com> Signed-off-by: JunyiXu-nv <219237550+JunyiXu-nv@users.noreply.github.com>	2025-12-08 20:34:55 -08:00
Balaram Buddharaju	bafb60c1bc	[None][chore] Fix tests failing on pre-merge 12/08 (#9819 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-12-08 20:08:52 -08:00
Bo Li	f2006a1f74	[https://nvbugs/5726066 ][infra] Waive timeout disaggregated/test_auto_scaling tests. (#9815 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-12-08 19:51:43 -08:00
JunyiXu-nv	f521f6d910	[None][fix] Fix unterminated process issue for RemoteOpenAIServer (#9490 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2025-12-09 11:15:40 +08:00
Jiagan Cheng	4a3a66b124	[https://nvbugs/5677746 ][fix] Use first PP rank's schedule result in other PP ranks to fix PP hang (#9659 ) Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>	2025-12-08 18:43:52 -08:00
yuanjingx87	390391ebf1	[None][infra] Correct the waived test names due to a merge conflict (#9803 ) Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>	2025-12-09 09:48:21 +08:00
Chenghao Zhang	75f5446d67	[#9753 ][feat] AutoDeploy: Implement add rms_norm fusion (#9754 ) Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>	2025-12-08 14:24:27 -08:00
Eran Geva	23cf72b0f8	[#8921 ][feat] Added symmetric memory AllReduce strategy (#8919 ) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>	2025-12-08 13:12:56 -08:00
Yibin Li	faabc1a387	[TRTLLM-7967][chore] Add more tests (#9415 ) Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>	2025-12-08 11:57:32 -08:00
Jhao-Ting Chen	0a09465089	[https://nvbugs/5567586 ][feat] Ampere xqa swa specdec for GPT-OSS Eagle3-one-model (#8383 ) Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>	2025-12-08 11:16:05 -08:00
Frank	f6df9eb2a6	[TRTLLM-9089][chore] Port prepare_dataset into trtllm-bench (#9250 )	2025-12-08 10:37:40 -08:00
Lizhi Zhou	52f78e4000	[http://nvbugs/5649010 ][fix] fix test_auto_scaling.py::test_worker_restart timeout (#9775 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-12-08 03:26:01 -08:00
fredricz-20070104	96d9b67d65	[https://nvbugs/5527655 ][test] Add test case for RCCA 5527655 (#9511 ) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>	2025-12-08 01:27:13 -08:00
fredricz-20070104	ededeecb0f	[None][test] Add Kimi k2 WIDEEP perf and accuracy cases (#9686 ) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com> Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-12-08 01:25:07 -08:00
xinhe-nv	3f55c07223	[None][chore] Remove closed bugs (#9770 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-12-07 22:51:55 -08:00
Li Min	a422d70be6	[None][chore] Enable tvm_ffi for cute dsl nvfp4_gemm to reduce host overhead. (#9690 ) Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>	2025-12-08 13:28:11 +08:00
Fanrong Li	2f526583fb	[None][chore] Move the rocketkv e2e test to post-merge (#9768 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-12-08 13:22:16 +08:00
Emma Qiao	137713a869	[None][infra] Waive failed cases for main on 12/08 (#9773 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-12-07 20:18:29 -08:00
ruodil	d232709568	[https://nvbugs/5666804 ][test] only adding sampler config for limited models (#9512 ) Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com> Co-authored-by: Larry Xu <197874197+LarryXFly@users.noreply.github.com>	2025-12-07 19:40:29 -08:00
fredricz-20070104	9bfb6179ec	[https://nvbugs/5422621 ][test] Add GB 200 WIDEEP test case for RCCA 5422621 (#9506 ) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>	2025-12-08 10:41:40 +08:00
xxi	8e27ce7084	[TRTLLM-9603][feat] Enable ConfigurableMoE test in the CI (#9645 )	2025-12-08 10:19:40 +08:00
Zheng Duan	4da0e1473c	[None][test] add ntp tolerance in time metrics verification (#9741 ) Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>	2025-12-08 09:51:10 +08:00
chenfeiz0326	383178c00a	[TRTLLM-9000][feat] Add multi-node Perf Tests into CI (#8800 ) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>	2025-12-08 09:00:44 +08:00
Ludwig Schneider	41ce14ab04	[None][feat] Enable NCCL_SYMMETRIC as default fallback for AllReduce (#9314 ) Signed-off-by: Ludwig Schneider <lschneider@nvidia.com>	2025-12-07 09:43:26 -08:00
Emma Qiao	7c6c493993	[None][infra] Waive failed cases for main branch on 12/07 (#9769 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-12-07 06:26:47 -08:00
JunyiXu-nv	b210f22c7e	[https://nvbugs/5703953 ][fix] Preserving ip:port for trtllm-serve before initializing llm (#9646 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2025-12-06 20:13:48 -08:00
Yan Chunwei	e4c707845f	[None][fix] enable hmac in RPC (#9745 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-12-07 08:24:46 +08:00
Jonas Li	2645a78f34	[TRTLLM-9660][feat] Convert cuteDSL GEMM to opt-in feature (#9682 ) Signed-off-by: Jonas Li <6110159+longlee0622@users.noreply.github.com> Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-12-06 02:24:51 -08:00
Enwei Zhu	7cd5a67e25	[TRTLLM-9372][feat] Enable CuteDSL MoE with Large EP (#9592 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-12-05 22:08:52 -08:00
Mike Iovine	31ab367576	[None][chore] Waive flaky disagg tests (#9749 ) Signed-off-by: Mike Iovine <miovine@nvidia.com>	2025-12-05 13:07:05 -08:00
jthomson04	299601aebf	[https://nvbugs/5670672 ][fix] Fix flaky KV connector tests (#9676 ) Signed-off-by: jthomson04 <jwillthomson19@gmail.com>	2025-12-05 10:04:54 -08:00
Robin Kobus	eb0b426e5d	[None][refactor] Improve request processing function in sampler (#9671 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-12-05 16:41:49 +01:00
Robin Kobus	faf682b8bc	[TRTLLM-7136][feat] Update load_weights method to include mapping parameter in checkpoint loaders (#9583 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-12-05 16:07:20 +01:00
yufeiwu-nv	68253d9d29	[https://nvbugs/5518713 ][test] Refactor core test lists by merging with llm_perf_cluster.yml (#9714 ) Signed-off-by: yufeiwu <230315618+yufeiwu-nv@users.noreply.github.com>	2025-12-05 01:15:37 -08:00
Kaiyu Xie	e06c582648	[None] [tests] Unwaive EPLB tests (#9625 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-12-05 00:13:24 -08:00
gramnarayan	74df9b180b	[#9602 ][feat] AutoDeploy: Support TRTLLM Sampler (#9641 ) Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>	2025-12-04 19:24:11 -08:00
Lizhi Zhou	dc766fc126	[https://nvbugs/5633340 ][fix] start disagg workers and servers on free ports (#9694 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-12-05 10:51:29 +08:00
Lizhi Zhou	0d0a16fff4	[TRTLLM-8920][feat] decouple disagg service from fastapi (#8714 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-12-05 10:44:16 +08:00
xinhe-nv	530af1a98e	[None][chore] Add failed cases into waives.txt (#9662 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-12-04 22:33:22 +08:00
Anthony Chang	60cdca3740	[None][fix] Recover TRTLLM MoE Perf for DEP (#9562 ) Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>	2025-12-04 22:10:25 +08:00
Jin Li	e5d4305c04	[https://nvbugs/5467531 ][fix] Unwaive fused_moe all to all test with … (#9617 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-12-04 18:17:24 +08:00
ruodil	8a392af28f	[None][test] rename wide ep and disagg metric name in perf test (#9704 ) Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>	2025-12-04 18:16:06 +08:00
Yan Chunwei	05058f5e2a	[None][ci] unwaive tests (#9651 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>	2025-12-04 15:06:07 +08:00
tcherckez-nvidia	f9aa86dbdd	[#8733 ][feat] Add Llama4 MoE handling to AutoDeploy (#9556 ) Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com> Signed-off-by: tcherckez-nvidia <127761168+tcherckez-nvidia@users.noreply.github.com> Co-authored-by: Neta Zmora <nzmora@nvidia.com>	2025-12-04 08:03:33 +02:00
JunyiXu-nv	6d2daec5d0	[TRTLLM-8274][feat] Check if executor is shutdown in /health entrypoint (#9057 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2025-12-04 13:49:40 +08:00
Tailing Yuan	4eed648e22	[None][feat] Add weights initialization and context phase parser to layer-wise benchmarks (#9667 ) Signed-off-by: Tailing Yuan <yuantailing@gmail.com>	2025-12-04 13:41:15 +08:00
Jin Li	87e0c8a749	[TRTLLM-7073][feat] Support torch compile for PP for Llama and DeepSeekV3 (#7838 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-12-04 13:32:11 +08:00
mpikulski	744f0eff1b	[TRTLLM-9522][fix] restore `trtllm-serve mm_embedding_serve` (#9669 )	2025-12-03 19:27:11 -08:00
Yiqing Yan	e31142202e	[TRTLLM-7181][infra] Generate test results when pytest timeout happens (#9396 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-12-04 10:05:38 +08:00
Wanli Jiang	4485e516a2	[None][feat] Update Qwen3CodeToolParser to align tool-calling parameters (#9540 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-12-04 06:47:32 +08:00
gramnarayan	098b9ff226	[#9147 ][feat] AutoDeploy: Draft Target Speculative Decoding (#9275 ) Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>	2025-12-04 05:13:49 +08:00
Wei-Ming Chen	d9fba85396	[OMNIML-2932] [feat] nvfp4 awq support (#8698 ) Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>	2025-12-03 19:47:13 +02:00
Michal Guzek	4e5b10da48	[https://nvbugs/5552132 ][fix] Enable LoRA for GPT OSS Torch (#8253 ) Signed-off-by: Michal Guzek <mguzek@nvidia.com>	2025-12-03 15:42:15 +01:00
Patrice Castonguay	ae8d8a266a	[https://nvbugs/5705197 ][chore] Unwaive timeout disagg tests (#9637 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-12-03 22:18:36 +08:00
Guoming Zhang	79e872de31	[None][test] Update Qwen3-next accuracy testing by setting the cuda … (#9613 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-12-03 20:52:53 +08:00
JunyiXu-nv	743486b2ea	[TRTLLM-6842][feat] Support Response API for general purpose (#9392 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2025-12-03 16:49:26 +08:00
xinhe-nv	3a748b166b	[None][chore] Add failed cases into waives.txt (#9593 ) Signed-off-by: Jie Li <lijie@nvidia.com> Co-authored-by: Jie Li <lijie@nvidia.com>	2025-12-03 16:26:06 +08:00
fredricz-20070104	80ff9015ce	[https://nvbugs/5561153 ][test] Fix log error for perf test (#9622 ) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>	2025-12-03 15:27:13 +08:00
brb-nv	43f6ad7813	[https://nvbugs/5708475 ][fix] Fix e2e eval accuracy for helix parallelism (#9647 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-12-03 15:13:59 +08:00
Bo Li	8b5ededc83	[TRTLLM-9391][chore] Automatically estimate required workspace. (#9535 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-12-03 12:49:38 +08:00
Suyog Gupta	93871d52b2	[None][chore] AutoDeploy update cuda stream manager for multi-device (#9575 ) Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>	2025-12-02 20:43:14 -08:00
heyuhhh	a08eb81cce	[None][feat] Add RocketKV usage doc and e2e accuracy test on LongBenchV2 (#9572 ) Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>	2025-12-03 11:33:46 +08:00
yufeiwu-nv	21f2ba74e8	[None][test] Remove duplicate test cases (#9623 ) Signed-off-by: yufeiwu <230315618+yufeiwu-nv@users.noreply.github.com>	2025-12-03 10:35:26 +08:00
brb-nv	55c7023c92	[None][chore] Waive test failing on pre-merge (#9638 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-12-03 07:31:10 +08:00
Grzegorz Kwasniewski	0a7a88e74e	[TRTLLM-8946][feat] Improved heuristics to detect shardable regions (#9200 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com> Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com> Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-12-02 22:08:19 +01:00
Patrice Castonguay	3991aa9c72	[https://nvbugs/5688388 ][fix] fix: Reducing num request in disagg test to speed up (#9598 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-12-02 12:48:53 -05:00
Neta Zmora	a560ba5546	[#9550 ][feat] AutoDeploy: Add NVFP4 Cutlass MoE kernels (#9551 ) Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>	2025-12-03 01:39:38 +08:00
Shi Xiaowei	227d42e492	[https://nvbugs/5651854 ][fix] Fix dist-serving perf by clearing CPU affinity (#9549 ) Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2025-12-03 01:17:03 +08:00
William Zhang	2dd3ebf037	[#9150 ][feat] Add code for nano v3 to custom implementation in AD (#9465 ) Why: we would like to show an alternative to monkey-patching in AutoDeploy. What: this commit builds on the existing custom model implementation for NemotronH and adds the bits relevant for MoE layers. Part of #9150. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-12-02 08:56:44 -08:00
Mike Iovine	d5b7f0c8ad	[TRTLLM-8980][test] Clean up spec dec tests in test_llm_api_pytorch (#8889 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com> Signed-off-by: Mike Iovine <miovine@nvidia.com>	2025-12-02 10:32:02 -05:00
Yan Chunwei	b86256eb54	[TRTLLM-9144][fix] enhance RPC robustness (#8711 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>	2025-12-02 21:37:59 +08:00
brb-nv	be48cdf1d1	[TRTLLM-9466][test] Evaluate helix parallelism with DSV3 Lite (#9597 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-12-02 20:10:07 +08:00
Emma Qiao	4a8766c11d	[None][infra] Remove an invalid test name in waives.txt (#9620 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-12-02 18:05:17 +08:00
mpikulski	84a1531594	[TRTLLM-9488][feat] use FlashInfer.sampling by default (#9545 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-12-02 16:29:55 +08:00
Emma Qiao	3e4f2388a9	[None][infra] Waive failed cases for main branch (#9615 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-12-02 15:48:27 +08:00
shuyixiong	1a2118b8fe	[https://nvbugs/5702793 ][fix] Fix non-contiguous tensor view (#9576 ) Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>	2025-12-02 15:41:32 +08:00
xinhe-nv	ad46d19027	[None][chore] Add failed cases into waives.txt (#9588 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-12-02 14:24:11 +08:00


2387 Commits