TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-06 11:11:36 +08:00

Author	SHA1	Message	Date
Mike Iovine	91ff46d418	[https://nvbugs/5745152 ][fix] Unwaive gpt oss spec decode test (#10370 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2026-01-05 16:06:58 -05:00
Mike Iovine	7a2dab8e85	[https://nvbugs/5695984 ][fix] Unwaive llama3 eagle test (#10092 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2026-01-05 16:03:35 -05:00
Yan Chunwei	6b71b03947	[TRTLLM-9551][infra] Partition test_llm_pytorch.py for parallel execution (#10400 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2026-01-05 13:58:03 -05:00
Mike Iovine	db2614ef10	[https://nvbugs/5772414 ][fix] Fix draft token tree depth=1 corner case (#10385 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2026-01-05 17:20:14 +01:00
Gal Hubara-Agam	e98c27ee4f	[TRTLLM-10053][feat] AutoDeploy: Add Super v3 config file, improve test runtime (#10397 ) Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>	2026-01-05 18:17:27 +02:00
Anthony Chang	225d3a9001	[None][perf] TRTLLM MoE maps to lower tuning buckets when ep>1 (#9998 ) Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>	2026-01-05 17:16:12 +01:00
Balaram Buddharaju	a792c23dcf	[TRTLLM-9465][fix] Swap TP-CP grouping order (#10350 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2026-01-05 20:08:03 +08:00
xinhe-nv	b1733d56f6	[TRTLLM-9381][test] add disag-serving kimi k2 thinking tests (#10357 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2026-01-05 05:15:52 -05:00
Fanrong Li	4931c5eb3a	[None][feat] update deepgemm to the DeepGEMM/nv_dev branch (#9898 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2026-01-05 16:43:42 +08:00
Yukun He	d272f1a9bc	[TRTLLM-8821][feat] Apply AutoTuner to AllReduce Op for strategy tuning. (#8531 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2026-01-05 15:44:37 +08:00
HuiGao-NV	2f768b76f8	[https://nvbugs/5715568 ][fix] Force release torch memory when LLM is destroyed (#10314 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2026-01-05 15:30:18 +08:00
Emma Qiao	c63fad7d96	[None][infra] Waive failed cases again on 1/5 (#10403 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2026-01-05 02:12:16 -05:00
Yihan Wang	e7a4486294	[https://nvbugs/5752521 ][fix] Unwaive test_trtllm_flashinfer_symbol_collision.py (#10227 ) Signed-off-by: Yihan Wang <yihwang@nvidia.com>	2026-01-05 14:37:05 +08:00
Yukun He	0937df2c68	[TRTLLM-10185][feat] AutoTuner Cache: Support cache file lock and merge all ranks into one (#10336 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2026-01-05 13:44:09 +08:00
Emma Qiao	5a8bfcbb50	[None][infra]Waive failed cases in post-merge on 1/5 (#10399 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2026-01-05 12:30:10 +08:00
Tailing Yuan	a7fe043b13	[None][feat] Layer-wise benchmarks: support TEP balance, polish slurm scripts (#10237 ) Signed-off-by: Tailing Yuan <yuantailing@gmail.com>	2026-01-05 11:23:04 +08:00
Yuxian Qiu	5773a4d775	[https://nvbugs/5701425 ][chore] Unwaive tests. (#10269 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2026-01-05 09:54:26 +08:00
Fanrong Li	b5a1e10bc0	[https://nvbugs/5779534 ][fix] fix buffer reuse for CUDA graph attention metadata (#10393 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2026-01-05 09:43:44 +08:00
Wanli Jiang	da0830670a	[TRTLLM-10065][feat] Add accuracy tests for super-v3 with multiple-gpus (#10234 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2026-01-05 09:41:49 +08:00
Lizhi Zhou	82c1ba84a7	[https://nvbugs/5649010 ][fix] use 0 port as arbitrary port when disagg service discovery is enabled (#10383 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2026-01-05 09:40:40 +08:00
Eran Geva	e2f5455533	[#8391 ][chore] added deepseek_r1_distill_qwen_32b AutoDeploy perf test to L0 (#10377 ) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>	2026-01-04 20:35:52 +02:00
chenfeiz0326	a65b0d4efa	[None][fix] Decrease Pre Merge Perf Tests (#10390 ) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2026-01-04 12:21:34 -05:00
Yanchao Lu	c4f27fa4c0	[None][ci] Some tweaks for the CI pipeline (#10359 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2026-01-04 11:10:47 -05:00
dongfengy	afc533193d	[None][feat] Support nvfp4 for gptoss (#8956 ) Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>	2026-01-04 08:57:44 -05:00
Jaedeok Kim	a4dcc6a711	[TRTLLM-10171][fix] Correct attention handling in ModelConfig and KVCacheManager (#10330 ) Signed-off-by: Jaedeok Kim <jaedeokk@nvidia.com>	2026-01-04 06:07:30 -05:00
Yuxian Qiu	6ba04eba06	[https://nvbugs/5748683 ][fix] Use get_free_port_in_ci to avoid port conflict. (#10392 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2026-01-04 19:04:58 +08:00
Yanchao Lu	c0b3c2b919	[None][ci] Remove an invalid test waive Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2026-01-03 23:34:13 +08:00
Emma Qiao	865992b86b	[None][infra] Waive failed cases on 1/3 (#10391 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2026-01-03 05:54:09 -05:00
Izzy Putterman	bdf6953ddc	[None][feat] Eagle: MLA Based Eagle (#9677 ) Signed-off-by: Izzy Putterman <iputterman@nvidia.com>	2026-01-02 13:45:07 -05:00
Gal Hubara-Agam	f3dd6da080	[#10056 ][chore] AutoDeploy: Enable Nemo SuperV3 accuracy test (#10308 ) Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>	2026-01-02 11:20:19 +02:00
chenfeiz0326	5e0e48144f	[None][fix] Minor updates on Perf Test System (#10375 ) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>	2026-01-02 17:17:42 +08:00
fredricz-20070104	f631b25c85	[None][test] Unified slurm extra args management and session collection logic (#10332 ) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com> Signed-off-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com> Co-authored-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com>	2026-01-01 21:10:51 -05:00
Balaram Buddharaju	4a1b742aa0	[TRTLLM-9467][fix] Fix PP+CP combination with helix parallelism (#10312 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2026-01-01 13:42:53 -05:00
Balaram Buddharaju	9f5b750a93	[None][chore] Waive tests blocking pre-merge 12/31 (#10373 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2026-01-01 03:00:24 -05:00
Balaram Buddharaju	0b75340223	[https://nvbugs/5744427 ][fix] Make Gemma3 multimodal test fp8 (#10368 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2026-01-01 01:11:34 -05:00
Yuxian Qiu	ff836d4f41	[https://nvbugs/5740359 ][chore] Unwaive tests. (#10260 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2026-01-01 09:53:34 +08:00
Lucas Liebenwein	1bbe71b3ed	[#10244 ][feat] AutoDeploy: separate prefill/decode in flashinfer (#10252 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-12-31 17:01:24 -05:00
Simeng Liu	84d107b2f0	[https://nvbugs/5717993 ][fix] Add execution_stream across PyExecutor, KVCacheManager, PeftCacheManager to ensure proper CUDA stream synchronization between KV cache transfer operations and model forward kernels. (#10060 ) Signed-off-by: SimengLiu-nv <simengl@nvidia.com>	2025-12-31 09:22:54 -08:00
xinhe-nv	0d2e2718ce	[None][chore] Add failed cases into waives.txt (#10354 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-12-31 09:30:22 -05:00
chenfeiz0326	a23c6f1092	[TRTLLM-9834][feat] Transfer to TRTLLM-INFRA Database and Fail post-merge tests if regression (#10282 ) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>	2025-12-31 21:44:59 +08:00
tcherckez-nvidia	464847c6be	[#9717 ][chore] Standardize MoE weights interface (#10295 ) Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>	2025-12-31 07:37:18 -05:00
Jin Li	ef1d4a40b5	[https://nvbugs/5727475 ][fix] Avoid use property with setter in nn.Mo… (#10212 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-12-31 06:21:36 -05:00
Emma Qiao	d944430f96	[None][infra] Waive failed cases on 12/31 (#10353 ) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-12-31 17:39:49 +08:00
Necofish	73870ae4ad	[None][feat] support Qwen3-VL dense model in pytorch backend (#9060 ) Signed-off-by: Nekofish-L <liuxiangyang@mail.ustc.edu.cn>	2025-12-31 17:54:26 +09:00
xinhe-nv	827d12caaf	[https://nvbugs/5558516 ][test] add disaggregated stress test (#9354 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-12-31 16:47:36 +08:00
Yuxian Qiu	910a633066	[https://nvbugs/5774869 ][chore] waive tests. (#10356 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-12-31 03:00:52 -05:00
xinhe-nv	1e9c153b4c	[None][fix] disable thread leak check for kimi (#10337 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-12-31 01:31:37 -05:00
xinhe-nv	6c1abf2d45	[None][chore] Add failed cases into waives.txt (#10344 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-12-31 00:11:54 -05:00
Jin Li	34c2fd50a9	[https://nvbugs/5707359 ][fix] Unwaive OOM case that should be fixed by #9446 (#10334 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-12-31 10:41:39 +08:00
Yuxian Qiu	ec8a388c25	[https://nvbugs/5769890 ][fix] Import get_free_port. (#10341 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-12-31 09:47:27 +08:00
Eran Geva	74832a1895	[https://nvbugs/5766986 ][fix] fixed the shard_all_unprocessed default value to align with the default.yml (#10271 ) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>	2025-12-30 08:54:13 -05:00
Bo Li	1f0365da36	[None][infra] Add LongBenchV1 to trtllm-eval. (#10265 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-12-30 21:39:34 +08:00
Emma Qiao	6732c76414	[None][infra] Waive failed cases for main on 12/30 (#10338 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-12-30 05:17:43 -05:00
Emma Qiao	fb05cd769a	[None][infra] Enable single-gpu CI on spark (#9304 ) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Emma Qiao <qqiao@nvidia.com> Signed-off-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-12-30 17:22:14 +08:00
Emma Qiao	cce7247815	[https://nvbugs/5594703 ][infra] Unwaive the failed case to test (#10275 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-12-30 16:38:54 +08:00
xinhe-nv	6accdbc6a6	[None][chore] Add failed cases into waives.txt (#10302 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-12-30 03:11:52 -05:00
ruodil	0f4ed90560	[TRTLLM-9965][test] add long-context disagg test for GB300/GB200 and remove config_index in yaml (#10225 ) Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>	2025-12-30 02:39:50 -05:00
xinhe-nv	3e0344a53d	[None][chore] Add failed cases into waives.txt (#10301 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-12-30 14:04:28 +08:00
xinhe-nv	48fee8d0f6	[None][chore] Add failed cases into waives.txt (#10321 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-12-30 00:11:49 -05:00
Emma Qiao	f396ad83b0	[None][infra] Remove duplicates in waives.txt (#10333 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-12-29 22:32:52 -05:00
Balaram Buddharaju	4944192eae	[None][chore] Waive tests failing in pre-merge 12/28 (#10311 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-12-29 20:53:49 -05:00
Neta Zmora	966231d29c	[#9626 ][feat] Add an auto-deploy transform for using cutlass FP4 MoE kernels (#10304 ) Add a transform to replace torch.ops.auto_deploy.torch_quant_nvfp4_moe with the optimized torch.ops.auto_deploy.trtllm_quant_nvfp4_moe_fused. Currently generates the wrong results when the number of rows in MoE FC1 weights is not divisible by 128, so torch.ops.auto_deploy.trtllm_quant_nvfp4_moe_fused is not set as the default FP4 MoE implementation (i.e. the transform is disabled). Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>	2025-12-29 23:18:15 +02:00
Yueh-Ting (eop) Chen	9cee32ab39	[https://nvbugs/5625990 ][fix] Respect VSWA scheme when doing block store for reuse and load block for reuse in KV cache manager (#10183 ) Signed-off-by: eopXD <yuehtingc@nvidia.com>	2025-12-29 14:29:14 +08:00
Yanchao Lu	2f8d6d25a8	[None][ci] Waive an intermittent test hang case (#10324 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-12-29 13:04:31 +08:00
Yanchao Lu	270be801aa	[None][ci] Move remaining DGX-B200 tests to LBD (#9876 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-12-28 13:55:39 +08:00
JunyiXu-nv	55bc6a5ff8	[https://nvbugs/5753250 ][fix] Fix undefined local variable in responses utils (#10154 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com> Signed-off-by: JunyiXu-nv <219237550+JunyiXu-nv@users.noreply.github.com>	2025-12-28 06:59:32 +08:00
shivghai	ee07a7c55e	[None][fix] [Gemma3] Fix RoPE for local attention for Gemma3 (#9961 ) Signed-off-by: Shiv Ghai <8965168+shivghai@users.noreply.github.com>	2025-12-27 11:50:59 -08:00
Guoming Zhang	93ac0bc1dc	[TRTLLM-10126][feat] Increase topk upper limit to 22 for NVLinkOneSid… (#10229 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-12-27 22:48:10 +08:00
Jin Li	c04563657e	[TRTLLM-7735][feat] Attention NVFP4 out support for torch compile (#9740 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-12-27 00:07:20 +08:00
chenfeiz0326	d70aeddc7f	[TRTLLM-8952][feat] Support Multi-Node Disagg Perf Test in CI (#9138 ) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>	2025-12-26 22:50:53 +08:00
Pengyun Lin	684b37df02	[https://nvbugs/5747938 ][fix] Use local tokenizer (#10230 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-12-26 22:08:10 +08:00
Pengyun Lin	c5b0f9e436	[https://nvbugs/5633700 ][fix] Cache tiktoken vocab for gpt-oss (#10219 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-12-26 18:39:03 +08:00
dongfengy	bfc591994c	[https://nvbugs/5745152 ][fix] Fix some GPTOSS test setups (#10085 ) Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>	2025-12-26 17:52:40 +08:00
Neta Zmora	f3f02315df	[None][chore]: small refactoring to auto-deploy MoE operator (#10300 ) Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>	2025-12-25 12:27:11 -05:00
bhsueh_NV	db3430f589	[None][feat] Support VLM part for Mistral Large 3 (#10188 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-12-25 11:20:58 -05:00
Ziyi Xiong	d8b5aeb061	[https://nvbugs/5652062 ][fix] Rewind kv_cache and reset draft tokens (#10160 ) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-12-25 09:13:51 -05:00
ZhichenJiang	46e4af5688	[TRTLLM-9831][perf] Enable 2CTA with autotune for CuteDSL MoE and Grouped GEMM optimizations (#10201 ) Signed-off-by: zhichen jiang <zhichenj@NVIDIA.com> Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-12-25 09:04:20 -05:00
Lizhi Zhou	fe12faef81	[https://nvbugs/5752516 ][chore] unwaive test; fix port conflicts in CI (#10152 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-12-25 08:16:09 -05:00
Emma Qiao	0ecdb69b93	[None][infra] Waive failed tests for main on 12/25 (#10298 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-12-25 05:22:39 -05:00
Jie Li	83e02ee335	[None][chore] Remove NIM TRT-Backend Test Lists (#10232 ) Signed-off-by: Jie Li <lijie@nvidia.com>	2025-12-25 04:01:51 -05:00
Enwei Zhu	182b3eb633	[None][ci] Waive TestLlama3_1_8B::test_auto_dtype[False-2] for timeout (#10293 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-12-25 02:35:18 -05:00
Gabriel Wu	1d01214ff0	[None][feat] Drop non-deepgemm fp8 block scale gemm (#10256 ) Signed-off-by: Zihua Wu <13583761+lucifer1004@users.noreply.github.com>	2025-12-25 14:52:52 +08:00
xinhe-nv	4ae6f6a46c	[None][chore] Add failed cases into waives.txt (#10249 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-12-25 01:26:21 -05:00
Venky	c059e6caa1	[TRTC-121] [feat] Add recipe selector UI to complement the recipe database (#10125 ) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2025-12-24 23:56:54 -05:00
gramnarayan	a9eb5afc9f	[#9241 ][feat] AutoDeploy: Support Eagle3 Speculative Decoding (#9869 ) Support two model flow with no overlap scheduler or chain drafter. Drafting model is in PyTorch backend. Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>	2025-12-24 23:30:42 -05:00
Emma Qiao	16fd781e42	[TRTLLM-9862][infra] Move single-gpu tests on rtxpro6000d to pre-merge (#9897 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-12-24 21:45:33 -05:00
Neta Zmora	c4b36d31ff	[#10137 ][feat] AutoDeploy FP8 MoE refactor (#10138 ) The trtllm (cutlass) fp8 moe operator performs W3+W1 fusion (concat) during inference and we want to move this fusion to the model optimization time. The Cutlass MoE kernel is used thru a trtllm torch operator. Its implementation uses two FC operations (fc1 and fc2) while the canonical MoE API defines three GEMM operations and their associated weights (W1, W2, W3) so when we switch from the torch.moe op to the trtllm.moe op we also change terminology from w1, w2, w3 to fc1, fc2. Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>	2025-12-24 18:58:10 +02:00
Stanley Sun	ddac4d7379	[None][test] Add disag-serving auto scaling qa test (#10262 ) Signed-off-by: Stanley Sun <stsun@nvidia.com>	2025-12-24 08:43:47 -05:00
shuyixiong	f4f0fe85e9	[TRTLLM-9737][chore] Add rl perf reproduce script and enhance the robustness of Ray tests (#9939 ) Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>	2025-12-24 15:27:01 +08:00
xinhe-nv	534700ecd9	[None][chore] Add failed cases into waives.txt (#10240 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-12-24 02:21:50 -05:00
Fanrong Li	156f6453dc	[TRTLLM-9798][feat] Change to use new DeepGEMM MQA sm100 kernel for MTP-3 (#10226 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-12-24 14:39:12 +08:00
Emma Qiao	7b84e48e0f	[None][infra] Waive failed cases on 12/24 (#10257 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-12-23 22:49:57 -05:00
xinhe-nv	fc1f77eafc	[None][chore] Add failed cases into waives.txt (#10204 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com> Co-authored-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>	2025-12-24 10:37:23 +08:00
Balaram Buddharaju	8c1cfc872b	[TRTLLM-9493][feat] Custom AllToAll for helix parallelism (#9986 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-12-23 18:14:30 -08:00
Jhao-Ting Chen	92d90fa29a	[None][feat] Expose enable_trt_overlap in Triton_backend brings 1.05x OTPS (#10018 ) Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>	2025-12-23 11:41:31 -06:00
Grzegorz Kwasniewski	0027a01ad5	[https://nvbugs/5680312 ][fix] Updated test waiving (#9630 ) Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>	2025-12-23 09:38:12 -08:00
Emma Qiao	984c20e0b2	[None][infra] Waive failed cases on 12/23 (#10236 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-12-23 08:48:54 -05:00
dongfengy	e284d0bf80	[None][infra] Waive flaky unittest/executor/test_rpc_proxy.py and unittest/executor/test_rpc_worker.py tests (#10209 ) Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-12-23 07:43:13 -05:00
Yukun He	522f1d2bc3	[https://nvbugs/5764627 ][chore] waive the time-out test (#10222 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-12-23 16:36:06 +08:00
Balaram Buddharaju	f2e00a75de	[None][chore] Remove helix test from rtx test list (#10224 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-12-23 03:07:37 -05:00
Shiyu Li	3ddc9d2b48	[https://nvbugs/5729697 ][fix] MNNVL Allreduce: use CUDA runtime instead of Macro to get SM version. (#10062 ) Signed-off-by: Shiyu Li <shili@nvidia.com>	2025-12-23 16:07:07 +08:00
chenfeiz0326	48c875f8ea	[None][fix] Add OpenSearch URL in slurm_launch.sh for Multinode Perf Sanity Test (#9990 ) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>	2025-12-23 16:02:38 +08:00
Bo Li	cc1323be24	[None][fix] Fix the bug for top_k=10 in NVLinkOneSided AlltoAll. (#10197 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-12-23 02:13:37 -05:00
Chuang Zhu	53db3b2612	[https://nvbugs/5741884 ][fix] unwaive disagg sampler (#10189 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-12-23 14:38:07 +08:00
xinhe-nv	77b591f73b	[None][chore] Add failed cases into waives.txt (#10177 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Jie Li <lijie@nvidia.com> Signed-off-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com> Co-authored-by: Jie Li <lijie@nvidia.com> Co-authored-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com> Co-authored-by: Larry Xu <197874197+LarryXFly@users.noreply.github.com>	2025-12-23 13:43:50 +08:00
Harshini Komali	d691371eaf	[TRTLLM-9091] [feat] Replace GenAI-Perf with AIPerf (#9310 ) Signed-off-by: lkomali <lkomali@nvidia.com> Signed-off-by: Harshini Komali <157742537+lkomali@users.noreply.github.com> Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-12-23 13:25:55 +08:00
Pamela Peng	5bc7ffe379	[None][test] Add qa tests for RTX 6K (#10210 ) Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>	2025-12-22 22:47:09 -05:00
fredricz-20070104	621156ad44	[None][chore] Fix GB300 support issues (#10196 ) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com> Signed-off-by: fredricz-20070104 <226039983+fredricz-20070104@users.noreply.github.com>	2025-12-23 10:42:41 +08:00
Yuxian Qiu	696f754ef4	[None][fix] avoid implicit cudaStreamSynchronize in sample_async. (#10120 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-12-23 10:15:40 +08:00
Fanrong Li	0d2500c631	[TRTLLM-9677][feat] Support DeepSeek-V3.2 tool parser (#10126 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-12-23 08:46:47 +08:00
tcherckez-nvidia	12e1cb8d7e	[#9717 ][chore] Refactor MoE code to use enums (#9910 ) Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>	2025-12-22 15:14:56 -05:00
JunyiXu-nv	aaa87abf41	[TRTLLM-7906][feat] Support multiple post process for Responses API (#9908 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2025-12-22 11:33:34 -05:00
Emma Qiao	ba14a9308e	[None][infra] Waive failed cases on 12/22 (#10200 ) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-12-23 00:05:45 +08:00
Pengyun Lin	0f308e95f9	[None][chore] Remove logprobs constraint on trtllm-serve pytorch backend (#9911 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-12-22 21:37:22 +08:00
William Zhang	a6a88985cf	[TRTLLM-9409][feat] Pass MRoPE tensors for EPD disagg (#9758 ) * Why? Certain VLMs like the Qwen family need more than just the multimodal embeddings in the language model, and need MRoPE position IDs and deltas. Prior to this commit, only the embeddings could be communicated from the encoder worker to the prefill worker. * What? This commit extends the `DisaggregatedParams` to include the MRoPE information. It also adjusts several pieces of code required to communicate that between E, P and D workers. Closes TRTLLM-9409. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-12-22 06:32:49 -05:00
Bo Li	472fe497dc	[None][chore] NVLinkOneSided AlltoAll Support zero local_num_tokens. (#9822 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-12-22 05:57:12 -05:00
Yan Chunwei	ea6cd76c55	[None][refactor] simplify get_stats and get_kvcache_events with rpc (#9980 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-12-22 18:23:43 +08:00
Perkz Zheng	c87f1a6b39	[https://nvbugs/5503479 ][fix] update trtllm-gen kernels to address few bugs (#10089 ) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>	2025-12-22 04:45:33 -05:00
shuyixiong	9e9523c3cc	[https://nvbugs/5762016 ][chore] Skip a ray test (#10194 ) Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>	2025-12-22 17:06:19 +08:00
xinhe-nv	d30ee8101e	[None][chore] Remove closed bugs (#10182 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-12-22 01:58:17 -05:00
Yuxian Qiu	237fd0eae4	[https://nvbugs/5666821 ][chore] unwaive tests. (#9958 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-12-22 11:39:45 +08:00
Jin Li	066b653940	[TRTLLM-9880][feat] Include torch compile tests in QA test list (#10149 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-12-22 10:37:09 +08:00
Yuxian Qiu	2f139ee07e	[https://nvbugs/5701445 ][chore] unwaive test. (#9949 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-12-22 10:12:54 +08:00
Chuang Zhu	914dd39127	[None][fix] disable cuda ipc on device without nvlink (L40s) for disagg test (#9735 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-12-22 09:29:24 +08:00
dominicshanshan	d274a4c5d3	[https://nvbugs/5701457 ][fix] Unwaive ray test. (#10175 ) Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-12-22 09:25:58 +08:00
Enwei Zhu	5549067966	[None][ci] Waive GPTOSS test case (#10155 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-12-22 08:50:44 +08:00
Balaram Buddharaju	5266475014	[None][feat] Cudagraph updates for helix parallelism (#10141 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-12-21 15:21:52 -05:00
shuyixiong	4fc6036276	[https://nvbugs/5702793 ][fix] Fix view operation on non-contiguous tensor (#10147 ) Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>	2025-12-21 11:47:20 -05:00
bhsueh_NV	cd4b4f43fa	[None][feat] Support Eagle3 on Mistral Large3 (#9971 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-12-21 10:25:45 -05:00
Emma Qiao	aa5dbb7ca5	[None][infra] Waive failed tests for main branch on 12/21 (#10184 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-12-21 22:23:46 +08:00
xxi	5ae154022a	[TRTLLM-9872][fix] clear the failed test at CI when enalbe_configurab… (#10067 ) Signed-off-by: xxi <xxi@nvidia.com>	2025-12-21 08:14:50 -05:00
Eran Geva	b15f987972	[None][chore] removed duplicated test from l0_b200.yml (#10090 ) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>	2025-12-21 11:34:01 +02:00
Bo Li	a66eeab537	[TRTLLM-9805][feat] Skip Softmax Attention. (#9821 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com> Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>	2025-12-21 02:52:42 -05:00
Balaram Buddharaju	dcd3f7b5ea	[https://nvbugs/5744427 ][fix] Fix accuracy test OOM (#10173 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-12-21 02:03:38 -05:00
Bo Li	77e37d9dd0	[https://nvbugs/5753250 ][infra] Further waive all tests in _test_openai_responses.py (#10176 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-12-20 10:25:14 -05:00
Enwei Zhu	2ce785f39a	[https://nvbugs/5643631 ][fix] Fix hostfunc seg fault (#10028 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-12-20 07:58:43 -05:00
Yuxian Qiu	3b3069b390	[https://nvbugs/5747930 ][fix] Use offline tokenizer for whisper models. (#10121 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-12-20 09:42:07 +08:00
Anish Shanbhag	7c82605327	[None][fix] enable KV cache reuse for config database (#10094 )	2025-12-19 15:16:56 -08:00
Balaram Buddharaju	bee9051484	[None][chore] Waive timing out pre-merge test (#10167 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-12-19 17:56:33 -05:00
Gal Hubara-Agam	20b69a982a	[#10056 ][test] AutoDeploy: Add accuracy test for Nemotron SuperV3 (#10131 ) Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com> Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com> Co-authored-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>	2025-12-19 13:28:42 -08:00
Chang Liu	5489d188a4	[None][fix] Revert the change and remove device count guard for DSv32 (#9631 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-12-19 15:00:55 -05:00
longcheng-nv	b882393d69	[https://nvbugs/5720357 ][fix] Fix indice offset overflow in custom Top-K kernel and corresponding UT case (#10027 ) Signed-off-by: longcheng-nv <243710427+longcheng-nv@users.noreply.github.com> Co-authored-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-12-19 14:58:01 -05:00
Venky	dfa11d810e	[TRTC-102][docs] `--extra_llm_api_options`->`--config` in docs/examples/tests (#10005 )	2025-12-19 13:48:43 -05:00
JunyiXu-nv	7b71ff6b8a	[https://nvbugs/5722653 ][fix] Unwaive fixed test (#10157 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2025-12-19 11:19:20 -05:00
xxi	27e49e2904	[None][fix] waive the failed test test_service_discovery[etcd-load_ba… (#10161 ) Signed-off-by: xxi <xxi@nvidia.com>	2025-12-19 06:14:26 -08:00
xinhe-nv	7b51e3cedb	[TRTLLM-8638][fix] Add failed cases into waives.txt (#10129 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-12-19 17:55:17 +08:00
Emma Qiao	dd8ce68c94	[None][infra] Update waive and waive failed tests for main branch on 12/19 (#10151 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-12-19 01:20:42 -08:00
Pengyun Lin	ac03915dc3	[TRTLLM-9604][feat] DS R1 & V3.1 tool parser (#10010 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-12-19 17:20:03 +08:00
Chang Liu	31bc14b350	[TRTLLM-9654][feat] Support DeepSeek-V32 chat template (#9814 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-12-19 17:05:38 +08:00
yufeiwu-nv	52cee573ad	[TRTLLM-8830][test] Overlap scheduler enhancement perf test: Add qwen3_0,8b and llama3.1 test cases (#10114 ) Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>	2025-12-19 17:01:52 +08:00
xinhe-nv	cb0444b1b5	[TRTLLM-8638][fix] Add failed cases into waives.txt (#10132 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry Xu <197874197+LarryXFly@users.noreply.github.com>	2025-12-19 16:07:56 +08:00
JunyiXu-nv	356ad4fe3a	[https://nvbugs/5722653 ][fix] Address port conflict by assigning different port section in the same node. (#10035 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2025-12-19 15:34:04 +08:00
William Zhang	478b6b20a1	[#9230 ][refactor] Replace nemotron patches with custom model implementation (#9751 ) [#9230][refactor] Replace nemotron patches with custom model implementation * Why? Patching for nemotron H models was growing out of hand, and made certain optimizations more complex than they needed to be. * What? This commit finally gets rid of them, and replaces them with the custom model implementation in `modeling_nemotron_h.py`. Closes #9230 Closes NvBug 5747867 Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-12-18 19:36:27 -08:00
Balaram Buddharaju	72c5480dfb	[None][chore] Waive test blocking pre-merge 12/18 (#10145 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-12-18 19:12:05 -08:00
Ivy Zhang	9aa40871c2	[TRTLLM-9840][test] switch ucx backend to default backend (#10101 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-12-18 18:54:15 -08:00
Wangjue Yao	9f283f330b	[None][feat] Support Mooncake transfer engine as a cache transceiver backend (#8309 ) Signed-off-by: wjueyao <wyao123@terpmail.umd.edu> Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co> Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>	2025-12-19 10:09:51 +08:00
Chuang Zhu	e0b2a94309	[None][fix] Fix ready signal in NIXL backend (#10000 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-12-19 09:43:40 +08:00
Yukun He	bd5b3c2ac0	[https://nvbugs/5721912 ][chore] Unwaive the test (#10108 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-12-19 09:12:25 +08:00
Anish Shanbhag	91a9ae42d2	[TRTC-71][feat] Add regression testing for config database (#9832 ) Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>	2025-12-18 16:15:38 -08:00
Balaram Buddharaju	799a2ae311	[https://nvbugs/5741331 ][fix] Fix helix accuracy test (#10021 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-12-18 15:27:53 -08:00
Chang Liu	a97e411b44	[https://nvbugs/5747911 ][fix] Use offline data path for the unit test of mmencoder server (#10135 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-12-18 15:19:23 -08:00
Lizhi Zhou	f02782a6f2	[https://nvbugs/5726066 ][fix] fix auto-scaling related failures (#9845 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com> Co-authored-by: Emma Qiao <qqiao@nvidia.com>	2025-12-18 16:37:48 -05:00
CarstyYou	0b279f4ad4	[https://nvbugs/5456493 ][feat] Add fp8 bmm on sm120 (#9687 ) Signed-off-by: CarstyYou <186021327+CarstyYou@users.noreply.github.com>	2025-12-18 22:57:20 +08:00
ZhichenJiang	4e55b83101	[None][perf] Add more optimization options for MOE CuteDSL finalized kernel (#10042 ) Signed-off-by: zhichen jiang <zhichenj@NVIDIA.com>	2025-12-18 22:49:28 +08:00
Bo Li	9d7e038bcb	[https://nvbugs/5753250 ][infra] Waive _test_openai_responses. (#10110 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-12-18 00:15:06 -08:00
Emma Qiao	33a90f2dd2	[None][infra] Waive failed cases for main branch on 12/18 (#10105 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-12-17 21:35:45 -08:00
Yuxian Qiu	bec864a78c	[None][fix] avoid ID conversion for non enable_configurable_moe cases. (#10003 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-12-18 13:29:52 +08:00
Wanli Jiang	601c29ca73	[https://nvbugs/5721644 ][fix] Update tests for nemotron_h (#9993 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-12-18 12:38:02 +08:00
Lucas Liebenwein	76ec820465	[#7532 ][feat] AutoDeploy: gather logits before lm head (#9962 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com> Co-authored-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>	2025-12-17 19:50:13 -08:00
xinhe-nv	4a98f190a8	[None][chore] Add failed cases into waives.txt (#10025 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-12-17 19:13:52 -08:00
xinhe-nv	c1cfb61b1b	[TRTLLM-9381][feat] Add kimi k2 fp4 tests (#9906 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-12-17 18:15:27 -08:00
Chenghao Zhang	22c6e8a424	[None][fix] Autodeploy: fix some legacy flashinfer attention test errors (#9928 ) Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>	2025-12-17 12:27:22 -08:00
yufeiwu-nv	5d71f662c3	[https://nvbugs/5698434 ][test] Add Qwen3-4B-Eagle3 One-model perf test (#10041 ) Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>	2025-12-17 13:37:25 +08:00
Emma Qiao	0dbf3948cc	[None][infra] Waive failed tests due to llm model files (#10068 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-12-16 20:12:57 -08:00
JunyiXu-nv	6649c3743c	[https://nvbugs/5635153 ][chore] Remove responses tests from waive list (#10026 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2025-12-17 11:22:02 +08:00
shuyixiong	26fb063076	[https://nvbugs/5741060 ][fix] Fix pg op test (#9989 ) Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>	2025-12-17 09:44:25 +08:00
Aurelien Chartier	7175d89b48	[None][fix] Fix iteration stats for spec-dec (#9855 ) Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>	2025-12-16 14:11:38 -08:00
Lizhi Zhou	bd13957e70	[TRTLLM-9181][feat] improve disagg-server prometheus metrics; synchronize workers' clocks when workers are dynamic (#9726 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-12-16 05:16:32 -08:00
Enwei Zhu	609d1d0383	[None][fix] Fix Illegal Memory Access for CuteDSL Grouped GEMM (#10008 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-12-16 04:06:49 -08:00
Emma Qiao	12727ebd7f	[None][infra] Waive failed test for main branch on 12/16 (#10029 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-12-16 02:54:32 -08:00
Wanli Jiang	8af51211c1	[FMDL-1222][feat] Support weight and weight_scale padding for NVFP4 MoE cutlass (#9358 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-12-16 12:41:17 +08:00
Eran Geva	ce7a42f4cf	[https://nvbugs/5731717 ][fix] fixed flashinfer build race condition during test (#9983 ) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>	2025-12-15 20:30:24 -08:00
Yechan Kim	8ba8699f66	[TRTLLM-8310][feat] Add Qwen3-VL-MoE (#9689 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-12-15 20:05:20 -08:00
ChristinaZ	dff77efa2a	[None][feat] Add routing support for the new model for both cutlass and trtllm moe backend (#9792 ) Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>	2025-12-15 19:59:08 -08:00
xinhe-nv	cdf56c278f	[TRTLLM-8638][fix] Add failed cases into waives.txt New activity. (#9979 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-12-15 18:59:13 -08:00
Michal Guzek	e6187d8109	[https://nvbugs/5708810 ][fix] Fix TRTLLMSampler (#9710 ) Signed-off-by: Michal Guzek <mguzek@nvidia.com>	2025-12-15 23:26:52 +01:00
Patrice Castonguay	9ba14263db	[https://nvbugs/5673559 ][fix] Unwaiving disagg test for nvbug 5673559 (#9957 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-12-15 12:32:15 -05:00
Emma Qiao	d5d15c06df	[None][infra] Waive failed tests for main branch on 12/15 (#10001 ) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-12-16 01:29:43 +08:00
Kaiyu Xie	44b0f8c3ed	[None] [fix] Revert "[None] [feat] add eos_token_id in generation_config to sampling params" (#10002 )	2025-12-15 08:52:52 -08:00
Wanli Jiang	3230fbe79a	[None][feat] Update reasoning parser for nano-v3 (#9944 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-12-15 05:39:37 -08:00
Yukun He	9e7182b603	[TRTLLM-9615][feat] Implement a distributed tuning system (#9621 ) Four distinct strategies are implemented to accommodate different distributed tuning scenarios, including BROADCAST, INDEPENDENT, MERGE, PARALLEL. * Distributed tuning is disabled by default, with the INDEPENDENT strategy as the fallback. This conservative approach prevents unexpected behavior in standard use cases. * Only operations with significant tuning time overhead have been assigned the PARALLEL strategy, which allows the same tensor parallelism (TP) rank to tune tactics concurrently across different ranks. This targeted approach balances performance gains with stability. * Operations with nested tuning structures, such as NVFP4GemmUnifiedRunner, currently support only the INDEPENDENT strategy. This restriction exists because the synchronization mechanism is optimized only for leaf operations and doesn't yet handle nested hierarchies. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-12-15 21:08:53 +08:00
Bo Li	9eb5a229dd	[None][infra] Fully waive test_worker_restart test_disagg_server_restart. (#9988 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-12-15 01:26:18 -08:00
Grzegorz Kwasniewski	83885c69e7	[TRTLLM-9136][feat] 2D parallel EP TP support (#9459 ) Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>	2025-12-15 09:52:29 +01:00
xinhe-nv	3c98b25005	[None][chore] Add failed cases into waives.txt (#9941 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-12-14 23:14:24 -08:00
JunyiXu-nv	af899d2fe7	[TRTLLM-9860][doc] Add docs and examples for Responses API (#9946 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2025-12-14 21:46:13 -08:00
Ziyi Xiong	f2aee0db03	[TRTLLM-9854][feat] Optimize the host overhead of _sample_async (#9935 ) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-12-15 13:28:54 +08:00
shuyixiong	25db9e7b3e	[https://nvbugs/5741060 ][chore] Waive all pg operator tests (#9991 ) Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>	2025-12-14 21:24:43 -08:00
Balaram Buddharaju	dfc8799352	[https://nvbugs/5669114 ][fix] Switch to MMMU benchmark for Gemma3 27B (#9966 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-12-14 21:23:59 -08:00
Fanrong Li	8f144d9282	[TRTLLM-9416][feat] Skip DS-v3.2 indexer MQA and Top-K for short sequences. (#9524 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-12-15 12:42:25 +08:00
QI JUN	b57650f1e6	[TRTLLM-9794][ci] move test cases of gpt-oss to gb200 (#9934 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-12-14 19:21:54 -08:00
xxi	f5696df285	[TRTLLM-8961][feat] ConfigurableMoE support DeepGemm (#9858 )	2025-12-15 10:47:15 +08:00
dominicshanshan	4bf42f8fa8	[https://nvbugs/5580297 ][fix] Skip capture request error test from Ray stage (#9947 ) Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-12-15 10:03:16 +08:00
Simeng Liu	f21e2b3329	[TRTLLM-9601][feat] Expose mmKeys for multimodal to integrate with dynamo. (#9604 ) Signed-off-by: SimengLiu-nv <simengl@nvidia.com>	2025-12-15 08:42:30 +08:00
Emma Qiao	e0a4b72279	[None][infra] Waive failed tests for main branch on 12/14 (#9982 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-12-14 22:48:34 +08:00
Mike Iovine	96d654029d	[https://nvbugs/5666816 ][fix] Unwaive llama3 eagle3 test (#9964 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-12-14 15:07:35 +08:00
nvxuanyuc	a5a37227d6	[None][feat] Fused kernels (qknormrope + moe routing) and two-model MTP support for glm4moe (#9852 ) Signed-off-by: Xuanyu Chen <xuanyuc@nvidia.com>	2025-12-14 10:47:24 +08:00
Mike Iovine	383b13e0e5	[None][feat] Implement sampling on 1-model EAGLE3 (#9885 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com> Signed-off-by: Mike Iovine <miovine@nvidia.com>	2025-12-13 07:38:22 -08:00
Yan Chunwei	85406f9dda	[https://nvbugs/5720482 ][fix] Fix test rpc streaming (#9902 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>	2025-12-13 01:14:43 -08:00
shuyixiong	8cbf2d958c	[TRTLLM-9738][chore] Guard accuracy with nccl allreduce strategy (#9793 ) Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>	2025-12-13 01:02:11 -08:00
Balaram Buddharaju	6a6e41f802	[TRTLLM-9468][chore] Update disagg benchmarking scripts to support context parallelism (#9720 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-12-12 22:29:41 -08:00
bhsueh_NV	e49c70f6df	[None][feat] Support Mistral Large3 LLM part (#9820 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-12-13 11:44:27 +08:00
Balaram Buddharaju	461446045e	[TRTLLM-9493][feat] Add helixPostProcessNative kernel for cp_dim=2 (#9924 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-12-12 16:49:25 -08:00
tburt-nv	6147452158	[https://nvbugs/4141427 ][chore] Add more details to LICENSE file (#9881 ) Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>	2025-12-13 08:35:31 +08:00
Chuang Zhu	4cc4cbe926	[https://nvbugs/5716787 ][fix] terminate nixl running when exiting (#9785 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> Co-authored-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-12-12 11:15:02 -05:00
Chuang Zhu	9c59c9f920	[https://nvbugs/5643787 ][fix] remove the war path for notify to itself (#9834 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-12-12 11:10:05 -05:00
JunyiXu-nv	2fec53dfa5	[TRTLLM-9637][feat] Support tool parser for Kimi K2 (#9830 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2025-12-12 23:32:39 +08:00
Yihan Wang	9df4dad3b6	[None][fix] Introduce inline namespace to avoid symbol collision (#9541 ) Signed-off-by: Yihan Wang <yihwang@nvidia.com>	2025-12-12 23:32:15 +08:00
Balaram Buddharaju	af315d8ef1	[TRTLLM-5972][chore] Load balance decode token KV cache with helix parallelism (#9757 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-12-12 22:29:05 +08:00
Lucas Liebenwein	e767fc649a	[None][feat] AutoDeploy: prepare_metadata revisited (#9764 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-12-12 20:14:14 +08:00
ruodil	9b3e5e90ee	[None][test] fix a typo in model name in script (#9867 ) Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>	2025-12-12 17:35:55 +08:00
chenfeiz0326	61745f034a	[https://nvbugs/5727481 ][ci] Fix Port Conflict in Perf-Sanity CI Test (#9896 ) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>	2025-12-12 17:16:50 +08:00
kris1025	2fc94e5dd7	[None][chore] unwaive qwen3 accuracy test (#9895 ) Signed-off-by: linquanh <linquanh@nvidia.com>	2025-12-12 16:30:09 +08:00
Yihan Wang	711016c799	[https://nvbugs/5736923 ][infra] Waive timeout disaggregated/test_auto_scaling[http-round_robin] test (#9942 ) Signed-off-by: Yihan Wang <yihwang@nvidia.com>	2025-12-12 15:15:13 +08:00
Ivy Zhang	fded6c393d	[TRTLLM-9262][test] add groupgemm ada case for rcca (#9833 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-12-12 13:23:33 +08:00
dominicshanshan	093465ed29	[https://nvbugs/5599176 ][fix] Unwaive fixed test for Ray (#9861 ) Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-12-12 11:24:05 +08:00
xinhe-nv	e8efeb765d	[TRTLLM-9717][fix] fix multi nodes tests cases (#9736 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-12-12 10:14:23 +08:00
Venky	fd1270b9ab	[TRTC-43] [feat] Add config db and docs (#9420 ) Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com> Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> Co-authored-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>	2025-12-12 04:00:03 +08:00
Simeng Liu	24f92721f2	[https://nvbugs/5597647 ][ci] Unwaive fixed tests. (#9812 ) Signed-off-by: SimengLiu-nv <simengl@nvidia.com>	2025-12-12 02:29:30 +08:00
Erin	89dabf5aa1	[TRTLLM-9736][feat] AsyncLLM and verl integ (#9353 ) Signed-off-by: Liwei Ma <liweim@nvidia.com> Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> Co-authored-by: Liwei Ma <liweim@nvidia.com> Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com> Co-authored-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-12-11 09:33:25 -08:00
JadoTu	02edb19f43	[None] [feat] add eos_token_id in generation_config to sampling params (#9514 ) Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>	2025-12-12 00:52:03 +08:00
xxi	488d38f88d	[TRTLLM-8959][feat] ConfigurableMoE support CUTLASS (#9772 )	2025-12-12 00:22:13 +08:00
Yan Chunwei	04a39a4e2b	[None][chore] enable test_ipc.py (#9865 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>	2025-12-11 17:47:14 +08:00
Zongfei Jing	c76b428e2e	[TRTLLM-9685] [feat] Add gather fc1 kernel by cuteDSL (#9618 ) Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>	2025-12-11 16:21:32 +08:00
JunyiXu-nv	454e7e59e5	[https://nvbugs/5718004 ][fix] Add warmup for cancellation test (#9860 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2025-12-11 12:20:33 +08:00
Bo Deng	c1d53ee43d	[https://nvbugs/5582258 ][fix] unwaive (#9650 ) Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-12-10 19:18:30 -08:00
fredricz-20070104	341cb1a12c	[None][chore] Add GB300 support since it does not support segment (#9731 ) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>	2025-12-10 18:36:55 -08:00
Patrice Castonguay	2c0293c612	[https://nvbugs/5601682 ][fix] Unwaiving disagg test (#9627 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-12-10 13:42:26 -05:00
cheshirekow	2f030312a8	[TRTLLM-9228][infra] Verify thirdparty C++ process (#9367 ) Signed-off-by: Josh Bialkowski <1309820+cheshirekow@users.noreply.github.com> Co-authored-by: Josh Bialkowski <1309820+cheshirekow@users.noreply.github.com>	2025-12-10 21:01:19 +08:00
Yukun He	072f236002	[None][fix] Fully resolve the tactic recovery issues in AutoTuner serialized cache (#9835 ) Restrict tactic types to those compatible with AutoTuner cache serialization and deserialization. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-12-10 20:41:04 +08:00
dominicshanshan	0e78a4b244	[https://nvbugs/5702791 ][fix] Unwaive fixed test (#9844 ) Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-12-10 14:01:44 +08:00
QI JUN	2c46126a93	[TRTLLM-9794][ci] move some deepseek test cases to gb200 (#9841 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-12-09 19:54:51 -08:00
zhanghaotong	36c9e7cfe6	[None][chore] Add unittest for otlp tracing (#8716 ) Signed-off-by: zhanghaotong <zhanghaotong.zht@antgroup.com> Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>	2025-12-09 18:34:08 -08:00
dhansen-nvidia	2d33ae94d5	[https://nvbugs/5508301 ][feat] Move D->H copies to a worker thread whe… (#8463 ) Signed-off-by: Dan Hansen <1+dhansen-nvidia@users.noreply.github.com> Signed-off-by: dhansen-nvidia <218031328+dhansen-nvidia@users.noreply.github.com> Co-authored-by: Dan Hansen <1+dhansen-nvidia@users.noreply.github.com>	2025-12-09 18:51:31 -05:00
Patrice Castonguay	414448bb37	[https://nvbugs/5719561 ][chore] Unwaive tests for nvbug 5719561 (#9801 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-12-09 18:21:50 -05:00
Patrice Castonguay	ff0ef19ee9	[https://nvbugs/5688388 ][chore] Unwaiving fixed disagg test (#9800 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-12-09 16:51:46 -05:00
Patrice Castonguay	7d7d05d8db	[None][chore] Adding flaky auto scaling test to waives (#9851 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-12-09 15:05:19 -05:00
Mike Iovine	07c76a5fac	[None][feat] Make 2-model spec dec use the 1-model kernels (Hopper) (#8810 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-12-09 11:06:31 -05:00
Dom Brown	3156f2e852	[https://nvbugs/5575841 ] [fix] Nvbug 5575841: Remove additional test waivers for TestMoEFP4 (#9788 ) Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2025-12-09 13:37:55 +00:00
Emma Qiao	75bc386b65	[None][infra] Waive failed cases for main branch on 12/09 (#9839 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-12-09 19:39:29 +08:00
QI JUN	58c29957d9	[TRTLLM-9794][ci] move qwen3-next test cases to gb200 (#9827 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-12-09 01:58:25 -08:00
Stefan Niebler	d600b9f851	[TRTLLM-6756][feat] Update BeamSearch for TorchSampler (#9660 ) Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>	2025-12-09 10:44:01 +01:00
Robin Kobus	76f49c903b	[None][fix] Additional model outputs for pipeline parallelism (#9794 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-12-09 10:41:22 +01:00
yufeiwu-nv	fbcf03040f	[None][test] Refactor qa/llm_perf_nim.yml test list (#9700 ) Signed-off-by: yufeiwu <230315618+yufeiwu-nv@users.noreply.github.com>	2025-12-08 22:00:43 -08:00
QI JUN	252769c930	[TRTLLM-9794][ci] remove duplicated test cases in DGX B200 (#9817 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-12-08 21:51:30 -08:00
Shi Xiaowei	b050804b63	[TRTLLM-6537][infra] extend multi-gpu tests related file list (#9614 ) Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2025-12-09 12:54:53 +08:00
JunyiXu-nv	90890785eb	[https://nvbugs/5722653 ][fix] Fix config file used by disagg_client (#9783 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com> Signed-off-by: JunyiXu-nv <219237550+JunyiXu-nv@users.noreply.github.com>	2025-12-08 20:34:55 -08:00
Balaram Buddharaju	bafb60c1bc	[None][chore] Fix tests failing on pre-merge 12/08 (#9819 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-12-08 20:08:52 -08:00
Bo Li	f2006a1f74	[https://nvbugs/5726066 ][infra] Waive timeout disaggregated/test_auto_scaling tests. (#9815 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-12-08 19:51:43 -08:00
JunyiXu-nv	f521f6d910	[None][fix] Fix unterminated process issue for RemoteOpenAIServer (#9490 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2025-12-09 11:15:40 +08:00
Jiagan Cheng	4a3a66b124	[https://nvbugs/5677746 ][fix] Use first PP rank's schedule result in other PP ranks to fix PP hang (#9659 ) Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>	2025-12-08 18:43:52 -08:00
yuanjingx87	390391ebf1	[None][infra] Correct the waived test names due to a merge conflict (#9803 ) Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>	2025-12-09 09:48:21 +08:00
Chenghao Zhang	75f5446d67	[#9753 ][feat] AutoDeploy: Implement add rms_norm fusion (#9754 ) Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>	2025-12-08 14:24:27 -08:00
Eran Geva	23cf72b0f8	[#8921 ][feat] Added symetric memory AllReduce strategy (#8919 ) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>	2025-12-08 13:12:56 -08:00
Yibin Li	faabc1a387	[TRTLLM-7967][chore] Add more tests (#9415 ) Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>	2025-12-08 11:57:32 -08:00
Jhao-Ting Chen	0a09465089	[https://nvbugs/5567586 ][feat] Ampere xqa swa specdec for GPT-OSS Eagle3-one-model (#8383 ) Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>	2025-12-08 11:16:05 -08:00
Frank	f6df9eb2a6	[TRTLLM-9089][chore] Port prepare_dataset into trtllm-bench (#9250 )	2025-12-08 10:37:40 -08:00
Lizhi Zhou	52f78e4000	[http://nvbugs/5649010 ][fix] fix test_auto_scaling.py::test_worker_restart timeout (#9775 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-12-08 03:26:01 -08:00
fredricz-20070104	96d9b67d65	[https://nvbugs/5527655 ][test] Add test case for RCCA 5527655 (#9511 ) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>	2025-12-08 01:27:13 -08:00
fredricz-20070104	ededeecb0f	[None][test] Add Kimi k2 WIDEEP perf and accuracy cases (#9686 ) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com> Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-12-08 01:25:07 -08:00
xinhe-nv	3f55c07223	[None][chore] Remove closed bugs (#9770 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-12-07 22:51:55 -08:00
Li Min	a422d70be6	[None][chore] Enable tvm_ffi for cute dsl nvfp4_gemm to reduce host overhead. (#9690 ) Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>	2025-12-08 13:28:11 +08:00
Fanrong Li	2f526583fb	[None][chore] Move the rocketkv e2e test to post-merge (#9768 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-12-08 13:22:16 +08:00
Emma Qiao	137713a869	[None][infra] Waive failed cases for main on 12/08 (#9773 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-12-07 20:18:29 -08:00
ruodil	d232709568	[https://nvbugs/5666804 ][test] only adding sampler config for limited models (#9512 ) Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com> Co-authored-by: Larry Xu <197874197+LarryXFly@users.noreply.github.com>	2025-12-07 19:40:29 -08:00
fredricz-20070104	9bfb6179ec	[https://nvbugs/5422621 ][test] Add GB 200 WIDEEP test case for RCCA 5422621 (#9506 ) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>	2025-12-08 10:41:40 +08:00
xxi	8e27ce7084	[TRTLLM-9603][feat] Enable ConfigurableMoE test in the CI (#9645 )	2025-12-08 10:19:40 +08:00
Zheng Duan	4da0e1473c	[None][test] add ntp tolerance in time metrics verification (#9741 ) Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>	2025-12-08 09:51:10 +08:00
chenfeiz0326	383178c00a	[TRTLLM-9000][feat] Add multi-node Perf Tests into CI (#8800 ) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>	2025-12-08 09:00:44 +08:00
Ludwig Schneider	41ce14ab04	[None][feat] Enable NCCL_SYMMETRIC as default fallback for AllReduce (#9314 ) Signed-off-by: Ludwig Schneider <lschneider@nvidia.com>	2025-12-07 09:43:26 -08:00
Emma Qiao	7c6c493993	[None][infra] Waive failed cases for main branch on 12/07 (#9769 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-12-07 06:26:47 -08:00
JunyiXu-nv	b210f22c7e	[https://nvbugs/5703953 ][fix] Preserving ip:port for trtllm-serve before initializing llm (#9646 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2025-12-06 20:13:48 -08:00
Yan Chunwei	e4c707845f	[None][fix] enable hmac in RPC (#9745 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-12-07 08:24:46 +08:00
Jonas Li	2645a78f34	[TRTLLM-9660][feat] Convert cuteDSL GEMM to opt-in feature (#9682 ) Signed-off-by: Jonas Li <6110159+longlee0622@users.noreply.github.com> Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-12-06 02:24:51 -08:00
Enwei Zhu	7cd5a67e25	[TRTLLM-9372][feat] Enable CuteDSL MoE with Large EP (#9592 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-12-05 22:08:52 -08:00
Mike Iovine	31ab367576	[None][chore] Waive flakey disagg tests (#9749 ) Signed-off-by: Mike Iovine <miovine@nvidia.com>	2025-12-05 13:07:05 -08:00
jthomson04	299601aebf	[https://nvbugs/5670672 ][fix] Fix flaky KV connector tests (#9676 ) Signed-off-by: jthomson04 <jwillthomson19@gmail.com>	2025-12-05 10:04:54 -08:00
Robin Kobus	eb0b426e5d	[None][refactor] Improve request processing function in sampler (#9671 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-12-05 16:41:49 +01:00
Robin Kobus	faf682b8bc	[TRTLLM-7136][feat] Update load_weights method to include mapping parameter in checkpoint loaders (#9583 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-12-05 16:07:20 +01:00
yufeiwu-nv	68253d9d29	[https://nvbugs/5518713 ][test] Refactor core test lists by merging with llm_perf_cluster.yml (#9714 ) Signed-off-by: yufeiwu <230315618+yufeiwu-nv@users.noreply.github.com>	2025-12-05 01:15:37 -08:00
Kaiyu Xie	e06c582648	[None] [tests] Unwaive EPLB tests (#9625 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-12-05 00:13:24 -08:00
gramnarayan	74df9b180b	[#9602 ][feat] AutoDeploy: Support TRTLLM Sampler (#9641 ) Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>	2025-12-04 19:24:11 -08:00
Lizhi Zhou	dc766fc126	[https://nvbugs/5633340 ][fix] start disagg workers and servers on free ports (#9694 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-12-05 10:51:29 +08:00
Lizhi Zhou	0d0a16fff4	[TRTLLM-8920][feat] decouple disagg service from fastapi (#8714 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-12-05 10:44:16 +08:00
xinhe-nv	530af1a98e	[None][chore] Add failed cases into waives.txt (#9662 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-12-04 22:33:22 +08:00
Anthony Chang	60cdca3740	[None][fix] Recover TRTLLM MoE Perf for DEP (#9562 ) Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>	2025-12-04 22:10:25 +08:00
Jin Li	e5d4305c04	[https://nvbugs/5467531 ][fix] Unwaive fused_moe all to all test with … (#9617 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-12-04 18:17:24 +08:00
ruodil	8a392af28f	[None][test] rename wide ep and disagg metric name in perf test (#9704 ) Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>	2025-12-04 18:16:06 +08:00
Yan Chunwei	05058f5e2a	[None][ci] unwaive tests (#9651 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>	2025-12-04 15:06:07 +08:00
tcherckez-nvidia	f9aa86dbdd	[#8733 ][feat] Add Llama4 MoE handling to AutoDeploy (#9556 ) Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com> Signed-off-by: tcherckez-nvidia <127761168+tcherckez-nvidia@users.noreply.github.com> Co-authored-by: Neta Zmora <nzmora@nvidia.com>	2025-12-04 08:03:33 +02:00
JunyiXu-nv	6d2daec5d0	[TRTLLM-8274][feat] Check if executor is shutdown in /health entrypoint (#9057 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2025-12-04 13:49:40 +08:00

2719 Commits