TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-06 03:01:50 +08:00

Author	SHA1	Message	Date
Lucas Liebenwein	15b43e8a14	[https://nvbugs/5777041 ][fix] fix AutoDeploy ep sharding test (#10460 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-01-14 21:53:56 -05:00
Dom Brown	94c7b69048	[https://nvbugs/5630196 ] [fix] Prevent flaky failures in C++ test_e2e.py by using local cached datasets for benchmarking (#10638 ) Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2026-01-14 21:39:55 -05:00
Wanli Jiang	73d1840c12	[TRTLLM-10245][feat] Add accuracy tests for super v3 fp8 model (#10482 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2026-01-15 10:07:02 +08:00
dominicshanshan	0f2d61b8c6	[https://nvbugs/5766952 ][fix] Fix AIPerf issue. (#10666 ) Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-01-15 09:54:34 +08:00
bhsueh_NV	5f9fc50233	[https://nvbugs/5800725 ][infra] Update waives.txt (#10625 )	2026-01-15 09:08:07 +08:00
彭晋韬(jtao peng)	211c44b951	[None][feat] Adding torch ext API for FusedAddRMSNormQuant kernel (#9905 ) Signed-off-by: jintaop <jintaop@nvidia.com>	2026-01-15 07:29:15 +08:00
Tzu-Ling Kan	c99faaed06	[#9760 ][fix] Use RequestError for validation errors to prevent engine shutdown (#9761 ) Signed-off-by: tzulingk@nvidia.com <tzulingk@nvidia.com>	2026-01-14 10:22:36 -05:00
Emma Qiao	01083b56bf	[TRTLLM-9849][infra] Update dependencies to 25.12 (#9818 ) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> Signed-off-by: Emma Qiao <qqiao@nvidia.com> Signed-off-by: xxi <xxi@nvidia.com> Signed-off-by: xxi <95731198+xxi-nv@users.noreply.github.com> Co-authored-by: Bo Li <22713281+bobboli@users.noreply.github.com> Co-authored-by: xxi <xxi@nvidia.com> Co-authored-by: xxi <95731198+xxi-nv@users.noreply.github.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2026-01-14 21:54:04 +08:00
Emma Qiao	35c24424f6	[None][infra] Waive failed cases in post-merge on 01/14 (#10668 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2026-01-14 21:39:32 +08:00
HuiGao-NV	b10704428d	[https://nvbugs/5787566 ][fix] Only keep a limited number of performance statistic data (#10569 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2026-01-14 07:53:01 -05:00
Bo Li	582dec5bb5	[https://nvbugs/5774869 ][infra] Use 2 GPUs to test skip softmax attention on H100. (#10420 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2026-01-14 07:03:01 -05:00
shuyixiong	babd5ecacc	[https://nvbugs/5760740 ][fix] Enable ray tests (#10272 ) Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>	2026-01-14 19:25:46 +08:00
xinhe-nv	272688c663	[None][fix] fix L0 issues (#10670 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2026-01-14 18:09:40 +08:00
jmydurant	e7882d5c74	[None][feat] MiniMax M2 support (#10532 ) Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>	2026-01-14 17:38:58 +08:00
mpikulski	052c36ddd2	[TRTLLM-9522][feat] support image_embeds in OpenAI API (#9715 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2026-01-14 10:31:03 +01:00
Bo Li	487287a412	[None][chore] Update test name MNNVL->NVLinkTwoSided. (#9672 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2026-01-14 04:29:57 -05:00
QI JUN	c4da4fd462	[https://nvbugs/5637220 ][ci] unwaive TestQwen3_235B_A22B::test_nvfp4[latency_moe_trtllm_attention_dp] (#9870 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2026-01-14 15:41:14 +08:00
Yuxian Qiu	39cefd6125	[None][refactor] Unify the usage of MPIDist and TorchDist. (#10380 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2026-01-14 14:05:47 +08:00
xxi	f841b43cde	[None][chore] waive the CI failure (#10655 ) Signed-off-by: xxi <xxi@nvidia.com>	2026-01-14 13:59:15 +08:00
JennyLiu	92ae490410	[None][test] Spark - Change testlist name and perf yml format (#10626 ) Signed-off-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com> Co-authored-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>	2026-01-13 23:07:11 -05:00
xinhe-nv	07d9390e9b	[None][test] add test into qa test list (#10627 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2026-01-13 22:43:00 -05:00
xinhe-nv	7305c61fc9	[TRTLLM-8638][fix] Add failed cases into waives.txt (#10589 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2026-01-13 22:00:20 -05:00
Leslie Fang	bc119f5644	[None][chore] Add test configurable moe module (#10575 ) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2026-01-14 07:25:57 +08:00
Balaram Buddharaju	ccdfa43a6e	[https://nvbugs/5791900 ][fix] Fix HelixCpMnnvlMemory init with PP (#10533 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2026-01-13 15:48:42 -05:00
Frida Hou	bf16fbd86c	[#9283 ][feat] AutoDeploy: separate rms pattern detection from fusion (#9969 ) Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>	2026-01-13 14:57:27 -05:00
dongfengy	6ee8dbfe0b	[https://nvbugs/5772396 ][fix] WAR: Disable TinyGEMM PDL due to accuracy issues (#10619 ) Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>	2026-01-13 12:40:11 -05:00
benzh-2025	6df2c8a074	[None][feat] add fp4 gemm + allreduce (#9729 ) Signed-off-by: benzh Signed-off-by: benzh-2025	2026-01-13 21:11:13 +08:00
Guoming Zhang	c1b0b7350f	[None][test] Unwaive qwen3 next test case. (#9877 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2026-01-13 20:42:31 +08:00
Tailing Yuan	38296a472b	[None][feat] Layer-wise benchmarks: make model init more general and support weights loading (#10562 ) Signed-off-by: Tailing Yuan <yuantailing@gmail.com>	2026-01-13 19:17:03 +08:00
Erin	55580f8ec1	[NVBUG-5670458][chore] Unwaive lp tests (#10524 ) Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> Signed-off-by: Erin <14718778+hchings@users.noreply.github.com>	2026-01-13 04:31:27 -05:00
Guoming Zhang	bdaee87895	[TRTLLM-10060][feat] Enable attention dp for Nemotron Super v3. (#10347 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2026-01-13 17:13:55 +08:00
JunyiXu-nv	e291a834db	[TRTLLM-8462][feat] Support GET/DELETE v1/responses/{response_id} (#9937 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2026-01-13 03:57:14 -05:00
JennyLiu	2967d299fb	[TRTLLM-10271][test] Add Spark QA functional and performance cases (#10564 ) Signed-off-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com> Co-authored-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>	2026-01-13 13:20:15 +08:00
fredricz-20070104	bbe535fddf	[None][chore] Fix disagg assert (#10596 ) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>	2026-01-12 21:39:57 -05:00
Iman Tabrizian	48b09e5a25	[https://nvbugs/5689235 ][fix] Fix cancellation+chunked prefill+disagg (#10111 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2026-01-12 18:23:26 -05:00
Anish Shanbhag	dacc881993	[https://nvbugs/5761391 ][fix] Use correct model names for config database regression tests (#10192 ) Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>	2026-01-12 10:55:07 -08:00
Suyog Gupta	a1385243e1	[#10580 ][fix] re-enable NemotronH MOE MMLU test (#10594 ) Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>	2026-01-12 09:26:07 -08:00
Emma Qiao	9f044b9dd9	[None][infra] Waive failed tests for main 01/12 (#10604 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2026-01-12 10:24:54 -05:00
mpikulski	bf7998f1b8	[TRTLLM-9522][test] cover LLM API `multi_modal_embeddings` (#9963 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2026-01-12 11:38:22 +01:00
Wanli Jiang	11da7e3605	[None][fix] Solve pillow version conflict (#10537 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2026-01-12 04:05:54 -05:00
Zhenhuan Chen	3bd319dc8e	[https://nvbugs/5794796 ][chore] waive test blocking premerge (#10593 ) Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>	2026-01-12 15:39:07 +08:00
yufeiwu-nv	8e806abac3	[None][test] Remove most TRT-backend test cases in llm_perf_nim.yml (#10572 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com> Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2026-01-12 15:34:55 +08:00
yingguo-trt	c5914f9085	[None][chore] update deepseekv3.2 test parameter (#10595 ) Signed-off-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com>	2026-01-12 01:43:22 -05:00
chenfeiz0326	54459377d2	[TRTLLM-10248][feat] Support Bot to Send Perf Regression Msg to Slack Channel (#10489 ) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>	2026-01-12 14:23:23 +08:00
Jie Li	5e0dbba0c9	[None][chore]: update waive list (#10577 ) Signed-off-by: Jie Li <lijie@nvidia.com>	2026-01-11 22:18:04 -05:00
Eran Geva	c5d5af9e7f	[#8391 ][chore] removed llama and added deepseek to AutoDeploy's L0 perf test (#10585 ) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>	2026-01-11 16:31:24 -05:00
Ivy Zhang	7f018c89e9	[None][test] update core test list (#10538 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2026-01-11 14:08:20 -05:00
Yechan Kim	8e0d20d901	[TRTLLM-10195][feat] K-EXAONE support (#10355 ) Signed-off-by: Jaedeok Kim <jaedeokk@nvidia.com> Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> Co-authored-by: Jaedeok Kim <jaedeokk@nvidia.com>	2026-01-12 00:29:51 +09:00
HuiGao-NV	3c65ec3c55	[None][chore] waive test case (#10581 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2026-01-10 18:53:36 -05:00
fredricz-20070104	f6045fac09	[None][chore] Fix Gitlab CI termination issues (#10576 ) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com> Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com> Co-authored-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>	2026-01-10 07:51:18 -05:00
William Zhang	ff7eb93f31	[https://nvbugs/5669097 ][tests] Add MMMU test for mistral small (#10530 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2026-01-09 16:09:28 -08:00
Chenghao Zhang	38f249b479	[https://nvbugs/5548861 ][fix] AutoDeploy: Fix the test (#10521 ) Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>	2026-01-09 13:30:24 -08:00
yingguo-trt	d80f01d205	[None][feat] Add support for DeepSeek v3.2 tests (#10561 ) Signed-off-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com>	2026-01-09 10:20:29 -05:00
Yechan Kim	7295af68ba	[None][fix] Enable AttentionDP on Qwen3-VL and fix test (#10435 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2026-01-10 00:13:26 +09:00
Iman Tabrizian	ced88424ef	[https://nvbugs/5756008 ][fix] unwaive test (#10523 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2026-01-09 09:40:07 -05:00
Jie Li	627d306df9	[None][chore] remove some model support; add device constraint (#10563 ) Signed-off-by: Jie Li <lijie@nvidia.com>	2026-01-09 09:36:23 -05:00
ruodil	2b72d33fdc	[TRTLLM-9932][test] add kimi_k2 single node perf test (#10436 ) Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>	2026-01-09 05:36:50 -05:00
bhsueh_NV	4a09acd012	[https://nvbugs/5785206 ][infra] unwaive the accuracy/test_llm_api_pytorch.py::TestQwen3_30B_A3B (#10560 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2026-01-09 03:13:29 -05:00
JadoTu	4c498bfe58	[TRTLLM-9676][fix] Fix mamba_cache_manager when enabling cuda_graph_padding and let test cover this case (#9873 ) Signed-off-by: JadoTu <107457950+JadoTu@users.noreply.github.com>	2026-01-09 14:50:16 +08:00
Jie Li	6fcd4e7099	[None][chore] Add failed cases into waives.txt (#10541 ) Signed-off-by: Jie Li <lijie@nvidia.com>	2026-01-09 01:03:47 -05:00
ruodil	d707286ca8	[None][test] restrict max_num_tokens in disagg mtp config (#10442 ) Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>	2026-01-08 21:53:24 -05:00
Balaram Buddharaju	56e779d09f	[None][chore] Waive tests blocking premerge 01/08 (#10555 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2026-01-08 20:22:28 -05:00
Mike Iovine	4092a87b6f	[https://nvbugs/5740075 ][fix] Fix sm120 speculation (#10049 ) Signed-off-by: Mike Iovine <miovine@nvidia.com>	2026-01-08 19:55:43 -05:00
William Zhang	c0ae6bbdbe	[None][feat] EPD for Qwen3 VL (#10470 ) * Why? We would like to support EPD disaggregated serving for Qwen3 VL. * What? This commit adds such support, and extends existing unit tests for correctness checks. Some minor (protected) interface changes had to be made to the weight mapper as a side-effect. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2026-01-08 06:45:54 -05:00
bhsueh_NV	bea61bb17d	[None][fix] Mistral large 3 few code refine (#10405 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2026-01-08 06:38:49 -05:00
Emma Qiao	43839c7d9b	[TRTLLM-9642][infra] Increase pytest verbosity for failed tests (#9657 ) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Emma Qiao <qqiao@nvidia.com>	2026-01-08 02:33:48 -05:00
HuiGao-NV	22c81cb5fa	[None][chore] Enable seg fault cases since one race condition is fixed (#10398 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2026-01-08 02:15:30 -05:00
Barry Kang	f57aab5255	[https://nvbugs/5775402 ][fix] Fix concurrency list in Wide-EP perf tests (#10529 ) Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>	2026-01-08 01:58:55 -05:00
Lucas Liebenwein	30f8455d29	[https://nvbugs/5747878 ][fix] unwaive llama4 scout tests (#10468 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-01-07 23:33:45 -05:00
yingguo-trt	f8b2a8fd30	[None][chore] Support multiple job submission at the same time (#10492 ) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com> Co-authored-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>	2026-01-07 21:51:36 -05:00
Yuxian Qiu	b85c447ceb	[https://nvbugs/5784543 ][fix] Setup dist before using autotuner. (#10491 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2026-01-08 10:32:50 +08:00
xxi	81f878c279	[https://nvbugs/5707392 ][fix] unwaive test_fused_moe_fp8_blockwise_wide_ep[NotEnabled] (#10428 ) Signed-off-by: xxi <xxi@nvidia.com>	2026-01-08 09:17:59 +08:00
Lucas Liebenwein	d736c7f290	[https://nvbugs/5761665 ][fix] AutoDeploy: handle bugs for 25.12 dlfw upgrade (#10511 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-01-07 20:16:53 -05:00
yufeiwu-nv	b130d58c88	[None][test] Remove most TRT-backend test cases in llm_perf_nim.yml (#10487 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com> Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2026-01-07 17:18:43 +08:00
xinhe-nv	872210468b	[TRTLLM-8638][fix] Add failed cases into waives.txt (#10474 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2026-01-07 03:23:43 -05:00
yingguo-trt	cbf8357e5f	[https://nvbugs/5726086 ][fix] update kimi-k2-1k1k dataset (#10473 ) Signed-off-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com>	2026-01-07 01:24:08 -05:00
xinhe-nv	be5579633e	[TRTLLM-8638][fix] Add failed cases into waives.txt (#10457 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2026-01-07 00:57:03 -05:00
Fanrong Li	a34aa63685	[https://nvbugs/5767223 ][feat] add pp support for DeepSeek-v3.2 (#10449 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2026-01-07 12:29:51 +08:00
xinhe-nv	1fbadd2dde	[None][chore] Add failed cases into waives.txt (#10365 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Jie Li <lijie@nvidia.com> Signed-off-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com> Co-authored-by: Jie Li <lijie@nvidia.com> Co-authored-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>	2026-01-06 22:08:06 -05:00
Ivy Zhang	4a1b2e23b3	[https://nvbugs/5698434 ][test] add qwen3-4b accuracy test case (#10382 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2026-01-06 21:56:34 -05:00
Lucas Liebenwein	6095c80e56	[https://nvbugs/5721907 ][fix] AutoDeploy: improve numerical stability of flashinfer attention test (#10467 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-01-06 21:11:06 -05:00
Zongfei Jing	bb2f883296	[None] [feat] Add test script and raster M for gather fc1 kernel (#10429 ) Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>	2026-01-07 09:31:49 +08:00
Lucas Liebenwein	bb6a3973aa	[https://nvbugs/5732942 ][fix] AutoDeploy: handle transformers 4.57.1 upgrade fixes (#10466 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-01-06 19:55:49 -05:00
Mike Iovine	77be1b7572	[https://nvbugs/5749988 ][fix] Remove redundant qwen3 spec dec test (#10387 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2026-01-06 11:46:34 -05:00
Enwei Zhu	037753f65b	[https://nvbugs/5748600 ][ci] Unwaive disagg guided decoding test (#10409 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2026-01-06 11:38:12 -05:00
JunyiXu-nv	7d62773c6c	[https://nvbugs/5760726 ][fix] Use random port in container port section (#10432 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2026-01-06 23:25:46 +08:00
xinhe-nv	704f58dfbe	[TRTLLM-8638][fix] Add failed cases into waives.txt (#10427 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2026-01-06 04:47:54 -05:00
Emma Qiao	6507087c3f	[None][infra] Waive failed cases on 1/6 (#10440 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2026-01-06 16:54:54 +08:00
Bo Li	df0b976b99	[https://nvbugs/5785206 ][infra] Waive TestQwen3_30B_A3B::test_fp8[latency-torch_compile=False]. (#10441 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2026-01-06 03:32:19 -05:00
William Zhang	ab58d7cac1	[https://nvbugs/5772361 ][ci] Unwaive tests that have been fixed (#10424 ) These tests were all failing due to the same issue, and were fixed in #10394. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2026-01-05 23:49:54 -08:00
Ivy Zhang	1e828587e5	[TRTLLM-9896][test] add vswa test cases coverage (#10146 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2026-01-06 02:02:29 -05:00
Yiqing Yan	5108a69fc0	[TRTLLM-9622][infra] Enable DGX_B300 multi-gpu testing in pre-merge pipeline (#9699 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2026-01-06 14:39:55 +08:00
xinhe-nv	998527724c	[TRTLLM-8638][fix] Add failed cases into waives.txt (#10367 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2026-01-06 01:09:21 -05:00
Ivy Zhang	22a1d31a27	[None][test] update test case constraint (#10381 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2026-01-06 12:28:59 +08:00
xinhe-nv	1b1058279c	[TRTLLM-8638][fix] Add failed cases into waives.txt (#10384 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2026-01-05 23:02:27 -05:00
kris1025	3e98265682	[None][chore] unwaive qwen3 30b test (#10115 ) Signed-off-by: linquanh <linquanh@nvidia.com>	2026-01-06 11:17:08 +08:00
alel	6b8ae6fa81	[None][feat] CuteDSL MOE FC1 Enhancement (#10088 ) Signed-off-by: Yuhan Li <51736452+liyuhannnnn@users.noreply.github.com>	2026-01-06 09:30:43 +08:00
chenfeiz0326	8a04c05079	[None][fix] Only Use Throughput Metrics to Check Regression (#10404 ) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>	2026-01-06 09:21:15 +08:00
Chuang Zhu	536a8f6a9c	[TRTLLM-9527][feat] Add transferAgent binding (step 1) (#10113 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2026-01-06 08:40:38 +08:00
Simeng Liu	3b56548fcf	[https://nvbugs/5777044 ][chore] Remove solved bugs from waives.txt (#10422 ) Signed-off-by: Simeng Liu <109828133+SimengLiu-nv@users.noreply.github.com>	2026-01-05 16:56:58 -05:00
Mike Iovine	91ff46d418	[https://nvbugs/5745152 ][fix] Unwaive gpt oss spec decode test (#10370 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2026-01-05 16:06:58 -05:00
Mike Iovine	7a2dab8e85	[https://nvbugs/5695984 ][fix] Unwaive llama3 eagle test (#10092 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2026-01-05 16:03:35 -05:00
Yan Chunwei	6b71b03947	[TRTLLM-9551][infra] Partition test_llm_pytorch.py for parallel execution (#10400 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2026-01-05 13:58:03 -05:00
Mike Iovine	db2614ef10	[https://nvbugs/5772414 ][fix] Fix draft token tree depth=1 corner case (#10385 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2026-01-05 17:20:14 +01:00
Gal Hubara-Agam	e98c27ee4f	[TRTLLM-10053][feat] AutoDeploy: Add Super v3 config file, improve test runtime (#10397 ) Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>	2026-01-05 18:17:27 +02:00
Anthony Chang	225d3a9001	[None][perf] TRTLLM MoE maps to lower tuning buckets when ep>1 (#9998 ) Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>	2026-01-05 17:16:12 +01:00
Balaram Buddharaju	a792c23dcf	[TRTLLM-9465][fix] Swap TP-CP grouping order (#10350 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2026-01-05 20:08:03 +08:00
xinhe-nv	b1733d56f6	[TRTLLM-9381][test] add disag-serving kimi k2 thinking tests (#10357 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2026-01-05 05:15:52 -05:00
Fanrong Li	4931c5eb3a	[None][feat] update deepgemm to the DeepGEMM/nv_dev branch (#9898 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2026-01-05 16:43:42 +08:00
Yukun He	d272f1a9bc	[TRTLLM-8821][feat] Apply AutoTuner to AllReduce Op for strategy tuning. (#8531 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2026-01-05 15:44:37 +08:00
HuiGao-NV	2f768b76f8	[https://nvbugs/5715568 ][fix] Force release torch memory when LLM is destroyed (#10314 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2026-01-05 15:30:18 +08:00
Emma Qiao	c63fad7d96	[None][infra] Waive failed cases again on 1/5 (#10403 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2026-01-05 02:12:16 -05:00
Yihan Wang	e7a4486294	[https://nvbugs/5752521 ][fix] Unwaive test_trtllm_flashinfer_symbol_collision.py (#10227 ) Signed-off-by: Yihan Wang <yihwang@nvidia.com>	2026-01-05 14:37:05 +08:00
Yukun He	0937df2c68	[TRTLLM-10185][feat] AutoTuner Cache: Support cache file lock and merge all ranks into one (#10336 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2026-01-05 13:44:09 +08:00
Emma Qiao	5a8bfcbb50	[None][infra] Waive failed cases in post-merge on 1/5 (#10399 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2026-01-05 12:30:10 +08:00
Tailing Yuan	a7fe043b13	[None][feat] Layer-wise benchmarks: support TEP balance, polish slurm scripts (#10237 ) Signed-off-by: Tailing Yuan <yuantailing@gmail.com>	2026-01-05 11:23:04 +08:00
Yuxian Qiu	5773a4d775	[https://nvbugs/5701425 ][chore] Unwaive tests. (#10269 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2026-01-05 09:54:26 +08:00
Fanrong Li	b5a1e10bc0	[https://nvbugs/5779534 ][fix] fix buffer reuse for CUDA graph attention metadata (#10393 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2026-01-05 09:43:44 +08:00
Wanli Jiang	da0830670a	[TRTLLM-10065][feat] Add accuracy tests for super-v3 with multiple-gpus (#10234 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2026-01-05 09:41:49 +08:00
Lizhi Zhou	82c1ba84a7	[https://nvbugs/5649010 ][fix] use 0 port as arbitrary port when disagg service discovery is enabled (#10383 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2026-01-05 09:40:40 +08:00
Eran Geva	e2f5455533	[#8391 ][chore] added deepseek_r1_distill_qwen_32b AutoDeploy perf test to L0 (#10377 ) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>	2026-01-04 20:35:52 +02:00
chenfeiz0326	a65b0d4efa	[None][fix] Decrease Pre Merge Perf Tests (#10390 ) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2026-01-04 12:21:34 -05:00
Yanchao Lu	c4f27fa4c0	[None][ci] Some tweaks for the CI pipeline (#10359 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2026-01-04 11:10:47 -05:00
dongfengy	afc533193d	[None][feat] Support nvfp4 for gptoss (#8956 ) Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>	2026-01-04 08:57:44 -05:00
Jaedeok Kim	a4dcc6a711	[TRTLLM-10171][fix] Correct attention handling in ModelConfig and KVCacheManager (#10330 ) Signed-off-by: Jaedeok Kim <jaedeokk@nvidia.com>	2026-01-04 06:07:30 -05:00
Yuxian Qiu	6ba04eba06	[https://nvbugs/5748683 ][fix] Use get_free_port_in_ci to avoid port conflict. (#10392 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2026-01-04 19:04:58 +08:00
Yanchao Lu	c0b3c2b919	[None][ci] Remove an invalid test waive Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2026-01-03 23:34:13 +08:00
Emma Qiao	865992b86b	[None][infra] Waive failed cases on 1/3 (#10391 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2026-01-03 05:54:09 -05:00
Izzy Putterman	bdf6953ddc	[None][feat] Eagle: MLA Based Eagle (#9677 ) Signed-off-by: Izzy Putterman <iputterman@nvidia.com>	2026-01-02 13:45:07 -05:00
Gal Hubara-Agam	f3dd6da080	[#10056 ][chore] AutoDeploy: Enable Nemo SuperV3 accuracy test (#10308 ) Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>	2026-01-02 11:20:19 +02:00
chenfeiz0326	5e0e48144f	[None][fix] Minor updates on Perf Test System (#10375 ) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>	2026-01-02 17:17:42 +08:00
fredricz-20070104	f631b25c85	[None][test] Unified slurm extra args management and session collection logic (#10332 ) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com> Signed-off-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com> Co-authored-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com>	2026-01-01 21:10:51 -05:00
Balaram Buddharaju	4a1b742aa0	[TRTLLM-9467][fix] Fix PP+CP combination with helix parallelism (#10312 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2026-01-01 13:42:53 -05:00
Balaram Buddharaju	9f5b750a93	[None][chore] Waive tests blocking pre-merge 12/31 (#10373 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2026-01-01 03:00:24 -05:00
Balaram Buddharaju	0b75340223	[https://nvbugs/5744427 ][fix] Make Gemma3 multimodal test fp8 (#10368 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2026-01-01 01:11:34 -05:00
Yuxian Qiu	ff836d4f41	[https://nvbugs/5740359 ][chore] Unwaive tests. (#10260 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2026-01-01 09:53:34 +08:00
Lucas Liebenwein	1bbe71b3ed	[#10244 ][feat] AutoDeploy: separate prefill/decode in flashinfer (#10252 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-12-31 17:01:24 -05:00
Simeng Liu	84d107b2f0	[https://nvbugs/5717993 ][fix] Add execution_stream across PyExecutor, KVCacheManager, PeftCacheManager to ensure proper CUDA stream synchronization between KV cache transfer operations and model forward kernels. (#10060 ) Signed-off-by: SimengLiu-nv <simengl@nvidia.com>	2025-12-31 09:22:54 -08:00
xinhe-nv	0d2e2718ce	[None][chore] Add failed cases into waives.txt (#10354 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-12-31 09:30:22 -05:00
chenfeiz0326	a23c6f1092	[TRTLLM-9834][feat] Transfer to TRTLLM-INFRA Database and Fail post-merge tests if regression (#10282 ) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>	2025-12-31 21:44:59 +08:00
tcherckez-nvidia	464847c6be	[#9717 ][chore] Standardize MoE weights interface (#10295 ) Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>	2025-12-31 07:37:18 -05:00
Jin Li	ef1d4a40b5	[https://nvbugs/5727475 ][fix] Avoid use property with setter in nn.Mo… (#10212 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-12-31 06:21:36 -05:00
Emma Qiao	d944430f96	[None][infra] Waive failed cases on 12/31 (#10353 ) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-12-31 17:39:49 +08:00
Necofish	73870ae4ad	[None][feat] support Qwen3-VL dense model in pytorch backend (#9060 ) Signed-off-by: Nekofish-L <liuxiangyang@mail.ustc.edu.cn>	2025-12-31 17:54:26 +09:00
xinhe-nv	827d12caaf	[https://nvbugs/5558516 ][test] add disaggregated stress test (#9354 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-12-31 16:47:36 +08:00
Yuxian Qiu	910a633066	[https://nvbugs/5774869 ][chore] waive tests. (#10356 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-12-31 03:00:52 -05:00
xinhe-nv	1e9c153b4c	[None][fix] disable thread leak check for kimi (#10337 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-12-31 01:31:37 -05:00
xinhe-nv	6c1abf2d45	[None][chore] Add failed cases into waives.txt (#10344 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-12-31 00:11:54 -05:00
Jin Li	34c2fd50a9	[https://nvbugs/5707359 ][fix] Unwaive OOM case that should be fixed by #9446 (#10334 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-12-31 10:41:39 +08:00
Yuxian Qiu	ec8a388c25	[https://nvbugs/5769890 ][fix] Import get_free_port. (#10341 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-12-31 09:47:27 +08:00
Eran Geva	74832a1895	[https://nvbugs/5766986 ][fix] fixed the shard_all_unprocessed default value to align with the default.yml (#10271 ) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>	2025-12-30 08:54:13 -05:00
Bo Li	1f0365da36	[None][infra] Add LongBenchV1 to trtllm-eval. (#10265 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-12-30 21:39:34 +08:00
Emma Qiao	6732c76414	[None][infra] Waive failed cases for main on 12/30 (#10338 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-12-30 05:17:43 -05:00
Emma Qiao	fb05cd769a	[None][infra] Enable single-gpu CI on spark (#9304 ) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Emma Qiao <qqiao@nvidia.com> Signed-off-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-12-30 17:22:14 +08:00
Emma Qiao	cce7247815	[https://nvbugs/5594703 ][infra] Unwaive the failed case to test (#10275 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-12-30 16:38:54 +08:00
xinhe-nv	6accdbc6a6	[None][chore] Add failed cases into waives.txt (#10302 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-12-30 03:11:52 -05:00
ruodil	0f4ed90560	[TRTLLM-9965][test] add long-context disagg test for GB300/GB200 and remove config_index in yaml (#10225 ) Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>	2025-12-30 02:39:50 -05:00
xinhe-nv	3e0344a53d	[None][chore] Add failed cases into waives.txt (#10301 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-12-30 14:04:28 +08:00
xinhe-nv	48fee8d0f6	[None][chore] Add failed cases into waives.txt (#10321 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-12-30 00:11:49 -05:00
Emma Qiao	f396ad83b0	[None][infra] Remove duplicates in waives.txt (#10333 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-12-29 22:32:52 -05:00
Balaram Buddharaju	4944192eae	[None][chore] Waive tests failing in pre-merge 12/28 (#10311 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-12-29 20:53:49 -05:00
Neta Zmora	966231d29c	[#9626 ][feat] Add an auto-deploy transform for using cutlass FP4 MoE kernels (#10304 ) Add a transform to replace torch.ops.auto_deploy.torch_quant_nvfp4_moe with the optimized torch.ops.auto_deploy.trtllm_quant_nvfp4_moe_fused. It currently generates wrong results when the number of rows in the MoE FC1 weights is not divisible by 128, so torch.ops.auto_deploy.trtllm_quant_nvfp4_moe_fused is not set as the default FP4 MoE implementation (i.e. the transform is disabled). Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>	2025-12-29 23:18:15 +02:00
Yueh-Ting (eop) Chen	9cee32ab39	[https://nvbugs/5625990 ][fix] Respect VSWA scheme when doing block store for reuse and load block for reuse in KV cache manager (#10183 ) Signed-off-by: eopXD <yuehtingc@nvidia.com>	2025-12-29 14:29:14 +08:00
Yanchao Lu	2f8d6d25a8	[None][ci] Waive an intermittent test hang case (#10324 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-12-29 13:04:31 +08:00
Yanchao Lu	270be801aa	[None][ci] Move remaining DGX-B200 tests to LBD (#9876 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-12-28 13:55:39 +08:00
JunyiXu-nv	55bc6a5ff8	[https://nvbugs/5753250 ][fix] Fix undefined local variable in responses utils (#10154 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com> Signed-off-by: JunyiXu-nv <219237550+JunyiXu-nv@users.noreply.github.com>	2025-12-28 06:59:32 +08:00
shivghai	ee07a7c55e	[None][fix] [Gemma3] Fix RoPE for local attention for Gemma3 (#9961 ) Signed-off-by: Shiv Ghai <8965168+shivghai@users.noreply.github.com>	2025-12-27 11:50:59 -08:00
Guoming Zhang	93ac0bc1dc	[TRTLLM-10126][feat] Increase topk upper limit to 22 for NVLinkOneSid… (#10229 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-12-27 22:48:10 +08:00
Jin Li	c04563657e	[TRTLLM-7735][feat] Attention NVFP4 out support for torch compile (#9740 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-12-27 00:07:20 +08:00
chenfeiz0326	d70aeddc7f	[TRTLLM-8952][feat] Support Multi-Node Disagg Perf Test in CI (#9138 ) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>	2025-12-26 22:50:53 +08:00
Pengyun Lin	684b37df02	[https://nvbugs/5747938 ][fix] Use local tokenizer (#10230 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-12-26 22:08:10 +08:00
Pengyun Lin	c5b0f9e436	[https://nvbugs/5633700 ][fix] Cache tiktoken vocab for gpt-oss (#10219 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-12-26 18:39:03 +08:00
dongfengy	bfc591994c	[https://nvbugs/5745152 ][fix] Fix some GPTOSS test setups (#10085 ) Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>	2025-12-26 17:52:40 +08:00
Neta Zmora	f3f02315df	[None][chore]: small refactoring to auto-deploy MoE operator (#10300 ) Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>	2025-12-25 12:27:11 -05:00
bhsueh_NV	db3430f589	[None][feat] Support VLM part for Mistral Large 3 (#10188 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-12-25 11:20:58 -05:00
Ziyi Xiong	d8b5aeb061	[https://nvbugs/5652062 ][fix] Rewind kv_cache and reset draft tokens (#10160 ) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-12-25 09:13:51 -05:00
ZhichenJiang	46e4af5688	[TRTLLM-9831][perf] Enable 2CTA with autotune for CuteDSL MoE and Grouped GEMM optimizations (#10201 ) Signed-off-by: zhichen jiang <zhichenj@NVIDIA.com> Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-12-25 09:04:20 -05:00
Lizhi Zhou	fe12faef81	[https://nvbugs/5752516 ][chore] unwaive test; fix port conflicts in CI (#10152 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-12-25 08:16:09 -05:00
Emma Qiao	0ecdb69b93	[None][infra] Waive failed tests for main on 12/25 (#10298 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-12-25 05:22:39 -05:00
Jie Li	83e02ee335	[None][chore] Remove NIM TRT-Backend Test Lists (#10232 ) Signed-off-by: Jie Li <lijie@nvidia.com>	2025-12-25 04:01:51 -05:00
Enwei Zhu	182b3eb633	[None][ci] Waive TestLlama3_1_8B::test_auto_dtype[False-2] for timeout (#10293 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-12-25 02:35:18 -05:00
Gabriel Wu	1d01214ff0	[None][feat] Drop non-deepgemm fp8 block scale gemm (#10256 ) Signed-off-by: Zihua Wu <13583761+lucifer1004@users.noreply.github.com>	2025-12-25 14:52:52 +08:00
xinhe-nv	4ae6f6a46c	[None][chore] Add failed cases into waives.txt (#10249 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-12-25 01:26:21 -05:00
Venky	c059e6caa1	[TRTC-121] [feat] Add recipe selector UI to complement the recipe database (#10125 ) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2025-12-24 23:56:54 -05:00
gramnarayan	a9eb5afc9f	[#9241 ][feat] AutoDeploy: Support Eagle3 Speculative Decoding (#9869 ) Support two model flow with no overlap scheduler or chain drafter. Drafting model is in PyTorch backend. Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>	2025-12-24 23:30:42 -05:00
Emma Qiao	16fd781e42	[TRTLLM-9862][infra] Move single-gpu tests on rtxpro6000d to pre-merge (#9897 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-12-24 21:45:33 -05:00
Neta Zmora	c4b36d31ff	[#10137 ][feat] AutoDeploy FP8 MoE refactor (#10138 ) The trtllm (cutlass) fp8 moe operator performs W3+W1 fusion (concat) during inference, and we want to move this fusion to model optimization time. The Cutlass MoE kernel is used through a trtllm torch operator. Its implementation uses two FC operations (fc1 and fc2), while the canonical MoE API defines three GEMM operations and their associated weights (W1, W2, W3). So when we switch from the torch.moe op to the trtllm.moe op, we also change terminology from w1, w2, w3 to fc1, fc2. Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>	2025-12-24 18:58:10 +02:00
Stanley Sun	ddac4d7379	[None][test] Add disag-serving auto scaling qa test (#10262 ) Signed-off-by: Stanley Sun <stsun@nvidia.com>	2025-12-24 08:43:47 -05:00
shuyixiong	f4f0fe85e9	[TRTLLM-9737][chore] Add rl perf reproduce script and enhance the robustness of Ray tests (#9939 ) Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>	2025-12-24 15:27:01 +08:00
xinhe-nv	534700ecd9	[None][chore] Add failed cases into waives.txt (#10240 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-12-24 02:21:50 -05:00
Fanrong Li	156f6453dc	[TRTLLM-9798][feat] Change to use new DeepGEMM MQA sm100 kernel for MTP-3 (#10226 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-12-24 14:39:12 +08:00
Emma Qiao	7b84e48e0f	[None][infra] Waive failed cases on 12/24 (#10257 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-12-23 22:49:57 -05:00
xinhe-nv	fc1f77eafc	[None][chore] Add failed cases into waives.txt (#10204 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com> Co-authored-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>	2025-12-24 10:37:23 +08:00
Balaram Buddharaju	8c1cfc872b	[TRTLLM-9493][feat] Custom AllToAll for helix parallelism (#9986 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-12-23 18:14:30 -08:00
Jhao-Ting Chen	92d90fa29a	[None][feat] Expose enable_trt_overlap in Triton_backend brings 1.05x OTPS (#10018 ) Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>	2025-12-23 11:41:31 -06:00
Grzegorz Kwasniewski	0027a01ad5	[https://nvbugs/5680312 ][fix] Updated test waiving (#9630 ) Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>	2025-12-23 09:38:12 -08:00
Emma Qiao	984c20e0b2	[None][infra] Waive failed cases on 12/23 (#10236 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-12-23 08:48:54 -05:00
dongfengy	e284d0bf80	[None][infra] Waive flaky unittest/executor/test_rpc_proxy.py and unittest/executor/test_rpc_worker.py tests (#10209 ) Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-12-23 07:43:13 -05:00
Yukun He	522f1d2bc3	[https://nvbugs/5764627 ][chore] waive the time-out test (#10222 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-12-23 16:36:06 +08:00
Balaram Buddharaju	f2e00a75de	[None][chore] Remove helix test from rtx test list (#10224 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-12-23 03:07:37 -05:00

2719 Commits