Lucas Liebenwein
15b43e8a14
[ https://nvbugs/5777041 ][fix] fix AutoDeploy ep sharding test ( #10460 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-14 21:53:56 -05:00
Dom Brown
94c7b69048
[ https://nvbugs/5630196 ] [fix] Prevent flaky failures in C++ test_e2e.py by using local cached datasets for benchmarking ( #10638 )
...
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2026-01-14 21:39:55 -05:00
Wanli Jiang
73d1840c12
[TRTLLM-10245][feat] Add accuracy tests for super v3 fp8 model ( #10482 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2026-01-15 10:07:02 +08:00
dominicshanshan
0f2d61b8c6
[ https://nvbugs/5766952 ][fix] Fix AIPerf issue. ( #10666 )
...
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-15 09:54:34 +08:00
bhsueh_NV
5f9fc50233
[ https://nvbugs/5800725 ][infra] Update waives.txt ( #10625 )
2026-01-15 09:08:07 +08:00
彭晋韬(jtao peng)
211c44b951
[None][feat] Adding torch ext API for FusedAddRMSNormQuant kernel ( #9905 )
...
Signed-off-by: jintaop <jintaop@nvidia.com>
2026-01-15 07:29:15 +08:00
Tzu-Ling Kan
c99faaed06
[ #9760 ][fix] Use RequestError for validation errors to prevent engine shutdown ( #9761 )
...
Signed-off-by: tzulingk@nvidia.com <tzulingk@nvidia.com>
2026-01-14 10:22:36 -05:00
Emma Qiao
01083b56bf
[TRTLLM-9849][infra] Update dependencies to 25.12 ( #9818 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Signed-off-by: xxi <xxi@nvidia.com>
Signed-off-by: xxi <95731198+xxi-nv@users.noreply.github.com>
Co-authored-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: xxi <xxi@nvidia.com>
Co-authored-by: xxi <95731198+xxi-nv@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2026-01-14 21:54:04 +08:00
Emma Qiao
35c24424f6
[None][infra] Waive failed cases in post-merge on 01/14 ( #10668 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2026-01-14 21:39:32 +08:00
HuiGao-NV
b10704428d
[ https://nvbugs/5787566 ][fix] Only keep a limited number of performance statistic data ( #10569 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2026-01-14 07:53:01 -05:00
Bo Li
582dec5bb5
[ https://nvbugs/5774869 ][infra] Use 2 GPUs to test skip softmax attention on H100. ( #10420 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-01-14 07:03:01 -05:00
shuyixiong
babd5ecacc
[ https://nvbugs/5760740 ][fix] Enable ray tests ( #10272 )
...
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
2026-01-14 19:25:46 +08:00
xinhe-nv
272688c663
[None][fix] fix L0 issues ( #10670 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2026-01-14 18:09:40 +08:00
jmydurant
e7882d5c74
[None][feat] MiniMax M2 support ( #10532 )
...
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2026-01-14 17:38:58 +08:00
mpikulski
052c36ddd2
[TRTLLM-9522][feat] support image_embeds in OpenAI API ( #9715 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-01-14 10:31:03 +01:00
Bo Li
487287a412
[None][chore] Update test name MNNVL->NVLinkTwoSided. ( #9672 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-01-14 04:29:57 -05:00
QI JUN
c4da4fd462
[ https://nvbugs/5637220 ][ci] unwaive TestQwen3_235B_A22B::test_nvfp4[latency_moe_trtllm_attention_dp] ( #9870 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2026-01-14 15:41:14 +08:00
Yuxian Qiu
39cefd6125
[None][refactor] Unify the usage of MPIDist and TorchDist. ( #10380 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-14 14:05:47 +08:00
xxi
f841b43cde
[None][chore] waive the CI failure ( #10655 )
...
Signed-off-by: xxi <xxi@nvidia.com>
2026-01-14 13:59:15 +08:00
JennyLiu
92ae490410
[None][test] Spark - Change testlist name and perf yml format ( #10626 )
...
Signed-off-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>
Co-authored-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>
2026-01-13 23:07:11 -05:00
xinhe-nv
07d9390e9b
[None][test] add test into qa test list ( #10627 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2026-01-13 22:43:00 -05:00
xinhe-nv
7305c61fc9
[TRTLLM-8638][fix] Add failed cases into waives.txt ( #10589 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2026-01-13 22:00:20 -05:00
Leslie Fang
bc119f5644
[None][chore] Add test configurable moe module ( #10575 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2026-01-14 07:25:57 +08:00
Balaram Buddharaju
ccdfa43a6e
[ https://nvbugs/5791900 ][fix] Fix HelixCpMnnvlMemory init with PP ( #10533 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-01-13 15:48:42 -05:00
Frida Hou
bf16fbd86c
[ #9283 ][feat] AutoDeploy: separate rms pattern detection from fusion ( #9969 )
...
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2026-01-13 14:57:27 -05:00
dongfengy
6ee8dbfe0b
[ https://nvbugs/5772396 ][fix] WAR: Disable TinyGEMM PDL due to accuracy issues ( #10619 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2026-01-13 12:40:11 -05:00
benzh-2025
6df2c8a074
[None][feat] add fp4 gemm + allreduce ( #9729 )
...
Signed-off-by: benzh
Signed-off-by: benzh-2025
2026-01-13 21:11:13 +08:00
Guoming Zhang
c1b0b7350f
[None][test] Unwaive qwen3 next test case. ( #9877 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2026-01-13 20:42:31 +08:00
Tailing Yuan
38296a472b
[None][feat] Layer-wise benchmarks: make model init more general and support weights loading ( #10562 )
...
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2026-01-13 19:17:03 +08:00
Erin
55580f8ec1
[NVBUG-5670458][chore] Unwaive lp tests ( #10524 )
...
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Signed-off-by: Erin <14718778+hchings@users.noreply.github.com>
2026-01-13 04:31:27 -05:00
Guoming Zhang
bdaee87895
[TRTLLM-10060][feat] Enable attention dp for Nemotron Super v3. ( #10347 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2026-01-13 17:13:55 +08:00
JunyiXu-nv
e291a834db
[TRTLLM-8462][feat] Support GET/DELETE v1/responses/{response_id} ( #9937 )
...
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2026-01-13 03:57:14 -05:00
JennyLiu
2967d299fb
[TRTLLM-10271][test] Add Spark QA functional and performance cases ( #10564 )
...
Signed-off-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>
Co-authored-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>
2026-01-13 13:20:15 +08:00
fredricz-20070104
bbe535fddf
[None][chore] Fix disagg assert ( #10596 )
...
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2026-01-12 21:39:57 -05:00
Iman Tabrizian
48b09e5a25
[ https://nvbugs/5689235 ][fix] Fix cancellation+chunked prefill+disagg ( #10111 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2026-01-12 18:23:26 -05:00
Anish Shanbhag
dacc881993
[ https://nvbugs/5761391 ][fix] Use correct model names for config database regression tests ( #10192 )
...
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2026-01-12 10:55:07 -08:00
Suyog Gupta
a1385243e1
[ #10580 ][fix] re-enable NemotronH MOE MMLU test ( #10594 )
...
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2026-01-12 09:26:07 -08:00
Emma Qiao
9f044b9dd9
[None][infra] Waive failed tests for main 01/12 ( #10604 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2026-01-12 10:24:54 -05:00
mpikulski
bf7998f1b8
[TRTLLM-9522][test] cover LLM API multi_modal_embeddings ( #9963 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-01-12 11:38:22 +01:00
Wanli Jiang
11da7e3605
[None][fix] Solve pillow version conflict ( #10537 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2026-01-12 04:05:54 -05:00
Zhenhuan Chen
3bd319dc8e
[ https://nvbugs/5794796 ][chore] waive test blocking premerge ( #10593 )
...
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
2026-01-12 15:39:07 +08:00
yufeiwu-nv
8e806abac3
[None][test] Remove most TRT-backend test cases in llm_perf_nim.yml ( #10572 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2026-01-12 15:34:55 +08:00
yingguo-trt
c5914f9085
[None][chore] update deepseekv3.2 test parameter ( #10595 )
...
Signed-off-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com>
2026-01-12 01:43:22 -05:00
chenfeiz0326
54459377d2
[TRTLLM-10248][feat] Support Bot to Send Perf Regression Msg to Slack Channel ( #10489 )
...
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2026-01-12 14:23:23 +08:00
Jie Li
5e0dbba0c9
[None][chore]: update waive list ( #10577 )
...
Signed-off-by: Jie Li <lijie@nvidia.com>
2026-01-11 22:18:04 -05:00
Eran Geva
c5d5af9e7f
[ #8391 ][chore] removed llama and added deepseek to AutoDeploy's L0 perf test ( #10585 )
...
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2026-01-11 16:31:24 -05:00
Ivy Zhang
7f018c89e9
[None][test] update core test list ( #10538 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2026-01-11 14:08:20 -05:00
Yechan Kim
8e0d20d901
[TRTLLM-10195][feat] K-EXAONE support ( #10355 )
...
Signed-off-by: Jaedeok Kim <jaedeokk@nvidia.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: Jaedeok Kim <jaedeokk@nvidia.com>
2026-01-12 00:29:51 +09:00
HuiGao-NV
3c65ec3c55
[None][chore] waive test case ( #10581 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2026-01-10 18:53:36 -05:00
fredricz-20070104
f6045fac09
[None][chore] Fix Gitlab CI termination issues ( #10576 )
...
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
Co-authored-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
2026-01-10 07:51:18 -05:00
William Zhang
ff7eb93f31
[ https://nvbugs/5669097 ][tests] Add MMMU test for mistral small ( #10530 )
...
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2026-01-09 16:09:28 -08:00
Chenghao Zhang
38f249b479
[ https://nvbugs/5548861 ][fix] AutoDeploy: Fix the test ( #10521 )
...
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2026-01-09 13:30:24 -08:00
yingguo-trt
d80f01d205
[None][feat] Add support for DeepSeek v3.2 tests ( #10561 )
...
Signed-off-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com>
2026-01-09 10:20:29 -05:00
Yechan Kim
7295af68ba
[None][fix] Enable AttentionDP on Qwen3-VL and fix test ( #10435 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2026-01-10 00:13:26 +09:00
Iman Tabrizian
ced88424ef
[ https://nvbugs/5756008 ][fix] unwaive test ( #10523 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2026-01-09 09:40:07 -05:00
Jie Li
627d306df9
[None][chore] remove some model support; add device constraint ( #10563 )
...
Signed-off-by: Jie Li <lijie@nvidia.com>
2026-01-09 09:36:23 -05:00
ruodil
2b72d33fdc
[TRTLLM-9932][test] add kimi_k2 single node perf test ( #10436 )
...
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
2026-01-09 05:36:50 -05:00
bhsueh_NV
4a09acd012
[ https://nvbugs/5785206 ][infra] unwaive the accuracy/test_llm_api_pytorch.py::TestQwen3_30B_A3B ( #10560 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2026-01-09 03:13:29 -05:00
JadoTu
4c498bfe58
[TRTLLM-9676][fix] Fix mamba_cache_manager when enabling cuda_graph_padding and let test cover this case ( #9873 )
...
Signed-off-by: JadoTu <107457950+JadoTu@users.noreply.github.com>
2026-01-09 14:50:16 +08:00
Jie Li
6fcd4e7099
[None][chore] Add failed cases into waives.txt ( #10541 )
...
Signed-off-by: Jie Li <lijie@nvidia.com>
2026-01-09 01:03:47 -05:00
ruodil
d707286ca8
[None][test] restrict max_num_tokens in disagg mtp config ( #10442 )
...
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
2026-01-08 21:53:24 -05:00
Balaram Buddharaju
56e779d09f
[None][chore] Waive tests blocking premerge 01/08 ( #10555 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-01-08 20:22:28 -05:00
Mike Iovine
4092a87b6f
[ https://nvbugs/5740075 ][fix] Fix sm120 speculation ( #10049 )
...
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2026-01-08 19:55:43 -05:00
William Zhang
c0ae6bbdbe
[None][feat] EPD for Qwen3 VL ( #10470 )
...
* Why?
We would like to support EPD disaggregated serving for Qwen3 VL.
* What?
This commit adds such support, and extends existing unit tests for
correctness checks.
Some minor (protected) interface changes had to be made to the
weight mapper as a side-effect.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2026-01-08 06:45:54 -05:00
bhsueh_NV
bea61bb17d
[None][fix] Mistral large 3 few code refine ( #10405 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2026-01-08 06:38:49 -05:00
Emma Qiao
43839c7d9b
[TRTLLM-9642][infra] Increase pytest verbosity for failed tests ( #9657 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
2026-01-08 02:33:48 -05:00
HuiGao-NV
22c81cb5fa
[None][chore] Enable seg fault cases since one race condition is fixed ( #10398 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2026-01-08 02:15:30 -05:00
Barry Kang
f57aab5255
[ https://nvbugs/5775402 ][fix] Fix concurrency list in Wide-EP perf tests ( #10529 )
...
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
2026-01-08 01:58:55 -05:00
Lucas Liebenwein
30f8455d29
[ https://nvbugs/5747878 ][fix] unwaive llama4 scout tests ( #10468 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-07 23:33:45 -05:00
yingguo-trt
f8b2a8fd30
[None][chore] Support multiple job submission at the same time ( #10492 )
...
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
Co-authored-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2026-01-07 21:51:36 -05:00
Yuxian Qiu
b85c447ceb
[ https://nvbugs/5784543 ][fix] Setup dist before using autotuner. ( #10491 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-08 10:32:50 +08:00
xxi
81f878c279
[ https://nvbugs/5707392 ][fix] unwaive test_fused_moe_fp8_blockwise_wide_ep[NotEnabled] ( #10428 )
...
Signed-off-by: xxi <xxi@nvidia.com>
2026-01-08 09:17:59 +08:00
Lucas Liebenwein
d736c7f290
[ https://nvbugs/5761665 ][fix] AutoDeploy: handle bugs for 25.12 dlfw upgrade ( #10511 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-07 20:16:53 -05:00
yufeiwu-nv
b130d58c88
[None][test] Remove most TRT-backend test cases in llm_perf_nim.yml ( #10487 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2026-01-07 17:18:43 +08:00
xinhe-nv
872210468b
[TRTLLM-8638][fix] Add failed cases into waives.txt ( #10474 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2026-01-07 03:23:43 -05:00
yingguo-trt
cbf8357e5f
[ https://nvbugs/5726086 ][fix] update kimi-k2-1k1k dataset ( #10473 )
...
Signed-off-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com>
2026-01-07 01:24:08 -05:00
xinhe-nv
be5579633e
[TRTLLM-8638][fix] Add failed cases into waives.txt ( #10457 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2026-01-07 00:57:03 -05:00
Fanrong Li
a34aa63685
[ https://nvbugs/5767223 ][feat] add pp support for DeepSeek-v3.2 ( #10449 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2026-01-07 12:29:51 +08:00
xinhe-nv
1fbadd2dde
[None][chore] Add failed cases into waives.txt ( #10365 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Jie Li <lijie@nvidia.com>
Signed-off-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>
Co-authored-by: Jie Li <lijie@nvidia.com>
Co-authored-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>
2026-01-06 22:08:06 -05:00
Ivy Zhang
4a1b2e23b3
[ https://nvbugs/5698434 ][test] add qwen3-4b accuracy test case ( #10382 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2026-01-06 21:56:34 -05:00
Lucas Liebenwein
6095c80e56
[ https://nvbugs/5721907 ][fix] AutoDeploy: improve numerical stability of flashinfer attention test ( #10467 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-06 21:11:06 -05:00
Zongfei Jing
bb2f883296
[None] [feat] Add test script and raster M for gather fc1 kernel ( #10429 )
...
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2026-01-07 09:31:49 +08:00
Lucas Liebenwein
bb6a3973aa
[ https://nvbugs/5732942 ][fix] AutoDeploy: handle transformers 4.57.1 upgrade fixes ( #10466 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-06 19:55:49 -05:00
Mike Iovine
77be1b7572
[ https://nvbugs/5749988 ][fix] Remove redundant qwen3 spec dec test ( #10387 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2026-01-06 11:46:34 -05:00
Enwei Zhu
037753f65b
[ https://nvbugs/5748600 ][ci] Unwaive disagg guided decoding test ( #10409 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2026-01-06 11:38:12 -05:00
JunyiXu-nv
7d62773c6c
[ https://nvbugs/5760726 ][fix] Use random port in container port section ( #10432 )
...
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2026-01-06 23:25:46 +08:00
xinhe-nv
704f58dfbe
[TRTLLM-8638][fix] Add failed cases into waives.txt ( #10427 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2026-01-06 04:47:54 -05:00
Emma Qiao
6507087c3f
[None][infra] Waive failed cases on 1/6 ( #10440 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2026-01-06 16:54:54 +08:00
Bo Li
df0b976b99
[ https://nvbugs/5785206 ][infra] Waive TestQwen3_30B_A3B::test_fp8[latency-torch_compile=False]. ( #10441 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-01-06 03:32:19 -05:00
William Zhang
ab58d7cac1
[ https://nvbugs/5772361 ][ci] Unwaive tests that have been fixed ( #10424 )
...
These tests were all failing due to the same issue, and were fixed
in #10394 .
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2026-01-05 23:49:54 -08:00
Ivy Zhang
1e828587e5
[TRTLLM-9896][test] add vswa test cases coverage ( #10146 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2026-01-06 02:02:29 -05:00
Yiqing Yan
5108a69fc0
[TRTLLM-9622][infra] Enable DGX_B300 multi-gpu testing in pre-merge pipeline ( #9699 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2026-01-06 14:39:55 +08:00
xinhe-nv
998527724c
[TRTLLM-8638][fix] Add failed cases into waives.txt ( #10367 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2026-01-06 01:09:21 -05:00
Ivy Zhang
22a1d31a27
[None][test] update test case constraint ( #10381 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2026-01-06 12:28:59 +08:00
xinhe-nv
1b1058279c
[TRTLLM-8638][fix] Add failed cases into waives.txt ( #10384 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2026-01-05 23:02:27 -05:00
kris1025
3e98265682
[None][chore] unwaive qwen3 30b test ( #10115 )
...
Signed-off-by: linquanh <linquanh@nvidia.com>
2026-01-06 11:17:08 +08:00
alel
6b8ae6fa81
[None][feat] CuteDSL MOE FC1 Enhancement ( #10088 )
...
Signed-off-by: Yuhan Li <51736452+liyuhannnnn@users.noreply.github.com>
2026-01-06 09:30:43 +08:00
chenfeiz0326
8a04c05079
[None][fix] Only Use Throughput Metrics to Check Regression ( #10404 )
...
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2026-01-06 09:21:15 +08:00
Chuang Zhu
536a8f6a9c
[TRTLLM-9527][feat] Add transferAgent binding (step 1) ( #10113 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2026-01-06 08:40:38 +08:00
Simeng Liu
3b56548fcf
[ https://nvbugs/5777044 ][chore] Remove solved bugs from waives.txt ( #10422 )
...
Signed-off-by: Simeng Liu <109828133+SimengLiu-nv@users.noreply.github.com>
2026-01-05 16:56:58 -05:00
Mike Iovine
91ff46d418
[ https://nvbugs/5745152 ][fix] Unwaive gpt oss spec decode test ( #10370 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2026-01-05 16:06:58 -05:00
Mike Iovine
7a2dab8e85
[ https://nvbugs/5695984 ][fix] Unwaive llama3 eagle test ( #10092 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2026-01-05 16:03:35 -05:00
Yan Chunwei
6b71b03947
[TRTLLM-9551][infra] Partition test_llm_pytorch.py for parallel execution ( #10400 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2026-01-05 13:58:03 -05:00
Mike Iovine
db2614ef10
[ https://nvbugs/5772414 ][fix] Fix draft token tree depth=1 corner case ( #10385 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2026-01-05 17:20:14 +01:00
Gal Hubara-Agam
e98c27ee4f
[TRTLLM-10053][feat] AutoDeploy: Add Super v3 config file, improve test runtime ( #10397 )
...
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
2026-01-05 18:17:27 +02:00
Anthony Chang
225d3a9001
[None][perf] TRTLLM MoE maps to lower tuning buckets when ep>1 ( #9998 )
...
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2026-01-05 17:16:12 +01:00
Balaram Buddharaju
a792c23dcf
[TRTLLM-9465][fix] Swap TP-CP grouping order ( #10350 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-01-05 20:08:03 +08:00
xinhe-nv
b1733d56f6
[TRTLLM-9381][test] add disag-serving kimi k2 thinking tests ( #10357 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2026-01-05 05:15:52 -05:00
Fanrong Li
4931c5eb3a
[None][feat] update deepgemm to the DeepGEMM/nv_dev branch ( #9898 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2026-01-05 16:43:42 +08:00
Yukun He
d272f1a9bc
[TRTLLM-8821][feat] Apply AutoTuner to AllReduce Op for strategy tuning. ( #8531 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-01-05 15:44:37 +08:00
HuiGao-NV
2f768b76f8
[ https://nvbugs/5715568 ][fix] Force release torch memory when LLM is destroyed ( #10314 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2026-01-05 15:30:18 +08:00
Emma Qiao
c63fad7d96
[None][infra] Waive failed cases again on 1/5 ( #10403 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2026-01-05 02:12:16 -05:00
Yihan Wang
e7a4486294
[ https://nvbugs/5752521 ][fix] Unwaive test_trtllm_flashinfer_symbol_collision.py ( #10227 )
...
Signed-off-by: Yihan Wang <yihwang@nvidia.com>
2026-01-05 14:37:05 +08:00
Yukun He
0937df2c68
[TRTLLM-10185][feat] AutoTuner Cache: Support cache file lock and merge all ranks into one ( #10336 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-01-05 13:44:09 +08:00
Emma Qiao
5a8bfcbb50
[None][infra]Waive failed cases in post-merge on 1/5 ( #10399 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2026-01-05 12:30:10 +08:00
Tailing Yuan
a7fe043b13
[None][feat] Layer-wise benchmarks: support TEP balance, polish slurm scripts ( #10237 )
...
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2026-01-05 11:23:04 +08:00
Yuxian Qiu
5773a4d775
[ https://nvbugs/5701425 ][chore] Unwaive tests. ( #10269 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-05 09:54:26 +08:00
Fanrong Li
b5a1e10bc0
[ https://nvbugs/5779534 ][fix] fix buffer reuse for CUDA graph attention metadata ( #10393 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2026-01-05 09:43:44 +08:00
Wanli Jiang
da0830670a
[TRTLLM-10065][feat] Add accuracy tests for super-v3 with multiple-gpus ( #10234 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2026-01-05 09:41:49 +08:00
Lizhi Zhou
82c1ba84a7
[ https://nvbugs/5649010 ][fix] use 0 port as arbitrary port when disagg service discovery is enabled ( #10383 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-01-05 09:40:40 +08:00
Eran Geva
e2f5455533
[ #8391 ][chore] added deepseek_r1_distill_qwen_32b AutoDeploy perf test to L0 ( #10377 )
...
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2026-01-04 20:35:52 +02:00
chenfeiz0326
a65b0d4efa
[None][fix] Decrease Pre Merge Perf Tests ( #10390 )
...
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2026-01-04 12:21:34 -05:00
Yanchao Lu
c4f27fa4c0
[None][ci] Some tweaks for the CI pipeline ( #10359 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2026-01-04 11:10:47 -05:00
dongfengy
afc533193d
[None][feat] Support nvfp4 for gptoss ( #8956 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2026-01-04 08:57:44 -05:00
Jaedeok Kim
a4dcc6a711
[TRTLLM-10171][fix] Correct attention handling in ModelConfig and KVCacheManager ( #10330 )
...
Signed-off-by: Jaedeok Kim <jaedeokk@nvidia.com>
2026-01-04 06:07:30 -05:00
Yuxian Qiu
6ba04eba06
[ https://nvbugs/5748683 ][fix] Use get_free_port_in_ci to avoid port conflict. ( #10392 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-04 19:04:58 +08:00
Yanchao Lu
c0b3c2b919
[None][ci] Remove an invalid test waive
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2026-01-03 23:34:13 +08:00
Emma Qiao
865992b86b
[None][infra] Waive failed cases on 1/3 ( #10391 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2026-01-03 05:54:09 -05:00
Izzy Putterman
bdf6953ddc
[None][feat] Eagle: MLA Based Eagle ( #9677 )
...
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
2026-01-02 13:45:07 -05:00
Gal Hubara-Agam
f3dd6da080
[ #10056 ][chore] AutoDeploy: Enable Nemo SuperV3 accuracy test ( #10308 )
...
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
2026-01-02 11:20:19 +02:00
chenfeiz0326
5e0e48144f
[None][fix] Minor updates on Perf Test System ( #10375 )
...
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2026-01-02 17:17:42 +08:00
fredricz-20070104
f631b25c85
[None][test] Unified slurm extra args management and session collection logic ( #10332 )
...
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
Signed-off-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com>
Co-authored-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com>
2026-01-01 21:10:51 -05:00
Balaram Buddharaju
4a1b742aa0
[TRTLLM-9467][fix] Fix PP+CP combination with helix parallelism ( #10312 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-01-01 13:42:53 -05:00
Balaram Buddharaju
9f5b750a93
[None][chore] Waive tests blocking pre-merge 12/31 ( #10373 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-01-01 03:00:24 -05:00
Balaram Buddharaju
0b75340223
[ https://nvbugs/5744427 ][fix] Make Gemma3 multimodal test fp8 ( #10368 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-01-01 01:11:34 -05:00
Yuxian Qiu
ff836d4f41
[ https://nvbugs/5740359 ][chore] Unwaive tests. ( #10260 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-01 09:53:34 +08:00
Lucas Liebenwein
1bbe71b3ed
[ #10244 ][feat] AutoDeploy: separate prefill/decode in flashinfer ( #10252 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-12-31 17:01:24 -05:00
Simeng Liu
84d107b2f0
[ https://nvbugs/5717993 ][fix] Add execution_stream across PyExecutor, KVCacheManager, PeftCacheManager to ensure proper CUDA stream synchronization between KV cache transfer operations and model forward kernels. ( #10060 )
...
Signed-off-by: SimengLiu-nv <simengl@nvidia.com>
2025-12-31 09:22:54 -08:00
xinhe-nv
0d2e2718ce
[None][chore] Add failed cases into waives.txt ( #10354 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-12-31 09:30:22 -05:00
chenfeiz0326
a23c6f1092
[TRTLLM-9834][feat] Transfer to TRTLLM-INFRA Database and Fail post-merge tests if regression ( #10282 )
...
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-12-31 21:44:59 +08:00
tcherckez-nvidia
464847c6be
[ #9717 ][chore] Standardize MoE weights interface ( #10295 )
...
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
2025-12-31 07:37:18 -05:00
Jin Li
ef1d4a40b5
[ https://nvbugs/5727475 ][fix] Avoid use property with setter in nn.Mo… ( #10212 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-12-31 06:21:36 -05:00
Emma Qiao
d944430f96
[None][infra] Waive failed cases on 12/31 ( #10353 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-12-31 17:39:49 +08:00
Necofish
73870ae4ad
[None][feat] support Qwen3-VL dense model in pytorch backend ( #9060 )
...
Signed-off-by: Nekofish-L <liuxiangyang@mail.ustc.edu.cn>
2025-12-31 17:54:26 +09:00
xinhe-nv
827d12caaf
[ https://nvbugs/5558516 ][test] add disaggregated stress test ( #9354 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-12-31 16:47:36 +08:00
Yuxian Qiu
910a633066
[ https://nvbugs/5774869 ][chore] waive tests. ( #10356 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-12-31 03:00:52 -05:00
xinhe-nv
1e9c153b4c
[None][fix] disable thread leak check for kimi ( #10337 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-12-31 01:31:37 -05:00
xinhe-nv
6c1abf2d45
[None][chore] Add failed cases into waives.txt ( #10344 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-12-31 00:11:54 -05:00
Jin Li
34c2fd50a9
[ https://nvbugs/5707359 ][fix] Unwaive OOM case that should be fixed by #9446 ( #10334 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-12-31 10:41:39 +08:00
Yuxian Qiu
ec8a388c25
[ https://nvbugs/5769890 ][fix] Import get_free_port. ( #10341 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-12-31 09:47:27 +08:00
Eran Geva
74832a1895
[ https://nvbugs/5766986 ][fix] fixed the shard_all_unprocessed default value to align with the default.yml ( #10271 )
...
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-12-30 08:54:13 -05:00
Bo Li
1f0365da36
[None][infra] Add LongBenchV1 to trtllm-eval. ( #10265 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-12-30 21:39:34 +08:00
Emma Qiao
6732c76414
[None][infra] Waive failed cases for main on 12/30 ( #10338 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-30 05:17:43 -05:00
Emma Qiao
fb05cd769a
[None][infra] Enable single-gpu CI on spark ( #9304 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Signed-off-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-12-30 17:22:14 +08:00
Emma Qiao
cce7247815
[ https://nvbugs/5594703 ][infra] Unwaive the failed case to test ( #10275 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-30 16:38:54 +08:00
xinhe-nv
6accdbc6a6
[None][chore] Add failed cases into waives.txt ( #10302 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-12-30 03:11:52 -05:00
ruodil
0f4ed90560
[TRTLLM-9965][test] add long-context disagg test for GB300/GB200 and remove config_index in yaml ( #10225 )
...
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
2025-12-30 02:39:50 -05:00
xinhe-nv
3e0344a53d
[None][chore] Add failed cases into waives.txt ( #10301 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-12-30 14:04:28 +08:00
xinhe-nv
48fee8d0f6
[None][chore] Add failed cases into waives.txt ( #10321 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-12-30 00:11:49 -05:00
Emma Qiao
f396ad83b0
[None][infra] Remove duplicates in waives.txt ( #10333 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-29 22:32:52 -05:00
Balaram Buddharaju
4944192eae
[None][chore] Waive tests failing in pre-merge 12/28 ( #10311 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-29 20:53:49 -05:00
Neta Zmora
966231d29c
[ #9626 ][feat] Add an auto-deploy transform for using cutlass FP4 MoE kernels ( #10304 )
...
Add a transform to relace torch.ops.auto_deploy.torch_quant_nvfp4_moe
with the optimized torch.ops.auto_deploy.trtllm_quant_nvfp4_moe_fused.
Currently generates the wrong results when the number of rows in MoE FC1 weights is not divisible by 128,
so torch.ops.auto_deploy.trtllm_quant_nvfp4_moe_fused is not set as the default FP4 MoE implementation (i.e. the transform is disabled).
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
2025-12-29 23:18:15 +02:00
Yueh-Ting (eop) Chen
9cee32ab39
[ https://nvbugs/5625990 ][fix] Respect VSWA scheme when doing block store for reuse and load block for reuse in KV cache manager ( #10183 )
...
Signed-off-by: eopXD <yuehtingc@nvidia.com>
2025-12-29 14:29:14 +08:00
Yanchao Lu
2f8d6d25a8
[None][ci] Waive an intermittent test hang case ( #10324 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-12-29 13:04:31 +08:00
Yanchao Lu
270be801aa
[None][ci] Move remaining DGX-B200 tests to LBD ( #9876 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-12-28 13:55:39 +08:00
JunyiXu-nv
55bc6a5ff8
[ https://nvbugs/5753250 ][fix] Fix undefined local variable in responses utils ( #10154 )
...
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
Signed-off-by: JunyiXu-nv <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-28 06:59:32 +08:00
shivghai
ee07a7c55e
[None][fix] [Gemma3] Fix RoPE for local attention for Gemma3 ( #9961 )
...
Signed-off-by: Shiv Ghai <8965168+shivghai@users.noreply.github.com>
2025-12-27 11:50:59 -08:00
Guoming Zhang
93ac0bc1dc
[TRTLLM-10126][feat] Increase topk upper limit to 22 for NVLinkOneSid… ( #10229 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-12-27 22:48:10 +08:00
Jin Li
c04563657e
[TRTLLM-7735][feat] Attention NVFP4 out support for torch compile ( #9740 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-12-27 00:07:20 +08:00
chenfeiz0326
d70aeddc7f
[TRTLLM-8952][feat] Support Multi-Node Disagg Perf Test in CI ( #9138 )
...
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-12-26 22:50:53 +08:00
Pengyun Lin
684b37df02
[ https://nvbugs/5747938 ][fix] Use local tokenizer ( #10230 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-12-26 22:08:10 +08:00
Pengyun Lin
c5b0f9e436
[ https://nvbugs/5633700 ][fix] Cache tiktoken vocab for gpt-oss ( #10219 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-12-26 18:39:03 +08:00
dongfengy
bfc591994c
[ https://nvbugs/5745152 ][fix] Fix some GPTOSS test setups ( #10085 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-12-26 17:52:40 +08:00
Neta Zmora
f3f02315df
[None][chore]: small refactoring to auto-deploy MoE operator ( #10300 )
...
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
2025-12-25 12:27:11 -05:00
bhsueh_NV
db3430f589
[None][feat] Support VLM part for Mistral Large 3 ( #10188 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-12-25 11:20:58 -05:00
Ziyi Xiong
d8b5aeb061
[ https://nvbugs/5652062 ][fix] Rewind kv_cache and reset draft tokens ( #10160 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-12-25 09:13:51 -05:00
ZhichenJiang
46e4af5688
[TRTLLM-9831][perf] Enable 2CTA with autotune for CuteDSL MoE and Grouped GEMM optimizations ( #10201 )
...
Signed-off-by: zhichen jiang <zhichenj@NVIDIA.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-25 09:04:20 -05:00
Lizhi Zhou
fe12faef81
[ https://nvbugs/5752516 ][chore] unwaive test; fix port conflicts in CI ( #10152 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-12-25 08:16:09 -05:00
Emma Qiao
0ecdb69b93
[None][infra] Waive failed tests for main on 12/25 ( #10298 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-25 05:22:39 -05:00
Jie Li
83e02ee335
[None][chore] Remove NIM TRT-Backend Test Lists ( #10232 )
...
Signed-off-by: Jie Li <lijie@nvidia.com>
2025-12-25 04:01:51 -05:00
Enwei Zhu
182b3eb633
[None][ci] Waive TestLlama3_1_8B::test_auto_dtype[False-2] for timeout ( #10293 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-25 02:35:18 -05:00
Gabriel Wu
1d01214ff0
[None][feat] Drop non-deepgemm fp8 block scale gemm ( #10256 )
...
Signed-off-by: Zihua Wu <13583761+lucifer1004@users.noreply.github.com>
2025-12-25 14:52:52 +08:00
xinhe-nv
4ae6f6a46c
[None][chore] Add failed cases into waives.txt ( #10249 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-12-25 01:26:21 -05:00
Venky
c059e6caa1
[TRTC-121] [feat] Add recipe selector UI to complement the recipe database ( #10125 )
...
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-12-24 23:56:54 -05:00
gramnarayan
a9eb5afc9f
[ #9241 ][feat] AutoDeploy: Support Eagle3 Speculative Decoding ( #9869 )
...
Support two model flow with no overlap scheduler or chain drafter. Drafting model is in PyTorch backend.
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
2025-12-24 23:30:42 -05:00
Emma Qiao
16fd781e42
[TRTLLM-9862][infra] Move single-gpu tests on rtxpro6000d to pre-merge ( #9897 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-24 21:45:33 -05:00
Neta Zmora
c4b36d31ff
[ #10137 ][feat] AutoDeploy FP8 MoE refactor ( #10138 )
...
The trtllm (cutlass) fp8 moe operator performs W3+W1 fusion (concat) during inference and we want to move this fusion to the model optimization time.
The Cutlass MoE kernel is used thru a trtllm torch operator.
Its implementation uses two FC operations (fc1 and fc2) while the canonical MoE API defines three GEMM operations and their associated weights (W1, W2, W3) so when we switch from the torch.moe op to the trtllm.moe op we also change terminology from w1, w2, w3 to fc1, fc2.
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
2025-12-24 18:58:10 +02:00
Stanley Sun
ddac4d7379
[None][test] Add disag-serving auto scaling qa test ( #10262 )
...
Signed-off-by: Stanley Sun <stsun@nvidia.com>
2025-12-24 08:43:47 -05:00
shuyixiong
f4f0fe85e9
[TRTLLM-9737][chore] Add rl perf reproduce script and enhance the robustness of Ray tests ( #9939 )
...
Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>
2025-12-24 15:27:01 +08:00
xinhe-nv
534700ecd9
[None][chore] Add failed cases into waives.txt ( #10240 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-12-24 02:21:50 -05:00
Fanrong Li
156f6453dc
[TRTLLM-9798][feat] Change to use new DeepGEMM MQA sm100 kernel for MTP-3 ( #10226 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-12-24 14:39:12 +08:00
Emma Qiao
7b84e48e0f
[None][infra] Waive failed cases om 12/24 ( #10257 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-23 22:49:57 -05:00
xinhe-nv
fc1f77eafc
[None][chore] Add failed cases into waives.txt ( #10204 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>
Co-authored-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>
2025-12-24 10:37:23 +08:00
Balaram Buddharaju
8c1cfc872b
[TRTLLM-9493][feat] Custom AllToAll for helix parallelism ( #9986 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-23 18:14:30 -08:00
Jhao-Ting Chen
92d90fa29a
[None][feat] Expose enable_trt_overlap in Triton_backend brings 1.05x OTPS ( #10018 )
...
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
2025-12-23 11:41:31 -06:00
Grzegorz Kwasniewski
0027a01ad5
[ https://nvbugs/5680312 ][fix] Updated test waiving ( #9630 )
...
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2025-12-23 09:38:12 -08:00
Emma Qiao
984c20e0b2
[None][infra] Waive failed cases on 12/23 ( #10236 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-23 08:48:54 -05:00
dongfengy
e284d0bf80
[None][infra] Waive flaky unittest/executor/test_rpc_proxy.py and unittest/executor/test_rpc_worker.py tests ( #10209 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-12-23 07:43:13 -05:00
Yukun He
522f1d2bc3
[ https://nvbugs/5764627 ][chore] waive the time-out test ( #10222 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-12-23 16:36:06 +08:00
Balaram Buddharaju
f2e00a75de
[None][chore] Remove helix test from rtx test list ( #10224 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-23 03:07:37 -05:00