Jonas Li
2645a78f34
[TRTLLM-9660][feat] Convert cuteDSL GEMM to opt-in feature ( #9682 )
...
Signed-off-by: Jonas Li <6110159+longlee0622@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-12-06 02:24:51 -08:00
Enwei Zhu
7cd5a67e25
[TRTLLM-9372][feat] Enable CuteDSL MoE with Large EP ( #9592 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-05 22:08:52 -08:00
Mike Iovine
31ab367576
[None][chore] Waive flakey disagg tests ( #9749 )
...
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-05 13:07:05 -08:00
jthomson04
299601aebf
[ https://nvbugs/5670672 ][fix] Fix flaky KV connector tests ( #9676 )
...
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
2025-12-05 10:04:54 -08:00
Robin Kobus
eb0b426e5d
[None][refactor] Improve request processing function in sampler ( #9671 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-12-05 16:41:49 +01:00
Robin Kobus
faf682b8bc
[TRTLLM-7136][feat] Update load_weights method to include mapping parameter in checkpoint loaders ( #9583 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-12-05 16:07:20 +01:00
yufeiwu-nv
68253d9d29
[ https://nvbugs/5518713 ][test] Refactor core test lists by merging with llm_perf_cluster.yml ( #9714 )
...
Signed-off-by: yufeiwu <230315618+yufeiwu-nv@users.noreply.github.com>
2025-12-05 01:15:37 -08:00
Kaiyu Xie
e06c582648
[None] [tests] Unwaive EPLB tests ( #9625 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-12-05 00:13:24 -08:00
gramnarayan
74df9b180b
[ #9602 ][feat] AutoDeploy: Support TRTLLM Sampler ( #9641 )
...
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
2025-12-04 19:24:11 -08:00
Lizhi Zhou
dc766fc126
[ https://nvbugs/5633340 ][fix] start disagg workers and servers on free ports ( #9694 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-12-05 10:51:29 +08:00
Lizhi Zhou
0d0a16fff4
[TRTLLM-8920][feat] decouple disagg service from fastapi ( #8714 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-12-05 10:44:16 +08:00
xinhe-nv
530af1a98e
[None][chore] Add failed cases into waives.txt ( #9662 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-12-04 22:33:22 +08:00
Anthony Chang
60cdca3740
[None][fix] Recover TRTLLM MoE Perf for DEP ( #9562 )
...
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-12-04 22:10:25 +08:00
Jin Li
e5d4305c04
[ https://nvbugs/5467531 ][fix] Unwaive fused_moe all to all test with … ( #9617 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-12-04 18:17:24 +08:00
ruodil
8a392af28f
[None][test] rename wide ep and disagg metric name in perf test ( #9704 )
...
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
2025-12-04 18:16:06 +08:00
Yan Chunwei
05058f5e2a
[None][ci] unwaive tests ( #9651 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-12-04 15:06:07 +08:00
tcherckez-nvidia
f9aa86dbdd
[ #8733 ][feat] Add Llama4 MoE handling to AutoDeploy ( #9556 )
...
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
Signed-off-by: tcherckez-nvidia <127761168+tcherckez-nvidia@users.noreply.github.com>
Co-authored-by: Neta Zmora <nzmora@nvidia.com>
2025-12-04 08:03:33 +02:00
JunyiXu-nv
6d2daec5d0
[TRTLLM-8274][feat] Check if executor is shutdown in /health entrypoint ( #9057 )
...
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-04 13:49:40 +08:00
Tailing Yuan
4eed648e22
[None][feat] Add weights initialization and context phase parser to layer-wise benchmarks ( #9667 )
...
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-12-04 13:41:15 +08:00
Jin Li
87e0c8a749
[TRTLLM-7073][feat] Support torch compile for PP for Llama and DeepSeekV3 ( #7838 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-12-04 13:32:11 +08:00
mpikulski
744f0eff1b
[TRTLLM-9522][fix] restore trtllm-serve mm_embedding_serve ( #9669 )
2025-12-03 19:27:11 -08:00
Yiqing Yan
e31142202e
[TRTLLM-7181][infra] Generate test results when pytest timeout happens ( #9396 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-12-04 10:05:38 +08:00
Wanli Jiang
4485e516a2
[None][feat] Update Qwen3CodeToolParser to align tool-calling parameters ( #9540 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-12-04 06:47:32 +08:00
gramnarayan
098b9ff226
[ #9147 ][feat] AutoDeploy: Draft Target Speculative Decoding ( #9275 )
...
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
2025-12-04 05:13:49 +08:00
Wei-Ming Chen
d9fba85396
[OMNIML-2932] [feat] nvfp4 awq support ( #8698 )
...
Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>
2025-12-03 19:47:13 +02:00
Michal Guzek
4e5b10da48
[ https://nvbugs/5552132 ][fix] Enable LoRa for GPT OSS Torch ( #8253 )
...
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
2025-12-03 15:42:15 +01:00
Patrice Castonguay
ae8d8a266a
[ https://nvbugs/5705197 ][chore] Unwaive timeout disagg tests ( #9637 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-12-03 22:18:36 +08:00
Guoming Zhang
79e872de31
[None][test] Update Qwen3-next accuracy testing by setting the cuda … ( #9613 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-12-03 20:52:53 +08:00
JunyiXu-nv
743486b2ea
[TRTLLM-6842][feat] Support Response API for general purpose ( #9392 )
...
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-03 16:49:26 +08:00
xinhe-nv
3a748b166b
[None][chore] Add failed cases into waives.txt ( #9593 )
...
Signed-off-by: Jie Li <lijie@nvidia.com>
Co-authored-by: Jie Li <lijie@nvidia.com>
2025-12-03 16:26:06 +08:00
fredricz-20070104
80ff9015ce
[ https://nvbugs/5561153 ][test] Fix log error for perf test ( #9622 )
...
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2025-12-03 15:27:13 +08:00
brb-nv
43f6ad7813
[ https://nvbugs/5708475 ][fix] Fix e2e eval accuracy for helix parallelism ( #9647 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-03 15:13:59 +08:00
Bo Li
8b5ededc83
[TRTLLM-9391][chore] Automatically estimate required workspace. ( #9535 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-12-03 12:49:38 +08:00
Suyog Gupta
93871d52b2
[None][chore] AutoDeploy update cuda stream manager for multi-device ( #9575 )
...
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-12-02 20:43:14 -08:00
heyuhhh
a08eb81cce
[None][feat] Add RocketKV usage doc and e2e accuracy test on LongBenchV2 ( #9572 )
...
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
2025-12-03 11:33:46 +08:00
yufeiwu-nv
21f2ba74e8
[None][test] Remove duplicate test cases ( #9623 )
...
Signed-off-by: yufeiwu <230315618+yufeiwu-nv@users.noreply.github.com>
2025-12-03 10:35:26 +08:00
brb-nv
55c7023c92
[None][chore] Waive test failing on pre-merge ( #9638 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-03 07:31:10 +08:00
Grzegorz Kwasniewski
0a7a88e74e
[TRTLLM-8946][feat] Improved heuristics to detect shardable regions ( #9200 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-12-02 22:08:19 +01:00
Patrice Castonguay
3991aa9c72
[ https://nvbugs/5688388 ][fix] fix: Reducing num request in disagg test to speed up ( #9598 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-12-02 12:48:53 -05:00
Neta Zmora
a560ba5546
[ #9550 ][feat] AutoDeploy: Add NVFP4 Cutlass MoE kernels ( #9551 )
...
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
2025-12-03 01:39:38 +08:00
Shi Xiaowei
227d42e492
[ https://nvbugs/5651854 ][fix] Fix dist-serving perf by clearing CPU affinity ( #9549 )
...
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-12-03 01:17:03 +08:00
William Zhang
2dd3ebf037
[ #9150 ][feat] Add code for nano v3 to custom implementation in AD ( #9465 )
...
* Why?
We would like to show an alternative to monkey-patching in AutoDeploy.
* What?
This commit builds on the existing custom model implementation for
NemotronH and adds the bits relevant for MoE layers.
Part of #9150 .
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-12-02 08:56:44 -08:00
Mike Iovine
d5b7f0c8ad
[TRTLLM-8980][test] Clean up spec dec tests in test_llm_api_pytorch ( #8889 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-02 10:32:02 -05:00
Yan Chunwei
b86256eb54
[TRTLLM-9144][fix] enhance RPC robustness ( #8711 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-12-02 21:37:59 +08:00
brb-nv
be48cdf1d1
[TRTLLM-9466][test] Evaluate helix parallelism with DSV3 Lite ( #9597 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-02 20:10:07 +08:00
Emma Qiao
4a8766c11d
[None][infra] Remove an invalid test name in waives.txt ( #9620 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-02 18:05:17 +08:00
mpikulski
84a1531594
[TRTLLM-9488][feat] use FlashInfer.sampling by default ( #9545 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-12-02 16:29:55 +08:00
Emma Qiao
3e4f2388a9
[None][infra] Waive failed cases for main branch ( #9615 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-02 15:48:27 +08:00
shuyixiong
1a2118b8fe
[ https://nvbugs/5702793 ][fix] Fix uncontiguous tensor view ( #9576 )
...
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
2025-12-02 15:41:32 +08:00
xinhe-nv
ad46d19027
[None][chore] Add failed cases into waives.txt ( #9588 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-12-02 14:24:11 +08:00
ruodil
4586b5f42f
[ https://nvbugs/5582091 ][test] increase warmup times in testing for multi-gpu cases ( #9578 )
...
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
2025-12-02 14:22:49 +08:00
Wanli Jiang
5657a00ec0
[FMDL-1328][feat] Add support for nano-v3 and super-v3 with pytorch backend ( #9261 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-12-02 13:40:20 +08:00
xinhe-nv
3911d0496e
[None][fix] Waive gb200 ( #9580 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-12-02 12:09:21 +08:00
JunyiXu-nv
9a6df980cd
[ https://nvbugs/5703953 ][fix] Use random port for disagg tests ( #9582 )
...
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-02 11:40:14 +08:00
Iman Tabrizian
356a52edf5
[None][feat] Add support for KVCache reuse for DSv32 ( #9383 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-12-02 11:14:30 +08:00
Shijie
dcf5c86720
[None][feat] Unify nvfp4 gemm backend ( #8963 )
...
Signed-off-by: Shijie Wang <jaywan@nvidia.com>
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: Shijie <jaywan@nvidia.com>
Co-authored-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-12-02 11:03:51 +08:00
Eran Geva
c9771ebb99
[ #9198 ][feat] Refactor dist ops in AutoDeploy ( #9301 )
...
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-12-02 02:36:32 +08:00
Venky
639c939a4f
[TRTC-1943][feat] Env vars override support in LLM API ( #9104 )
...
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-12-01 10:04:49 -08:00
Stefan Niebler
f155812eb0
[TRTLLM-6756][feat] Add Beam Search to TorchSampler ( #8509 )
...
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
2025-12-01 18:48:04 +01:00
Yanchao Lu
7127c4407a
[None][test] [None][test] Waive main branch test failures 12/1 ( #9566 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-12-01 21:54:53 +08:00
Shi Xiaowei
48b1d31895
[ https://nvbugs/5651854 ][infra] Enable perf metrics during accuracy testing ( #9140 )
2025-12-01 20:15:32 +08:00
alel
4107254c82
[TRTLLM-6222][feat] Several perf opt for cuteDSL nvf4 gemm ( #9428 )
...
Signed-off-by: Yuhan Li <51736452+liyuhannnnn@users.noreply.github.com>
2025-12-01 18:10:45 +08:00
JadoTu
a92af27411
[None][chore] remove qwen3-next accuracy tests ( #9534 )
...
Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>
2025-12-01 11:49:37 +08:00
Pengbo Wang
aa3310f64f
[ https://nvbugs/5503479 ][fix] Temporarily lower reference accuracy to stabilize CI ( #9398 )
...
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2025-12-01 11:49:14 +08:00
Enwei Zhu
2e3ac3c48f
[ https://nvbugs/5684703 ][fix] Unwaive disagg guided decoding test ( #9466 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-01 11:39:40 +08:00
Li Min
1797e91dfd
[TRTLLM-6222][feat] Extend cute_dsl_nvfp4_gemm to sm103. ( #9543 )
...
Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
2025-12-01 10:19:36 +08:00
heyuhhh
6e470aab72
[None] [feat] Optimize the algorithm part of RocketKV ( #9333 )
...
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
2025-12-01 09:04:09 +08:00
xxi
c12e67bb66
[TRTLLM-8958][feat] and [TRTLLM-8960]: create ConfigurableMoE and support TRTLLMGenFusedMoE as backend ( #9486 )
2025-12-01 08:37:07 +08:00
JunyiXu-nv
3f588198dc
[None][fix] Fix port conflict in disagg tests ( #9474 )
...
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-11-30 17:33:22 +08:00
Emma Qiao
c927ccf510
[None][infra] Wiave failed tests for main branch on 11/30 ( #9555 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-30 16:13:20 +08:00
brb-nv
b77f4ffe54
[TRTLLM-5971][feat] Integrate helix parallelism ( #9342 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-11-29 15:17:30 -08:00
dominicshanshan
6345074686
[None][chore] Weekly mass integration of release/1.1 -- rebase ( #9522 )
...
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Signed-off-by: qgai <qgai@nvidia.com>
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
Signed-off-by: Simeng Liu <simengl@nvidia.com>
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Vincent Zhang <vinczhang@nvidia.com>
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
Signed-off-by: Michal Guzek <moraxu@users.noreply.github.com>
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Co-authored-by: yunruis <205571022+yunruis@users.noreply.github.com>
Co-authored-by: sunnyqgg <159101675+sunnyqgg@users.noreply.github.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: JunyiXu-nv <219237550+JunyiXu-nv@users.noreply.github.com>
Co-authored-by: Simeng Liu <109828133+SimengLiu-nv@users.noreply.github.com>
Co-authored-by: Guoming Zhang <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Vincent Zhang <vcheungyi@163.com>
Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>
Co-authored-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Co-authored-by: Leslie Fang <leslief@nvidia.com>
Co-authored-by: Shunkangz <182541032+Shunkangz@users.noreply.github.com>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-11-29 21:48:48 +08:00
Grzegorz Kwasniewski
cff54fcae3
[ #8948 ][feat] Support custom sharding config ( #9143 )
...
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2025-11-29 05:28:05 +08:00
mpikulski
bc355eadf5
[TRTLLM-9488][fix] llmapi references ( #9547 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-28 08:54:05 -08:00
dominicshanshan
70efa3ac43
[None][infra] Waive failed case in pre-merge on 11/28 ( #9537 )
...
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-11-28 20:53:45 +08:00
mpikulski
e5f39ec7cf
[TRTLLM-9488][feat] add 'disable_flashinfer_sampling' config option ( #9454 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-28 13:00:39 +01:00
Emma Qiao
2d7421b314
[None][infra] Waive failed cases for main branch on 11/28 ( #9539 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-28 17:19:55 +08:00
Liao Lanyu
bf84d9cea1
[None][chore] add spec_decoding configs in perf benchmark scripts and fix typos ( #9533 )
...
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>
2025-11-28 14:52:05 +08:00
yufeiwu-nv
08755a809d
[ https://nvbugs/5689658 ][test] Fix gpu lock issue running on cluster ( #9441 )
...
Signed-off-by: yufeiwu <230315618+yufeiwu-nv@users.noreply.github.com>
2025-11-28 13:59:22 +08:00
Yukun He
60c43a200a
[None][fix] Fix on-disk cache and revise logger/statistics for AutoTuner. ( #9211 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-11-28 13:32:21 +08:00
JunyiXu-nv
c87e81c1d8
[ https://nvbugs/5685015 ][fix] Update invalid max_token test ( #9435 )
...
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-11-28 11:41:16 +08:00
Bo Li
19f3f4e520
[ https://nvbugs/5637037 ][chore] Update waive lists. ( #9386 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-11-28 10:45:22 +08:00
Yueh-Ting (eop) Chen
4cbfc10b28
[ https://nvbugs/5674665 ][chore] Add test coverage for https://nvbugspro.nvidia.com/bug/5674665 ( #9518 )
...
Signed-off-by: eopXD <yuehtingc@nvidia.com>
2025-11-27 21:40:34 +08:00
Bo Li
62b771877c
[TRTLLM-9389][chore] Refactor AlltoallMethodType. ( #9388 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-11-27 21:09:29 +08:00
Fanrong Li
2d5eadf65f
[None][fix] fix TP support for DeepSeek-V3.2 on hopper ( #9484 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-27 21:02:25 +08:00
JadoTu
51bf7164d3
[None][feat] add qwen3-next CI test of accuracy on BF16 and NVFP4 ( #9330 )
...
Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>
2025-11-27 18:05:00 +08:00
Lizhi Zhou
8104a78931
[None][chore] revert batch_size=1 to prevent timeout and lower accuracy reference by 0.12% as a WAR ( #9447 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Co-authored-by: Shi Xiaowei <39303645+Shixiaowei02@users.noreply.github.com>
2025-11-27 14:25:44 +08:00
Emma Qiao
0442510304
[None][infra] Waive failed case in pre-merge on 11/27 ( #9507 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-27 13:53:33 +08:00
HuiGao-NV
03331bc43d
[ https://nvbugs/5547414 ][fix] enable case after using local cache model ( #9473 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-11-27 12:18:20 +08:00
Patrice Castonguay
1b2da426cd
[ https://nvbugs/5680310 ][fix] Fix ctx only timed out test ( #9410 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-11-27 11:21:21 +08:00
Shi Xiaowei
e76e149861
[ https://nvbugs/5608930 ][fix] Fix a typo ( #9487 )
...
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-11-27 09:05:17 +08:00
Zheyu Fu
dbbed1f85a
[None][ci] Waive blackwell test on spec gate. ( #9502 )
...
Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>
2025-11-27 07:19:58 +08:00
Chenghao Zhang
18fbda5cdb
[None][feat] AutoDeploy: Add A_log fusion for Mamba layers ( #9422 )
...
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-11-26 14:39:20 -08:00
Chenghao Zhang
bc7b60e016
[None][feat] AutoDeploy: Remove redundant copies in mamba layers ( #9461 )
...
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-11-26 14:38:33 -08:00
Chang Liu
b10137fdd5
[None][feat] Support MLA chunked prefill for DeepSeek V3.2 model ( #9376 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-11-26 16:38:25 +08:00
JunyiXu-nv
b7308a4000
[ https://nvbugs/5580099 ][fix] Cherry pick IMA issue fix from release/1.1 ( #9032 )
...
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-11-26 13:09:06 +08:00
Wanli Jiang
d100599ea7
[TRTLLM-9264][fix] Add accuracy/unit tests/doc for phi4mm ( #9246 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-11-26 11:12:35 +08:00
shuyixiong
d8acea1db3
[TRTLLM-9293][feat] Enable partial weight loading to support streaming update weights ( #9224 )
...
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
2025-11-26 10:59:06 +08:00
QI JUN
5972119e1c
[None][ci] move some slow test cases of DGX-B200 to post merge ( #9467 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-26 10:48:53 +08:00
fredricz-20070104
6a64cb4c71
[TRTLLM-8936][test] Add disagg and wideep multi-node multi-gpu test cases ( #9356 )
...
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2025-11-26 10:34:49 +08:00
Chuang Zhu
0e9c7f8c07
[ https://nvbugs/5685143 ][fix] avoid cudaFree overlap with cuda graph ( #9438 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-11-25 16:20:29 -08:00
Suyog Gupta
e484bec82f
[None][chore] AutoDeploy add multi stream moe pass to default.yaml ( #9430 )
...
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-11-25 14:16:13 -08:00
Robin Kobus
32f53910ef
[TRTLLM-909][feat] Overlap context chunks in pipeline parallel mode ( #9308 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-11-25 22:11:51 +01:00
Eran Geva
afc52d7b93
[ https://nvbugs/5647400 ] [fix] Enlarged the AllReduce workspace size to 64MB. Added AllReduce strategy to AD config. ( #9145 )
...
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-11-25 10:56:07 -08:00
mpikulski
899fda9e47
[TRTLLM-9490][feat] use FlashInfer's top_k_sampling_from_probs ( #9457 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-25 18:53:53 +01:00
mpikulski
c5f52ab304
[TRTLLM-8376][feat] top-p optimization (removes redundant softmax) ( #9411 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-25 18:46:48 +01:00
Fanrong Li
8da59103d6
[ https://nvbugs/5680905 ][fix] Relax the MMLU accuracy requirement for DS-v3.2 ( #9439 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-26 00:32:20 +08:00
Yan Chunwei
1f43dc8174
[None][ci] waive a test ( #9458 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-11-25 07:04:20 -08:00
YueWeng
cc336c4abd
[TRTLLM-8160][feat] Add draft token tree runtime on CDL ( #8586 )
...
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
2025-11-25 09:40:55 -05:00
Pengyun Lin
fa61825c74
[None][feat] Support custom chat template for tool calling ( #9297 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-11-25 22:07:04 +08:00
Tailing Yuan
51ef0379d2
[None][feat] Add a parser to layer-wise benchmarks ( #9440 )
...
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-11-25 05:45:16 -08:00
Shi Xiaowei
60786574db
[None][fix] Mitigate test timeout issues ( #9445 )
...
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-11-25 20:17:54 +08:00
Chao Ni
a2d9e6250a
[ https://nvbugs/5667922 ][fix] Update long context evaluation config ( #9426 )
...
Signed-off-by: mni <125171826+baize97@users.noreply.github.com>
2025-11-25 19:33:38 +08:00
Yanchao Lu
ff02e0f05c
[None][ci] Move more test stages to use OCI machines ( #9395 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Matt Lefebvre <matthewelefebvre@gmail.com>
2025-11-25 15:59:13 +08:00
Eran Geva
6af01dc664
[ #8391 ][chore] test_perf.py to lock clocks read from gpu_configs.yml instead of max freq ( #9409 )
...
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-11-25 09:20:33 +02:00
Emma Qiao
15616e3ee5
[None][infra] Waive failed cases for main branch on 11/25 ( #9429 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-24 23:18:15 -08:00
William Zhang
a4049fc557
[ #9413 ][fix] Minor fixes to nemotron H and custom models in AD ( #9416 )
...
* Why?
There were a couple of issues with the recently merged custom model
injection for AutoDeploy + the reference implementation of nemotron
H:
- `d_mlp` was left in despite being mathematically always null (could
lead to runtime issues during sharding).
- the custom model mapping was inherited by children factories.
* What?
This commit fixes these issues, and refactors the key of the custom
implementation to be based on the name of the configuration class as
well.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-11-24 20:17:33 -08:00
Suyog Gupta
efd503751f
[ #9271 ][perf] Enable multi-stream MOE optimization in AutoDeploy ( #9322 )
...
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-11-24 19:50:10 -08:00
kris1025
d1c724958d
[None][chore] unwaive ampere kernels test ( #9389 )
...
Signed-off-by: linquanh <linquanh@nvidia.com>
2025-11-25 11:28:43 +08:00
xinhe-nv
0a9ae2e3e6
[None][chore] Remove closed bugs ( #9381 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-11-24 18:49:57 -08:00
Fanrong Li
5a99c9734d
[TRTLLM-8777][feat] Update DeepGEMM to the latest commit to include optimizations for DeepSeek-v3.2 ( #9380 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-25 08:58:08 +08:00
QI JUN
786d308b88
[ https://nvbugs/5685428 ][fix] fix test_openai_chat_multimodal.py ( #9406 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-24 16:56:33 -08:00
bhsueh_NV
1a93583438
[None][feat] Support Yarn on QwQ-32B model ( #9059 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
Signed-off-by: Jiang Shao <91270701+StudyingShao@users.noreply.github.com>
Co-authored-by: NVJiangShao <91270701+StudyingShao@users.noreply.github.com>
2025-11-25 07:27:28 +08:00
Yibin Li
1ce483c999
[TRTLLM-7967][feat] Adding Starcoder2 PyTorch Backend Support ( #8923 )
...
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
2025-11-24 11:23:22 -08:00
Emma Qiao
2c869f2bda
[None][infra] Waive failed cases for main ( #9400 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-24 17:42:19 +08:00
Emma Qiao
af72d93fa9
[None][infra] Waive failed cases on main branch ( #9384 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-23 22:53:02 -08:00
brb-nv
c045e359a7
[ https://nvbugs/5637012 ][fix] Fix helix unit tests ( #9369 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-11-23 19:34:22 -08:00
QI JUN
34a6d2d28f
[TRTLLM-9302][chore] Move build config from BaseLlmArgs to TrtLlmArgs ( #9249 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-24 10:54:41 +08:00
Chenghao Zhang
e1c9aa7d6a
[None][chore] AutoDeploy: Add the Nemotron MOE to CI ( #9328 )
...
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-11-23 12:12:12 -08:00
William Zhang
11a0b276fb
[ #9230 ][feat] Slimmed down implementation of nemotron H ( #9235 )
...
* Why?
The reference nemotron H code on HuggingFace is out of date,
and therefore bugged, and has several untested code paths.
This makes an already hairy patching system even hairier.
The proposal is to do away with those patches, and replace the
original implementation with one that is heavily slimmed down.
* What?
This PR sets the basis for an alternative path with such a
slimmed down implementation that:
- fixes bugs in the current HF implementation
- adds no new dependencies to TensorRT-LLM
- does away with unnecessary features for TensorRT-LLM/
AutoDeploy:
- no training related code (dropout, gradient checkpointing, etc.)
- no caching logic (we want to replace it with our own anyway)
- no attention masking where possible
- reuses existing AD custom ops for mamba SSM update /
causal conv1d / attention
In order for the above to be usable in the AD apparatus,
`AutoModelForCausalLMFactory` is extended to allow registrations
of custom model implementations.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-11-23 03:13:32 -08:00
Yan Chunwei
1ef69ecbb1
[None][ci] waive two ray tests ( #9375 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-11-23 15:39:01 +08:00
dongfengy
268ea9bb8a
[None][test] Add one-model and overlap-scheduling to eagle tests for GPTOSS ( #9312 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-11-21 22:52:53 -08:00
Enwei Zhu
13fbd4366a
[TRTLLM-9370][feat] Integration of CuteDSL NVFP4 grouped GEMM (Part 2: SwiGLU Fusion and Finalize Fusion) ( #9288 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-11-21 14:03:38 -08:00
mpikulski
cddc7549d1
[TRTLLM-9191][feat] support out-of-tree models in trtllm-serve ( #9269 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-21 04:23:47 -08:00
mpikulski
095b6864a8
[TRTLLM-8650][fix] beam search request validation ( #8433 ) ( #9228 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-21 04:08:45 -08:00
Emma Qiao
041564188c
[None][infra] Waive failed cases in main post-merge on 11/21 ( #9360 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-21 18:01:53 +08:00
QI JUN
b6483ef3e7
[None][ci] waive a test case of test_ad_build_small_multi.py ( #9355 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-21 16:25:04 +08:00
Ivy Zhang
28e9bf6167
[None][chore] add periodic junit xml path in conftest ( #9337 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-11-20 22:46:25 -08:00
QI JUN
e2a372a3b1
[None][ci] waive test_llm_context_only_timed_out_kv_cache_exhausted ( #9351 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-20 20:20:57 -08:00
Barry Kang
a3433dd54e
[ https://nvbugs/5325296 ][fix] Enable relaxed acceptance test on Blackwell ( #8709 )
...
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Zhanrui Sun
62e20a5441
[None][infra] Remove invaild waived tests which not in release branch ( #8841 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Jin Li
6185225501
[ https://nvbugs/5488118 ][fix] Unwaive passed tests ( #8758 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Dom Brown
0c8de1f45d
[ https://nvbugs/5575841 ] [test] Move test_moe.py to serial tests to improve stability + unwaive FP4 MoE torch unit tests ( #8422 )
...
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
xiweny
05aabfbc1e
[ https://nvbugs/5601203 ] [fix]Restrict fp8 blockscale moe case ( #8583 )
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Eran Geva
3d66e56adb
[ https://nvbugs/5572320 ][fix] Ported test_ad_trtllm_bench.py from main ( #8671 )
...
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Yukun He
9a79f32f7a
[ https://nvbugs/5608489 ][fix] Fix output unpack issues for Llama3/4 NVFP4 models. ( #8679 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Ivy Zhang
25c0624750
[None][test] Clean cache for certain easily hang cases ( #8619 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Larry Xu <197874197+LarryXFly@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Jie Li
36e244f35e
[ https://nvbugs/5587456 ][fix] Remove multimodal test cases using TRT backend ( #8611 )
...
Signed-off-by: Jie Li <lijie@nvidia.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Lizhi Zhou
348668e3ae
[ https://nvbugs/5575902 ][fix] set max_batch_size=1 to stabilize accuracy test result ( #8609 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Lizhi Zhou
33b0b945c7
[ https://nvbugs/5582277 ][fix] rework DisaggPPTerminationHandler to fix hang issue ( #8519 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Pengyun Lin
81fd9be87d
[ https://nvbugs/5575829 ][fix] Unwaive gpt-oss test ( #8576 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Bo Deng
4ca6fe83d8
[ https://nvbugs/5565549 ][fix] unwaive test_disaggregated_spec_dec_bat… ( #8500 )
...
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Guoming Zhang
af3900a195
[ https://nvbugs/5504095 ][fix] Unwaive test_user_specify_workspace case. ( #8316 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Simeng Liu
9286223288
[ https://nvbugs/5515753 ][ci] Add NCCL_DEBUG=INFO flag to collect more info with CI failure. ( #8440 )
...
Signed-off-by: Simeng Liu <simengl@nvidia.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
JunyiXu-nv
ee6944bfa2
[ https://nvbugs/5569713 ][fix] Disable fp8 deep gemm for EXAONE-4.0-32B-FP8 ( #8429 )
...
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
yufeiwu-nv
0e746fad45
[ https://nvbugs/5667454 ][test] Fix Test Case as Chunked Attention not Supported on sm_120 ( #9260 )
...
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
2025-11-20 00:58:42 -08:00
Liao Lanyu
04ad9f96fa
[ https://nvbugs/5667687 ][fix] Set correct lm_head_tp_size_upper_bound ( #9300 )
...
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>
2025-11-20 00:41:00 -08:00
Emma Qiao
b018b2698d
[TRTLLM-9164][infra] Enable checking duplicate items in waives.txt in pre-commit ( #9265 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-20 15:47:23 +08:00
mpikulski
a39e8c5567
[TRTLLM-9295][fix] use greedy decoding in test_openai_compatible_json_schema ( #9305 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-20 08:32:23 +01:00
QI JUN
1bdd3ba173
[None][ci] waive test_disagg_server_restart ( #9326 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-19 22:34:03 -08:00
Yechan Kim
d5622b2689
[None][fix] Multimodal InputProcessor dummy builder fix ( #8916 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-11-19 22:32:21 -08:00
Chang Liu
79a6c9742b
[None][fix] Use fp32 for indexer weight_proj GEMM ( #9243 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-11-19 21:52:38 -08:00
Chenghao Zhang
cd44f80abd
[ #9316 ][feat] AutoDeploy: Add the accuracy test for Nemotron MOE models ( #9317 )
...
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-11-19 21:48:50 -08:00
Bo Deng
2128f73d58
[TRTLLM-9247][infra] Upgrade NIXL to 0.7.1 ( #9055 )
...
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
Co-authored-by: jthomson04 <jwillthomson19@gmail.com>
2025-11-20 11:01:02 +08:00
Yukun He
b6bced83c0
[TRTLLM-7963][feat] Use CUDAGraph to improve the tuning accuracy for AutoTuner. ( #9089 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-11-20 08:54:29 +08:00
brb-nv
f6ec6e2222
[None][chore] Waive tests timing out on main ( #9315 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-11-19 13:10:06 -08:00
NVShreyas
1eae941d77
[ #9237 ][feat] enable iter stats in autodeploy ( #9278 )
...
Signed-off-by: Shreyas Misra <shreyasm@nvidia.com>
2025-11-19 19:29:29 +01:00
Neta Zmora
7ab02ad7b5
[None][feature] AutoDeploy: tighter MoE UT thresholds ( #9195 )
...
Scale down the weights in the MoE test so that the output has reasonable magnitude, allowing for tighter atol and rtol
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
2025-11-19 08:37:51 -08:00
Bo Li
d8b05894ee
[None][perf] Adjust select_alltoall_method_type. ( #8950 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-11-19 07:43:55 -08:00
mpikulski
46dd9886bb
[ https://nvbugs/5661877 ][fix] fix test regression in TestBatchedSampling::test_samples ( #9215 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-19 01:44:44 -08:00
xinhe-nv
0f77fec932
[None][chore] Add failed cases into waives.txt ( #9289 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-11-19 17:03:43 +08:00
CarstyYou
ee941ac779
[ https://nvbugs/5456493 ][feat] add fp8 dense for sm120 ( #9174 )
...
Signed-off-by: CarstyYou <186021327+CarstyYou@users.noreply.github.com>
2025-11-19 14:40:34 +08:00
nvxuanyuc
a79c0dfb43
[None][fix] Update GLM model accuracy test ( #9286 )
...
Signed-off-by: Xuanyu Chen <xuanyuc@nvidia.com>
2025-11-18 21:59:01 -08:00
Emma Qiao
67d3eb26af
[None][infra] Waive failed cases for main branch on 11/17 ( #9266 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-18 20:07:03 -08:00
ChristinaZ
941a54c66a
[None][feat] Update the indexer topK ( #9255 )
...
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-11-19 11:49:00 +08:00
xinhe-nv
286ace22ed
[None][chore] Add failed cases into waives.txt ( #9242 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-11-18 19:27:55 -08:00
Ivy Zhang
782dfca7e8
[TRTLLM-9050][test] add llama4 disagg case to cover kv cache overflow error ( #9172 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-11-18 18:26:32 -08:00
Patrice Castonguay
9b0f45298f
[None][feat] Have ability to cancel disagg request if KV cache resource are exhausted ( #9155 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-11-18 20:59:17 -05:00
xinhe-nv
35658eab55
[None][chore] Add failed cases into waives.txt ( #9193 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-11-18 17:47:55 -08:00
Enwei Zhu
7c4777a571
[TRTLLM-9286][feat] Integration of CuteDSL NVFP4 grouped GEMM ( #8880 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-11-18 17:40:12 -08:00
Lizhi Zhou
c789000a62
[ https://nvbugs/5649010 ][fix] increase status-checking interval to avoid instability ( #9203 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-11-19 08:55:42 +08:00
Bo Deng
34f845bf69
[TRTLLM-9287][infra] Use NIXL backend for accuracy tests ( #9247 )
...
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-11-18 14:46:20 -08:00
Ajinkya Rasane
8d7cda2318
[None][chore] Update the Flux autodeploy example ( #8434 )
...
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Co-authored-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
2025-11-18 14:16:04 -08:00
Ziyi Xiong
7c4344b92e
[ https://nvbugs/5590408 ][fix] Exclude num of draft tokens from mMaxSeqLenKv ( #9210 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-11-18 15:41:56 -05:00
mpikulski
04fb481da3
[TRTLLM-9295][fix] restore greedy sampling in _test_openai_chat_guided_decoding ( #9178 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-18 09:41:59 -08:00
Kaiyu Xie
d076aa44d3
[None] [tests] Unwaive wide ep related tests ( #9204 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-11-18 08:54:46 -08:00
Zheyu Fu
c4e02d7f04
[TRTLLM-8136][feat] Dynamic draft length in spec decode (stage 1). ( #8194 )
...
Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>
2025-11-18 11:13:39 -05:00
Ivy Zhang
160b361588
[TRTLLM-8949][test] Add rcca test case for eagle3 consistency check ( #9088 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-11-18 05:55:00 -08:00
Ivy Zhang
ca41a71f92
[TRTLLM-8948][test] Add long bench case ( #9165 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-11-18 04:41:48 -08:00
Tri Dao
fc088e642c
[None][feat] Support Glm4MoeForCausalLM ( #8256 )
...
Signed-off-by: Tri Dao <daominhtri0503@gmail.com>
Co-authored-by: Xuanyu Chen <xuanyuc@nvidia.com>
2025-11-18 09:43:21 +08:00
QI JUN
c3376fa114
[None][ci] split speculative test case into several small cases ( #9209 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-17 17:02:25 -08:00
Robin Kobus
df41f220a2
[TRTLLM-8831][feat] Enable early exit with overlap scheduler ( #8587 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-11-17 18:07:13 +01:00
Kaiyu Xie
04be5a704e
[None] [fix] Fix missing ActivationType issue ( #9171 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
2025-11-17 10:43:25 +08:00
Anthony Chang
86cfb3ea7e
[None][feat] Update TRTLLM MoE cubins; reduce mxfp4 weight padding requirement; tighten TMA bound ( #9025 )
...
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-11-17 10:04:29 +08:00
Emma Qiao
d16b1a84c5
[None][infra] Waive a failed case in pre-merge stage 11/16 ( #9192 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-17 09:36:56 +08:00
sunnyqgg
7862b15a65
[TRTLLM-8778][feat] Add tree attention support for blackwell arch ( #8975 )
...
Signed-off-by: qgai <qgai@nvidia.com>
2025-11-17 09:01:53 +08:00
Emma Qiao
2854f0cf3d
[None][infra] Waive failed tests for main branch 11/15 ( #9187 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
2025-11-16 01:48:25 -08:00
brb-nv
63237494db
[None][chore] Waive failing tests blocking pre-merge ( #9189 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-11-16 01:06:03 -08:00
Erin
fe69243157
[None][chore] Add placement test for ray executor ( #9122 )
...
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-11-14 23:10:59 -08:00
Chang Liu
bed4e95e9f
[ https://nvbugs/5629887 ][fix] Add missing device count guard for DSv32 multiGPU tests ( #9159 )
2025-11-14 07:52:23 -08:00
xinhe-nv
49b7e6301a
[None][chore] Add failed cases into waives.txt ( #9156 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-11-14 06:28:22 -08:00
mpikulski
80bf840e69
[TRTLLM-9295][fix] unflake test_overlap_scheduler.py::test_overlap_scheduler_consis… ( #9146 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-14 11:36:22 +01:00
yuanjingx87
d72321a32e
[None][ci] Waive unittest/_torch/sampler/test_torch_sampler.py::TestBatchedSampling ( #9161 )
...
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-11-14 01:49:26 -08:00
Suyog Gupta
d12cb9436d
[None][feat] Autodeploy add triton configs and optimize mamba prefill ( #9083 )
...
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-11-13 19:15:43 -08:00
QI JUN
3c950910a0
[None][ci] waive test_disaggregated.py::test_disaggregated_mixed[TinyLlama-1.1B-Chat-v1.0] ( #9162 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-13 18:56:37 -08:00
heyuhhh
f07e9977c6
[None] [feat] Use triton kernels for RocketKV prediction module ( #8682 )
...
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
2025-11-13 18:51:09 -08:00
Tailing Yuan
cc4c980e03
[None][feat] Add Qwen3-Next to layer-wise benchmarks ( #9065 )
...
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-11-14 10:03:00 +08:00
JunyiXu-nv
fdb0787e85
[None][chore] Support json_schema in response_format ( #8934 )
...
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-11-14 09:43:13 +08:00
Erin
44d1c75701
[TRTLLM-8988][feat] Unify MPI & Ray's req/response handling with RPC Client/Server ( #8765 )
...
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-11-13 17:21:24 -08:00
Neta Zmora
34dc6869f3
[ #8732 ][feat] Update TRTLLM Cutlass MoE kernels with ReLU2 ( #9011 )
...
Update TRTLLM Cutlass MoE kernels with ReLU2 activation.
Nemotron-6 requires ReLU2 (i.e. squared ReLU) MoE activation function.
The PR adds this and adds an API to set the activation function, in general.
The ReLU2 changes are based on this FlashInfer PR: https://github.com/flashinfer-ai/flashinfer/pull/1954 .
The PR also updates the Auto Deploy MoE backend for 16-bit and FP8 from
Triton (`torch.ops.auto_deploy.triton_moe_fused`, `torch.ops.auto_deploy.triton_quant_fp8_moe`) to TRTLLM/Cutlass (`torch.ops.auto_deploy.trtllm_moe_fused`, `torch.ops.auto_deploy.trtllm_quant_fp8_moe_fused`).
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-11-13 16:54:45 -08:00
dongxuy04
a370643b26
[None][fix] support topk autotuner input for expert slot per group larger than 32 ( #9087 )
...
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-11-14 08:37:20 +08:00
Frida Hou
e96a3d294d
[None][autodeploy] minor refactor to rmsnorm transforms ( #8657 )
...
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2025-11-13 13:13:58 -08:00
Ziyi Xiong
a7aaf50541
[TRTLLM-8084][feat] Enhance the overlap shceduler for two-model spec decoding ( #8706 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-11-13 10:20:16 -05:00
William Zhang
121140cfec
[None][fixes] Add tool call parsing fixes and Qwen3 coder parser ( #8817 )
...
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-11-13 04:34:38 -08:00
Lizhi Zhou
48a27c7bef
[ https://nvbugs/5633340 ][chore] unwaive test_auto_scaling.py::test_disagg_server_restart ( #9131 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-11-13 01:45:36 -08:00
Emma Qiao
d0ea417ec8
[None][infra] Waive failed tests for main 11/13 ( #9132 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-13 01:00:40 -08:00
xinhe-nv
548f5ce4bc
[None][fix] waive failed tests ( #9090 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-11-12 23:40:00 -08:00
xinhe-nv
8fa3c55c76
[None][chore] Remove closed bugs ( #9114 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-11-12 22:49:37 -08:00
ruodil
c86e36fe38
[None][test] add deepseek and qwen cases for rtx series ( #8839 )
...
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
2025-11-12 22:28:02 -08:00
HuiGao-NV
cde18c12da
[ https://nvbugs/5640873 ][fix] Move thop tests to pre-merge ( #9094 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-11-13 13:08:13 +08:00
Zhang Ge
49df731b96
[ #6507 ][fix] Fix precision issue due to KV layout mismatch for split/concat kernels ( #6917 )
...
Signed-off-by: ZhangGe6 <sjtu.zg123@gmail.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-11-13 12:14:58 +08:00
Yan Chunwei
4fd93bdc2c
[None][ci] Waive test_llm_rpc and test_llm_rpc_streaming ( #9118 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-11-12 19:55:09 -08:00
Yan Chunwei
8a8883bc73
[None][chore] Waive test_llm_rpc_streaming ( #9113 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-11-13 11:06:26 +08:00
Zhenhuan Chen
943b05e2d3
[TRTLLM-9179][feat] add pp_partition to customize each rank's layer number ( #9003 )
...
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
2025-11-13 10:34:17 +08:00
QI JUN
3416efbc29
[None][ci] waive test_disaggregated_serving.py::TestQwen3_8B::test_chunked_prefill ( #9111 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-13 10:06:32 +08:00
dongxuy04
9241ccaf27
[None][feat] Enable EPLB for trtllm-gen and cutlass backend ( #8886 )
...
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-11-12 12:30:27 -08:00
Chenghao Zhang
5f26c31954
[ https://nvbugs/5636912 ][fix] AutoDeploy: Unwaive the test ( #9018 )
...
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-11-12 12:26:38 -08:00
Patrice Castonguay
8a751a0e56
[None][chore] Remove is_disaggregated param in executor request queue ( #9049 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-11-12 13:37:15 -05:00
Fanrong Li
780d4f9dc5
[None][feat] Add MTP>1 support for DS-v3.2 ( #9045 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-12 09:56:12 -08:00
Iman Tabrizian
cdde15b275
[TRTLLM-8540][feat] Add support for disagg in DSv3.2 ( #8735 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-11-12 08:21:11 -08:00
mpikulski
264d38e6c5
[TRTLLM-9175][test] ensure sampling is async ( #9076 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-12 15:27:52 +01:00
yufeiwu-nv
b7a2574c60
[ https://nvbugs/5568991 ][test] Remove Phi-3 models ( #9066 )
...
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
2025-11-12 03:16:36 -08:00
QI JUN
4003dc7574
[None][ci] waive some test cases of disaggregated serving ( #9085 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-12 15:06:21 +08:00
Emma Qiao
bb6eb9510d
[None][infra] Waive a failed case of disaggregated/test_disaggregated.py ( #9074 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-11 19:38:32 -08:00
QI JUN
fd703fbb7b
[None][ci] run speculative unit tests serially ( #9080 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-11 19:06:44 -08:00
Lucas Liebenwein
aca56097cb
[None][fix] AutoDeploy: update nano3 accuracy test ( #9061 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-11-11 12:26:31 -08:00
QI JUN
524754b6fd
[TRTLLM-8521][chore] remove circular dependency between model engine and cuda graph runner ( #7572 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-11 10:13:45 -08:00
Chenghao Zhang
ec9cf715a2
[None][feat] AutoDeploy: Perf improvement for mamba layers ( #8991 )
...
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-11-11 08:27:07 -08:00
Wanli Jiang
ebdd1cc8e0
[TRTLLM-8119][feat] Update doc/tests/chat_template for nano-v2-vlm ( #8840 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-11-11 07:48:23 -08:00
mpikulski
b151de4a8f
[TRTLLM-8377][test] unit tests for TorchSampler batched sampling ( #9012 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-11-11 07:16:42 -08:00
HuiGao-NV
23c388c58b
[ https://nvbugs/5616189 ][fix] Make more cases use local cached models ( #8935 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-11-11 03:14:05 -08:00
QI JUN
0ce22ce928
[None][ci] waive test_disaggregated_serving.py::TestQwen3_8B::test_auto_dtype[False] ( #9069 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-11 02:11:15 -08:00
Yiqing Yan
b7d51c5549
[None][chore] Remove duplicated waive test ( #9067 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-11-11 16:49:49 +08:00
Emma Qiao
da1f0e2465
[None][infra] Waive failed tests on main 11/11 ( #9058 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-11 13:19:30 +08:00
xinhe-nv
fac522056c
[None][chore] Add failed cases into waives.txt ( #8998 )
...
Signed-off-by: Jie Li <lijie@nvidia.com>
Co-authored-by: Jie Li <lijie@nvidia.com>
2025-11-11 12:40:59 +08:00
Chang Liu
7ceb5e5ab6
[TRTLLM-9198][perf] Add torch.compile + multi-stream support for k-cache scatter and weight scaling ( #8988 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-11 12:33:30 +08:00
shuyixiong
1ccb799c9a
[None][chore] Relocate rlhf_utils.py ( #8938 )
...
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
2025-11-10 19:03:23 -08:00
dongfengy
972c21c142
[None][chore] Clean up unused and confusing code in moe test ( #9019 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-11-10 18:52:21 -08:00
Yechan Kim
0938a3ad2a
[ https://nvbugs/5644187 ][fix] Llava-Next MMMU bugfix and Phi4 test bugfix ( #9034 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-11-11 10:24:31 +09:00
Frida Hou
f40e1f7496
[ https://nvbugs/5625972 ][fix] Add context manager to fix FakeTensorProp ( #9047 )
...
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2025-11-10 16:25:58 -08:00