Yan Chunwei
05058f5e2a
[None][ci] unwaive tests ( #9651 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-12-04 15:06:07 +08:00
tcherckez-nvidia
f9aa86dbdd
[ #8733 ][feat] Add Llama4 MoE handling to AutoDeploy ( #9556 )
...
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
Signed-off-by: tcherckez-nvidia <127761168+tcherckez-nvidia@users.noreply.github.com>
Co-authored-by: Neta Zmora <nzmora@nvidia.com>
2025-12-04 08:03:33 +02:00
JunyiXu-nv
6d2daec5d0
[TRTLLM-8274][feat] Check if executor is shutdown in /health entrypoint ( #9057 )
...
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-04 13:49:40 +08:00
Tailing Yuan
4eed648e22
[None][feat] Add weights initialization and context phase parser to layer-wise benchmarks ( #9667 )
...
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-12-04 13:41:15 +08:00
Jin Li
87e0c8a749
[TRTLLM-7073][feat] Support torch compile for PP for Llama and DeepSeekV3 ( #7838 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-12-04 13:32:11 +08:00
mpikulski
744f0eff1b
[TRTLLM-9522][fix] restore trtllm-serve mm_embedding_serve ( #9669 )
2025-12-03 19:27:11 -08:00
Yiqing Yan
e31142202e
[TRTLLM-7181][infra] Generate test results when pytest timeout happens ( #9396 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-12-04 10:05:38 +08:00
Wanli Jiang
4485e516a2
[None][feat] Update Qwen3CodeToolParser to align tool-calling parameters ( #9540 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-12-04 06:47:32 +08:00
gramnarayan
098b9ff226
[ #9147 ][feat] AutoDeploy: Draft Target Speculative Decoding ( #9275 )
...
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
2025-12-04 05:13:49 +08:00
Wei-Ming Chen
d9fba85396
[OMNIML-2932] [feat] nvfp4 awq support ( #8698 )
...
Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>
2025-12-03 19:47:13 +02:00
Michal Guzek
4e5b10da48
[ https://nvbugs/5552132 ][fix] Enable LoRa for GPT OSS Torch ( #8253 )
...
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
2025-12-03 15:42:15 +01:00
Patrice Castonguay
ae8d8a266a
[ https://nvbugs/5705197 ][chore] Unwaive timeout disagg tests ( #9637 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-12-03 22:18:36 +08:00
Guoming Zhang
79e872de31
[None][test] Update Qwen3-next accuracy testing by setting the cuda … ( #9613 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-12-03 20:52:53 +08:00
JunyiXu-nv
743486b2ea
[TRTLLM-6842][feat] Support Response API for general purpose ( #9392 )
...
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-03 16:49:26 +08:00
xinhe-nv
3a748b166b
[None][chore] Add failed cases into waives.txt ( #9593 )
...
Signed-off-by: Jie Li <lijie@nvidia.com>
Co-authored-by: Jie Li <lijie@nvidia.com>
2025-12-03 16:26:06 +08:00
fredricz-20070104
80ff9015ce
[ https://nvbugs/5561153 ][test] Fix log error for perf test ( #9622 )
...
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2025-12-03 15:27:13 +08:00
brb-nv
43f6ad7813
[ https://nvbugs/5708475 ][fix] Fix e2e eval accuracy for helix parallelism ( #9647 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-03 15:13:59 +08:00
Bo Li
8b5ededc83
[TRTLLM-9391][chore] Automatically estimate required workspace. ( #9535 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-12-03 12:49:38 +08:00
Suyog Gupta
93871d52b2
[None][chore] AutoDeploy update cuda stream manager for multi-device ( #9575 )
...
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-12-02 20:43:14 -08:00
heyuhhh
a08eb81cce
[None][feat] Add RocketKV usage doc and e2e accuracy test on LongBenchV2 ( #9572 )
...
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
2025-12-03 11:33:46 +08:00
yufeiwu-nv
21f2ba74e8
[None][test] Remove duplicate test cases ( #9623 )
...
Signed-off-by: yufeiwu <230315618+yufeiwu-nv@users.noreply.github.com>
2025-12-03 10:35:26 +08:00
brb-nv
55c7023c92
[None][chore] Waive test failing on pre-merge ( #9638 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-03 07:31:10 +08:00
Grzegorz Kwasniewski
0a7a88e74e
[TRTLLM-8946][feat] Improved heuristics to detect shardable regions ( #9200 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-12-02 22:08:19 +01:00
Patrice Castonguay
3991aa9c72
[ https://nvbugs/5688388 ][fix] fix: Reducing num request in disagg test to speed up ( #9598 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-12-02 12:48:53 -05:00
Neta Zmora
a560ba5546
[ #9550 ][feat] AutoDeploy: Add NVFP4 Cutlass MoE kernels ( #9551 )
...
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
2025-12-03 01:39:38 +08:00
Shi Xiaowei
227d42e492
[ https://nvbugs/5651854 ][fix] Fix dist-serving perf by clearing CPU affinity ( #9549 )
...
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-12-03 01:17:03 +08:00
William Zhang
2dd3ebf037
[ #9150 ][feat] Add code for nano v3 to custom implementation in AD ( #9465 )
...
* Why?
We would like to show an alternative to monkey-patching in AutoDeploy.
* What?
This commit builds on the existing custom model implementation for
NemotronH and adds the bits relevant for MoE layers.
Part of #9150.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-12-02 08:56:44 -08:00
Mike Iovine
d5b7f0c8ad
[TRTLLM-8980][test] Clean up spec dec tests in test_llm_api_pytorch ( #8889 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-02 10:32:02 -05:00
Yan Chunwei
b86256eb54
[TRTLLM-9144][fix] enhance RPC robustness ( #8711 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-12-02 21:37:59 +08:00
brb-nv
be48cdf1d1
[TRTLLM-9466][test] Evaluate helix parallelism with DSV3 Lite ( #9597 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-02 20:10:07 +08:00
Emma Qiao
4a8766c11d
[None][infra] Remove an invalid test name in waives.txt ( #9620 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-02 18:05:17 +08:00
mpikulski
84a1531594
[TRTLLM-9488][feat] use FlashInfer.sampling by default ( #9545 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-12-02 16:29:55 +08:00
Emma Qiao
3e4f2388a9
[None][infra] Waive failed cases for main branch ( #9615 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-02 15:48:27 +08:00
shuyixiong
1a2118b8fe
[ https://nvbugs/5702793 ][fix] Fix uncontiguous tensor view ( #9576 )
...
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
2025-12-02 15:41:32 +08:00
xinhe-nv
ad46d19027
[None][chore] Add failed cases into waives.txt ( #9588 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-12-02 14:24:11 +08:00
ruodil
4586b5f42f
[ https://nvbugs/5582091 ][test] increase warmup times in testing for multi-gpu cases ( #9578 )
...
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
2025-12-02 14:22:49 +08:00
Wanli Jiang
5657a00ec0
[FMDL-1328][feat] Add support for nano-v3 and super-v3 with pytorch backend ( #9261 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-12-02 13:40:20 +08:00
xinhe-nv
3911d0496e
[None][fix] Waive gb200 ( #9580 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-12-02 12:09:21 +08:00
JunyiXu-nv
9a6df980cd
[ https://nvbugs/5703953 ][fix] Use random port for disagg tests ( #9582 )
...
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-02 11:40:14 +08:00
Iman Tabrizian
356a52edf5
[None][feat] Add support for KVCache reuse for DSv32 ( #9383 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-12-02 11:14:30 +08:00
Shijie
dcf5c86720
[None][feat] Unify nvfp4 gemm backend ( #8963 )
...
Signed-off-by: Shijie Wang <jaywan@nvidia.com>
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: Shijie <jaywan@nvidia.com>
Co-authored-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-12-02 11:03:51 +08:00
Eran Geva
c9771ebb99
[ #9198 ][feat] Refactor dist ops in AutoDeploy ( #9301 )
...
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-12-02 02:36:32 +08:00
Venky
639c939a4f
[TRTC-1943][feat] Env vars override support in LLM API ( #9104 )
...
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-12-01 10:04:49 -08:00
Stefan Niebler
f155812eb0
[TRTLLM-6756][feat] Add Beam Search to TorchSampler ( #8509 )
...
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
2025-12-01 18:48:04 +01:00
Yanchao Lu
7127c4407a
[None][test] [None][test] Waive main branch test failures 12/1 ( #9566 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-12-01 21:54:53 +08:00
Shi Xiaowei
48b1d31895
[ https://nvbugs/5651854 ][infra] Enable perf metrics during accuracy testing ( #9140 )
2025-12-01 20:15:32 +08:00
alel
4107254c82
[TRTLLM-6222][feat] Several perf opt for cuteDSL nvf4 gemm ( #9428 )
...
Signed-off-by: Yuhan Li <51736452+liyuhannnnn@users.noreply.github.com>
2025-12-01 18:10:45 +08:00
JadoTu
a92af27411
[None][chore] remove qwen3-next accuracy tests ( #9534 )
...
Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>
2025-12-01 11:49:37 +08:00
Pengbo Wang
aa3310f64f
[ https://nvbugs/5503479 ][fix] Temporarily lower reference accuracy to stabilize CI ( #9398 )
...
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2025-12-01 11:49:14 +08:00
Enwei Zhu
2e3ac3c48f
[ https://nvbugs/5684703 ][fix] Unwaive disagg guided decoding test ( #9466 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-01 11:39:40 +08:00