Jin Li
e5d4305c04
[ https://nvbugs/5467531 ][fix] Unwaive fused_moe all to all test with … ( #9617 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-12-04 18:17:24 +08:00
ruodil
8a392af28f
[None][test] rename wide ep and disagg metric name in perf test ( #9704 )
...
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
2025-12-04 18:16:06 +08:00
zackyoray
398d24232d
[None][feat] Add NIXL-LIBFABRIC support ( #9225 )
...
Signed-off-by: Yoray Zack <62789610+zackyoray@users.noreply.github.com>
Signed-off-by: zackyoray <yorayz@nvidia.com>
2025-12-04 15:38:06 +08:00
Yan Chunwei
05058f5e2a
[None][ci] unwaive tests ( #9651 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-12-04 15:06:07 +08:00
tcherckez-nvidia
f9aa86dbdd
[ #8733 ][feat] Add Llama4 MoE handling to AutoDeploy ( #9556 )
...
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
Signed-off-by: tcherckez-nvidia <127761168+tcherckez-nvidia@users.noreply.github.com>
Co-authored-by: Neta Zmora <nzmora@nvidia.com>
2025-12-04 08:03:33 +02:00
JunyiXu-nv
6d2daec5d0
[TRTLLM-8274][feat] Check if executor is shutdown in /health entrypoint ( #9057 )
...
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-04 13:49:40 +08:00
Tailing Yuan
4eed648e22
[None][feat] Add weights initialization and context phase parser to layer-wise benchmarks ( #9667 )
...
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-12-04 13:41:15 +08:00
Jin Li
87e0c8a749
[TRTLLM-7073][feat] Support torch compile for PP for Llama and DeepSeekV3 ( #7838 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-12-04 13:32:11 +08:00
Necofish
323a82f4d5
[None][fix] fix error when processing batches containing both text and mm data ( #8381 )
...
Signed-off-by: Nekofish-L <liuxiangyang@mail.ustc.edu.cn>
2025-12-04 14:28:24 +09:00
Yiqing Yan
47f650ca13
[TRTLLM-5093][infra] Write env variables to a file in the interactive debug session ( #6792 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-12-04 11:41:27 +08:00
mpikulski
744f0eff1b
[TRTLLM-9522][fix] restore trtllm-serve mm_embedding_serve ( #9669 )
2025-12-03 19:27:11 -08:00
TensorRT LLM
94924634e0
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-12-04 03:19:23 +00:00
Yiqing Yan
e31142202e
[TRTLLM-7181][infra] Generate test results when pytest timeout happens ( #9396 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-12-04 10:05:38 +08:00
Wanli Jiang
4485e516a2
[None][feat] Update Qwen3CodeToolParser to align tool-calling parameters ( #9540 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-12-04 06:47:32 +08:00
gramnarayan
098b9ff226
[ #9147 ][feat] AutoDeploy: Draft Target Speculative Decoding ( #9275 )
...
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
2025-12-04 05:13:49 +08:00
Lucas Liebenwein
a1964bcbbc
[ #9643 ][fix] AutoDeploy: fix nano sharding config ( #9668 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-12-04 03:10:25 +08:00
Wei-Ming Chen
d9fba85396
[OMNIML-2932] [feat] nvfp4 awq support ( #8698 )
...
Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>
2025-12-03 19:47:13 +02:00
Gal Hubara-Agam
d7bd62b1a0
[ https://nvbugs/5693853 ][fix] Fix error handling when querying machin… ( #9483 )
...
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
2025-12-03 19:44:51 +02:00
Guoming Zhang
b5e2b9b51f
[ https://nvbugs/5702795 ][fix] Remove the warning message for aten.log. ( #9665 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-12-04 00:02:15 +08:00
Iman Tabrizian
09beaa5933
[None][fix] Fix wide ep MoE error ( #9642 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-12-03 23:11:06 +08:00
Michal Guzek
4e5b10da48
[ https://nvbugs/5552132 ][fix] Enable LoRa for GPT OSS Torch ( #8253 )
...
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
2025-12-03 15:42:15 +01:00
Patrice Castonguay
ae8d8a266a
[ https://nvbugs/5705197 ][chore] Unwaive timeout disagg tests ( #9637 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-12-03 22:18:36 +08:00
Guoming Zhang
e2f82085f1
[None][doc] Replace the tensorrt icon with torch icon on overview.md ( #9644 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-12-03 21:52:46 +08:00
Perkz Zheng
992781dc7b
[None][feat] update trtllm-gen nvfp4 kernels with better performance ( #9510 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-12-03 21:35:49 +08:00
Guoming Zhang
79e872de31
[None][test] Update Qwen3-next accuracy testing by setting the cuda … ( #9613 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-12-03 20:52:53 +08:00
JunyiXu-nv
743486b2ea
[TRTLLM-6842][feat] Support Response API for general purpose ( #9392 )
...
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-03 16:49:26 +08:00
xinhe-nv
3a748b166b
[None][chore] Add failed cases into waives.txt ( #9593 )
...
Signed-off-by: Jie Li <lijie@nvidia.com>
Co-authored-by: Jie Li <lijie@nvidia.com>
2025-12-03 16:26:06 +08:00
Pengyun Lin
1d4fb89235
[TRTLLM-8241][feat] Aliasing to comply to LlmArgs ( #9586 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-12-03 15:28:45 +08:00
fredricz-20070104
80ff9015ce
[ https://nvbugs/5561153 ][test] Fix log error for perf test ( #9622 )
...
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2025-12-03 15:27:13 +08:00
brb-nv
43f6ad7813
[ https://nvbugs/5708475 ][fix] Fix e2e eval accuracy for helix parallelism ( #9647 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-03 15:13:59 +08:00
Bo Li
8b5ededc83
[TRTLLM-9391][chore] Automatically estimate required workspace. ( #9535 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-12-03 12:49:38 +08:00
Suyog Gupta
93871d52b2
[None][chore] AutoDeploy update cuda stream manager for multi-device ( #9575 )
...
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-12-02 20:43:14 -08:00
JunyiXu-nv
beffbd6002
[TRTLLM-9242][doc] Add examples showcasing openai compatible APIs ( #9520 )
...
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-03 11:47:02 +08:00
heyuhhh
a08eb81cce
[None][feat] Add RocketKV usage doc and e2e accuracy test on LongBenchV2 ( #9572 )
...
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
2025-12-03 11:33:46 +08:00
TensorRT LLM
097ac32b28
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-12-03 03:19:14 +00:00
yufeiwu-nv
21f2ba74e8
[None][test] Remove duplicate test cases ( #9623 )
...
Signed-off-by: yufeiwu <230315618+yufeiwu-nv@users.noreply.github.com>
2025-12-03 10:35:26 +08:00
Yiqing Yan
8c88454fa5
[TRTLLM-7101][infra] Reuse passed tests ( #6894 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-12-03 10:07:23 +08:00
Anurag Mukkara
642dfae73a
[ https://nvbugs/5698434 ][fix] Use separate weight mapper for draft ( #9607 )
...
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
2025-12-02 16:00:22 -08:00
Enwei Zhu
a3455f55c7
[None][chore] Fix trtllm-eval and move GroupedGemmInputsHelper ( #9612 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-03 07:55:03 +08:00
Chang Liu
3916d032ec
[None][chore] Remove traceback dump for multimodal input processor ( #9634 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-12-03 07:41:03 +08:00
brb-nv
55c7023c92
[None][chore] Waive test failing on pre-merge ( #9638 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-03 07:31:10 +08:00
Yu Chi Li
7ca38a6c0b
[ #9632 ][feat] Support EXTRA_WHEEL_BUILD_ARGS during wheel build ( #9633 )
...
Signed-off-by: Yu Chi Li <yuchil@nvidia.com>
2025-12-02 14:54:58 -08:00
Grzegorz Kwasniewski
0a7a88e74e
[TRTLLM-8946][feat] Improved heuristics to detect shardable regions ( #9200 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-12-02 22:08:19 +01:00
Patrice Castonguay
3991aa9c72
[ https://nvbugs/5688388 ][fix] fix: Reducing num request in disagg test to speed up ( #9598 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-12-02 12:48:53 -05:00
Neta Zmora
a560ba5546
[ #9550 ][feat] AutoDeploy: Add NVFP4 Cutlass MoE kernels ( #9551 )
...
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
2025-12-03 01:39:38 +08:00
Shi Xiaowei
227d42e492
[ https://nvbugs/5651854 ][fix] Fix dist-serving perf by clearing CPU affinity ( #9549 )
...
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-12-03 01:17:03 +08:00
Lucas Liebenwein
e72ce98c0f
[ #9150 ][feat] AutoDeploy: reviewer comments for #9150 ( #9527 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-12-02 12:09:10 -05:00
William Zhang
2dd3ebf037
[ #9150 ][feat] Add code for nano v3 to custom implementation in AD ( #9465 )
...
* Why?
We would like to show an alternative to monkey-patching in AutoDeploy.
* What?
This commit builds on the existing custom model implementation for
NemotronH and adds the bits relevant for MoE layers.
Part of #9150 .
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-12-02 08:56:44 -08:00
Mike Iovine
d5b7f0c8ad
[TRTLLM-8980][test] Clean up spec dec tests in test_llm_api_pytorch ( #8889 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-02 10:32:02 -05:00
Thor Johnsen
95049eea86
[ https://nvbugs/5627710 ][fix] Fix synchronization bugs in KvCacheTransferManager that can cause corrupted blocks ( #9056 )
...
Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-12-02 09:10:21 -06:00