Chenghao Zhang
bac9e8c2ad
[None][feat] AutoDeploy: Add Nemotron MOE support for AutoDeploy ( #8469 )
2025-10-21 15:32:01 -07:00
Lizhi Zhou
23d5280a90
[TRTLLM-7843][feat] implement disagg cluster auto-scaling ( #8215 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-10-21 17:25:07 -04:00
Lucas Liebenwein
9b54b3bfaf
[None][chore] AutoDeploy: replace HF's deprecated keyword torch_dtype --> dtype ( #8510 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-21 17:07:06 -04:00
YueWeng
8dc4aac5b6
[TRTLLM-8160][feat] Add max_total_draft_tokens ( #8366 )
...
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
2025-10-21 11:11:04 -04:00
Shi Xiaowei
a0024f4d34
[None][doc] Facilitates the integration of the transfer agent ( #7867 )
...
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-10-21 20:06:24 +08:00
Emma Qiao
653aa6b6dc
[None][infra] Waive failed tests for main 10/21 ( #8524 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-21 06:24:15 -04:00
Yan Chunwei
9ba5959e8e
[None][fix] the api_stability unify default values of None and inspect._empty ( #8496 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-10-21 16:57:40 +08:00
Yueh-Ting (eop) Chen
85088dce05
[None][chore] Update feature combination matrix for SWA kv cache reuse ( #8529 )
...
Signed-off-by: eopXD <yuehtingc@nvidia.com>
2025-10-21 04:41:44 -04:00
xinhe-nv
c566890624
[TRTLLM-8638][fix] Remove closed bugs ( #8478 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-21 03:48:58 -04:00
Emma Qiao
c72f6d1dcc
[None][infra] Add split algorithm for slurm ( #8516 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-21 02:56:22 -04:00
Pengyun Lin
a4227cf1b0
[None][feat] Support Qwen3 reasoning parser ( #8000 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-10-21 14:08:39 +08:00
QI JUN
0acd10e3de
[None][ci] rebalance H100 stages ( #8491 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-21 02:03:48 -04:00
xinhe-nv
3264d605fb
[TRTLLM-8638][fix] Add failed cases into waives.txt ( #8486 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-21 01:20:29 -04:00
Bo Li
ebb62e17d8
[None][feat] Add alltoall to trtllm-gen MoE backend. ( #8481 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-10-21 12:42:54 +08:00
ruodil
ab4b9966b2
[TRTLLM-7287][test] add multimodal chunked_prefill cases ( #8011 )
...
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Larry Xu <197874197+LarryXFly@users.noreply.github.com>
2025-10-20 22:43:47 -04:00
Zero Zeng
4545700fcf
[None][chore] Move submit.sh to python and use yaml configuration ( #8003 )
...
Signed-off-by: Zero Zeng <38289304+zerollzeng@users.noreply.github.com>
2025-10-20 22:36:50 -04:00
mpikulski
87eb5086fb
[None][fix] restore list[list[list[int]]] in add_token ( #8502 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-20 22:34:57 -04:00
Yechan Kim
85d5aa7763
[None][feat] Support kv_cache_reuse for HyperCLOVAX-Vision model ( #7789 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-10-21 11:11:24 +09:00
Ruoqian Guo
984d4fe0fe
[None][feat] Update 3rdparty/DeepGEMM to latest commit ( #8488 )
...
Signed-off-by: Ruoqian Guo <22525902+ruoqianguo@users.noreply.github.com>
Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
2025-10-21 06:56:50 +08:00
Suyog Gupta
7050b1ea49
[ #8272 ][feat] Enable chunked prefill for SSMs in AutoDeploy ( #8477 )
...
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-10-20 15:31:52 -07:00
Venky
3e681e2a80
[None] [chore] Add architecture-specific ATTRIBUTIONS files ( #8468 )
...
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-10-20 16:29:15 -04:00
Lucas Liebenwein
55c468b218
[ #8461 ][feat] AutoDeploy: trtllm-serve bug fix + unit test ( #8462 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-20 16:06:39 -04:00
dongfengy
9b289d5230
[ https://nvbugs/5568676 ][fix] Remove test waive ( #8437 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-10-20 12:03:50 -07:00
yuanjingx87
1e3e1474c6
[TRTLLM-6055][infra] Slurm Test refactor ( #7176 )
...
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-10-20 09:46:44 -07:00
HuiGao-NV
d0663e16e0
[ https://nvbugs/5492250 ][fix] Remove isolated cases and unwaive cases ( #8492 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-10-20 07:40:07 -04:00
Pamela Peng
b818a912d7
[ https://nvbugs/5540752 ][fix] Support quantized Phi4 MM models ( #8190 )
...
Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
2025-10-20 06:36:09 -04:00
Robin Kobus
18c7a520b3
[None][feat] Update devcontainer configuration to include additional extensions ( #8369 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-10-20 12:29:15 +02:00
mpikulski
97ce0ecefe
[TRTLLM-8436][feat] batched sampling and top-k logprobs improvements ( #8398 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-20 11:15:41 +02:00
QI JUN
d05079ba4b
[None][ci] move some test cases from H100 to A10 ( #8449 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-20 01:58:34 -04:00
Yi Zhang
3c2b3bd4d4
[TRTLLM-7255][feat] Add iteration log parser script for benchmark log ( #6942 )
...
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-10-20 01:34:52 -04:00
Zhanrui Sun
8124a62b74
[TRTLLM-8669][infra] Use artifactory mirror for installing Python ( #8394 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-10-20 00:17:10 -04:00
Yuxian Qiu
ec32711b1e
[ https://nvbugs/5542862 ][fix] Upgrade fmha_v2. ( #8364 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-10-20 10:20:23 +08:00
ChristinaZ
c8b9998acb
[TRTLLM-8637][feat] Optimize the routing kernel for DeepseekV3 (MoE CUTLASS backend); Add support for KimiK2 and Qwen-next (MoE TRTLLM backend) ( #7761 )
...
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-10-20 10:08:31 +08:00
xiweny
f7722e2b65
[TRTLLM-4866] [test] Support waiving unit tests by waives.txt ( #8359 )
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-10-20 09:52:51 +08:00
Yueh-Ting (eop) Chen
128a351bdc
[None][fix] Avoid overwrite of kv_cache_config.max_tokens for VSWA scheme for the KVCacheManager ( #8219 )
...
For the VSWA scheme, `kv_cache_config.max_tokens` should not control and cap the maximum memory of a block pool, because block pool sizes are not identical across the different window sizes. This MR skips the effect of `kv_cache_config.max_tokens` in `kvCacheManager.cpp`, so the block pool sizes are determined by the window-size-to-share ratio and the total GPU memory analyzed and fed to the KV cache manager. The skip applies only to the VSWA scheme; no extra test coverage was added. (A minimal configuration sketch follows this entry.)
Signed-off-by: eopXD <yuehtingc@nvidia.com>
2025-10-20 10:48:40 +09:00
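The entry above describes sizing per-window block pools from the window-size share ratio and available GPU memory rather than from `kv_cache_config.max_tokens`. A minimal sketch of a VSWA-style configuration through the Python LLM API, assuming `tensorrt_llm.llmapi.KvCacheConfig` exposes `max_attention_window` and `free_gpu_memory_fraction`; the model path and window values are illustrative only:

```python
# Hypothetical sketch (not part of the commit): per-layer attention windows
# enable the VSWA scheme; the KV cache manager then sizes each window's block
# pool from the window-size share ratio and the available GPU memory.
from tensorrt_llm import LLM
from tensorrt_llm.llmapi import KvCacheConfig

kv_cache_config = KvCacheConfig(
    # Per-layer window sizes (illustrative) turn on variable sliding-window attention.
    max_attention_window=[512, 512, 32768],
    # Fraction of free GPU memory handed to the KV cache manager for pool sizing.
    free_gpu_memory_fraction=0.85,
    # Note: under VSWA, max_tokens no longer caps the per-window block pools.
)

llm = LLM(model="path/to/model", kv_cache_config=kv_cache_config)
```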
xinhe-nv
9aa086d3bb
[None][chore] update test duration ( #8377 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-19 20:45:51 -04:00
Emma Qiao
796891ba2a
[None][infra] Skip a failed case in pre-merge for main on 10/19 ( #8479 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-19 22:19:00 +08:00
Bo Deng
dd25595ae8
[TRTLLM-7964][infra] Set nixl to default cache transceiver backend ( #7926 )
...
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-10-19 19:24:43 +08:00
Emma Qiao
e185173240
[None][infra] Waive test for main branch on 10/18 ( #8472 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-19 04:36:42 -04:00
jthomson04
852316886e
[None][fix] Fix KV event consumption ( #6346 )
...
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
2025-10-18 15:41:26 -07:00
brb-nv
7cc65a6296
[None][chore] Waive failing transceiver test ( #8473 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-10-18 17:22:10 -04:00
Lucas Liebenwein
41169fb20c
[None][feat] AutoDeploy: chunked prefill support ( #8158 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-18 00:47:35 -07:00
QI JUN
4a8ac8dd62
[TRTLLM-8480][chore] clean create_py_executor API ( #8412 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-17 23:52:02 -04:00
Wanli Jiang
58b43a6dab
[None][fix] Fix get_num_tokens_per_image for nano-v2-vlm ( #8425 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-10-18 08:51:35 +08:00
Kyle McGill
136e0e6882
[None][feat] Enable CUDA graph support for KvConnectorWorker API ( #8275 )
...
Signed-off-by: Kyle McGill <kmcgill@nvidia.com>
Signed-off-by: Kyle McGill <101670481+nv-kmcgill53@users.noreply.github.com>
2025-10-17 18:09:03 -04:00
Anish Shanbhag
5ff4f88be6
[TRTLLM-8683][chore] Migrate PluginConfig to Pydantic ( #8277 )
...
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2025-10-17 16:13:22 -04:00
h-guo18
55fed1873c
[None][chore] AutoDeploy: cleanup old inference optimizer configs ( #8039 )
...
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-17 15:55:57 -04:00
Grzegorz Kwasniewski
bb7fdcebf4
[TRTLLM-8201][feat] Topological graph helpers ( #8457 )
...
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2025-10-17 12:34:19 -04:00
Venky
8d07580c95
[None] [chore] Add ATTRIBUTIONS-{CPP,Python}.md + Update in wheels setup ( #8438 )
...
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-10-17 06:33:05 -07:00
Wanli Jiang
56f697be2e
[None][feat] Add fmha_v2 kernel for head_dim=80 and sm=100 to support VLM ( #8392 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-10-17 19:42:47 +08:00