Commit Graph

3584 Commits

Author SHA1 Message Date
Yiqing Yan
b04e51291a
[None][chore] Bump version to 1.2.0rc2 (#8562)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-10-22 14:35:05 +08:00
Shi Xiaowei
77940635bb
[https://nvbugs/5451272][fix] unwaive the test (#8537)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-10-22 14:28:42 +08:00
qsang-nv
07edac2818
[None][feat] Add vLLM KV Pool support for XQA mla kernel (#8560)
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
2025-10-22 14:12:57 +08:00
xinhe-nv
187cf12d8f
[TRTLLM-8638][fix] Add failed cases into waives.txt (#8554)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-10-22 01:26:15 -04:00
Emma Qiao
2b4e812aea
[None][infra] Let CI continue running other isolation tests when an isolation test get hanging (#8471)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-10-22 00:07:35 -04:00
chenfeiz0326
6cf1c3fba4
[TRTLLM-8260][feat] Add Server-Client Perf Test in pytest for B200 and B300 (#7985)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-10-22 10:17:22 +08:00
Shi Xiaowei
50149ac2bd
[None][doc] Fix the incorrect doc figure (#8536)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-10-22 10:08:55 +08:00
sunnyqgg
90080e0e09
[https://nvbugs/5556020][fix] test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_eagle3 dimension mismatch (#8517)
Signed-off-by: qgai <qgai@nvidia.com>
2025-10-22 09:58:22 +08:00
Leslie Fang
50d4e5bc06
[TRTLLM-8483][chore] Refine scheduler_config and peft_cache_config in create_py_executor (#8451)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-10-22 08:33:48 +08:00
Chenghao Zhang
bac9e8c2ad
[None][feat] AutoDeploy: Add Nemotron MOE support for AutoDeploy (#8469) 2025-10-21 15:32:01 -07:00
Lizhi Zhou
23d5280a90
[TRTLLM-7843][feat] implement disagg cluster auto-scaling (#8215)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-10-21 17:25:07 -04:00
Lucas Liebenwein
9b54b3bfaf
[None][chore] AutoDeploy: replace HF's deprecated keyword torch_dtype --> dtype (#8510)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-21 17:07:06 -04:00
YueWeng
8dc4aac5b6
[TRTLLM-8160][feat] Add max_total_draft_tokens (#8366)
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
2025-10-21 11:11:04 -04:00
Shi Xiaowei
a0024f4d34
[None][doc] Facilitates the integration of the transfer agent (#7867)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-10-21 20:06:24 +08:00
Emma Qiao
653aa6b6dc
[None][infra] Waive failed tests for main 10/21 (#8524)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-21 06:24:15 -04:00
Yan Chunwei
9ba5959e8e
[None][fix] the api_stability unify default values of None and inspect._empty (#8496)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-10-21 16:57:40 +08:00
Yueh-Ting (eop) Chen
85088dce05
[None][chore] Update feature combination matrix for SWA kv cache reuse (#8529)
Signed-off-by: eopXD <yuehtingc@nvidia.com>
2025-10-21 04:41:44 -04:00
xinhe-nv
c566890624
[TRTLLM-8638][fix] Remove closed bugs (#8478)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-21 03:48:58 -04:00
Emma Qiao
c72f6d1dcc
[None][infra] Add split algorithm for slurm (#8516)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-21 02:56:22 -04:00
Pengyun Lin
a4227cf1b0
[None][feat] Support Qwen3 reasoning parser (#8000)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-10-21 14:08:39 +08:00
QI JUN
0acd10e3de
[None][ci] rebalance H100 stages (#8491)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-21 02:03:48 -04:00
xinhe-nv
3264d605fb
[TRTLLM-8638][fix] Add failed cases into waives.txt (#8486)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-21 01:20:29 -04:00
Bo Li
ebb62e17d8
[None][feat] Add alltoall to trtllm-gen MoE backend. (#8481)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-10-21 12:42:54 +08:00
ruodil
ab4b9966b2
[TRTLLM-7287][test] add multimodal chunked_prefill cases (#8011)
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Larry Xu <197874197+LarryXFly@users.noreply.github.com>
2025-10-20 22:43:47 -04:00
Zero Zeng
4545700fcf
[None][chore] Move submit.sh to python and use yaml configuration (#8003)
Signed-off-by: Zero Zeng <38289304+zerollzeng@users.noreply.github.com>
2025-10-20 22:36:50 -04:00
mpikulski
87eb5086fb
[None][fix] restore list[list[list[int]]] in add_token (#8502)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-20 22:34:57 -04:00
Yechan Kim
85d5aa7763
[None][feat] Support kv_cahce_reuse for HyperCLOVAX-Vision model (#7789)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-10-21 11:11:24 +09:00
Ruoqian Guo
984d4fe0fe
[None][feat] Update 3rdparty/DeepGEMM to latest commit (#8488)
Signed-off-by: Ruoqian Guo <22525902+ruoqianguo@users.noreply.github.com>
Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
2025-10-21 06:56:50 +08:00
Suyog Gupta
7050b1ea49
[#8272][feat] Enable chunked prefill for SSMs in AutoDeploy (#8477)
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-10-20 15:31:52 -07:00
Venky
3e681e2a80
[None] [chore] Add architecture-specific ATTRIBUTIONS files (#8468)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-10-20 16:29:15 -04:00
Lucas Liebenwein
55c468b218
[#8461][feat] AutoDeploy: trtllm-serve bug fix + unit test (#8462)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-20 16:06:39 -04:00
dongfengy
9b289d5230
[https://nvbugs/5568676][fix] Remove test waive (#8437)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-10-20 12:03:50 -07:00
yuanjingx87
1e3e1474c6
[TRTLLM-6055][infra] Slurm Test refactor (#7176)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-10-20 09:46:44 -07:00
HuiGao-NV
d0663e16e0
[https://nvbugs/5492250][fix] Remove isolated cases and unwaive cases (#8492)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-10-20 07:40:07 -04:00
Pamela Peng
b818a912d7
[https://nvbugs/5540752][fix] Support quantized Phi4 MM models (#8190)
Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
2025-10-20 06:36:09 -04:00
Robin Kobus
18c7a520b3
[None][feat] Update devcontainer configuration to include additional extensions (#8369)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-10-20 12:29:15 +02:00
mpikulski
97ce0ecefe
[TRTLLM-8436][feat] batched sampling and top-k logprobs improvements (#8398)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-20 11:15:41 +02:00
QI JUN
d05079ba4b
[None][ci] move some test cases from H100 to A10 (#8449)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-20 01:58:34 -04:00
Yi Zhang
3c2b3bd4d4
[TRTLLM-7255][feat] Add iteration log parser script for benchmark log (#6942)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-10-20 01:34:52 -04:00
Zhanrui Sun
8124a62b74
[TRTLLM-8669][infra] Use artifactory mirror for install python (#8394)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-10-20 00:17:10 -04:00
Yuxian Qiu
ec32711b1e
[https://nvbugs/5542862][fix] Upgrade fmha_v2. (#8364)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-10-20 10:20:23 +08:00
ChristinaZ
c8b9998acb
[TRTLLM-8637][feat] Optimize the routing kernel for DeepseekV3 (MoE CUTLASS backend); Add support for KimiK2 and Qwen-next (MoE TRTLLM backend) (#7761)
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-10-20 10:08:31 +08:00
xiweny
f7722e2b65
[TRTLLM-4866] [test] Support waiving unit tests by waives.txt (#8359)
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-10-20 09:52:51 +08:00
Yueh-Ting (eop) Chen
128a351bdc
[None][fix] Avoid overwrite of kv_cache_config.max_tokens for VSWA scheme for the KVCacheManager (#8219)
For VSWA scheme, we do not want `kv_cache_cnonfig.max_token` to control
and cap the maximum memory of a block pool because block pool size are
not identical amongst different window sizes. This MR omits the effect
of `kv_cache_config.max_tokens` under `kvCacheManager.cpp` to allow the
setting of block pool size to rely on the window size to share ratio
and the total gpu memory analyzed and fed to the kv cache manager.

Only skipping for VSWA scheme, no extra coverage was added.

Signed-off-by: eopXD <yuehtingc@nvidia.com>
2025-10-20 10:48:40 +09:00
xinhe-nv
9aa086d3bb
[None][chore] update test duration (#8377)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-19 20:45:51 -04:00
Emma Qiao
796891ba2a
[None][infra] Skip a failed case in pre-merge for main on 10/19 (#8479)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-19 22:19:00 +08:00
Bo Deng
dd25595ae8
[TRTLLM-7964][infra] Set nixl to default cache transceiver backend (#7926)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-10-19 19:24:43 +08:00
Emma Qiao
e185173240
[None][infra] Waive test for main branch on 10/18 (#8472)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-19 04:36:42 -04:00
jthomson04
852316886e
[None][fix] Fix KV event consumption (#6346)
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
2025-10-18 15:41:26 -07:00
brb-nv
7cc65a6296
[None][chore] Waive failing transceiver test (#8473)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-10-18 17:22:10 -04:00