1778 Commits

Author SHA1 Message Date
Shijie
928247a3f9
[https://nvbugs/5451205][feat] Add cuBLASLt NVFP4 GEMM backend support (#7943)
Signed-off-by: Shijie Wang <jaywan@nvidia.com>
2025-10-23 15:55:10 +08:00
xinhe-nv
04e2b2752a
[None][feat] add Nemotron-Ultra multi nodes eval tests (#8577)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-23 02:44:26 -04:00
Suyog Gupta
2956978da3
[None][feat] Enable rms norm fusion for Nemotron MOE (#8563)
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-23 00:09:42 -04:00
Lucas Liebenwein
77fa5dfee9
[https://nvbugs/5604136][fix] AutoDeploy: correct import for mxfp4_moe unit test (#8593)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-22 22:11:18 -04:00
sunnyqgg
ea3e0eea51
[TRTLLM-7954][feat] Target model KV cache rellocation (#8421)
Signed-off-by: qgai <qgai@nvidia.com>
2025-10-23 09:36:50 +08:00
Anthony Chang
8a3b870e09
[None][feat] Update TRTLLM MoE MxFP4 cubins; autotune tileN (#8156)
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-10-23 09:14:18 +08:00
Anish Shanbhag
15de45d782
[TRTLLM-8682][chore] Remove auto_parallel module (#8329)
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2025-10-22 20:53:08 -04:00
Leslie Fang
e5865de518
[TRTLLM-8754][chore] Refine PyTorchModelEngine with llm args (#8493)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-10-22 20:03:18 -04:00
brb-nv
00c2b81037
[None][chore] Skip failing import of mxfp4_moe (#8591)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-10-22 16:19:22 -04:00
Patrice Castonguay
879039f6d5
[https://nvbugs/5429636][feat] Kv transfer timeout (#8459)
Signed-off-by: raayandhar <raayan.dhar@gmail.com>
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: raayandhar <raayan.dhar@gmail.com>
2025-10-22 09:29:02 -04:00
xinhe-nv
b8b2c9efb4
[None][chore] add precommit hook to remove redundant tab and white space (#8534)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-22 09:21:54 -04:00
Eran Geva
910e6b9684
[None][fix] fixed cached model path in test (#8549)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-10-22 07:47:41 -04:00
Eran Geva
d4b3bae5af
[#8391][fix] check perf by device subtype (#8428)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-10-22 12:38:05 +03:00
Yan Chunwei
3f9dbc76c0
[None][fix] fix rpc unique addr related issue (#8419)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-10-22 04:47:18 -04:00
Ivy Zhang
912cf4f603
[TRTLLM-8785][fix] fix conflicts between periodic-junit and store-durations (#8518)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-10-22 04:36:47 -04:00
Emma Qiao
92e99b6545
[None][infra] Waive failed cases for main branch 10/22 (#8573)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-22 04:21:56 -04:00
Shi Xiaowei
77940635bb
[https://nvbugs/5451272][fix] unwaive the test (#8537)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-10-22 14:28:42 +08:00
xinhe-nv
187cf12d8f
[TRTLLM-8638][fix] Add failed cases into waives.txt (#8554)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-10-22 01:26:15 -04:00
Emma Qiao
2b4e812aea
[None][infra] Let CI continue running other isolation tests when an isolation test get hanging (#8471)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-10-22 00:07:35 -04:00
chenfeiz0326
6cf1c3fba4
[TRTLLM-8260][feat] Add Server-Client Perf Test in pytest for B200 and B300 (#7985)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-10-22 10:17:22 +08:00
sunnyqgg
90080e0e09
[https://nvbugs/5556020][fix] test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_eagle3 dimension mismatch (#8517)
Signed-off-by: qgai <qgai@nvidia.com>
2025-10-22 09:58:22 +08:00
Leslie Fang
50d4e5bc06
[TRTLLM-8483][chore] Refine scheduler_config and peft_cache_config in create_py_executor (#8451)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-10-22 08:33:48 +08:00
Chenghao Zhang
bac9e8c2ad
[None][feat] AutoDeploy: Add Nemotron MOE support for AutoDeploy (#8469)
2025-10-21 15:32:01 -07:00
Lizhi Zhou
23d5280a90
[TRTLLM-7843][feat] implement disagg cluster auto-scaling (#8215)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-10-21 17:25:07 -04:00
Lucas Liebenwein
9b54b3bfaf
[None][chore] AutoDeploy: replace HF's deprecated keyword torch_dtype --> dtype (#8510)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-21 17:07:06 -04:00
YueWeng
8dc4aac5b6
[TRTLLM-8160][feat] Add max_total_draft_tokens (#8366)
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
2025-10-21 11:11:04 -04:00
Emma Qiao
653aa6b6dc
[None][infra] Waive failed tests for main 10/21 (#8524)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-21 06:24:15 -04:00
Yan Chunwei
9ba5959e8e
[None][fix] the api_stability unify default values of None and inspect._empty (#8496)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-10-21 16:57:40 +08:00
xinhe-nv
c566890624
[TRTLLM-8638][fix] Remove closed bugs (#8478)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-21 03:48:58 -04:00
Pengyun Lin
a4227cf1b0
[None][feat] Support Qwen3 reasoning parser (#8000)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-10-21 14:08:39 +08:00
xinhe-nv
3264d605fb
[TRTLLM-8638][fix] Add failed cases into waives.txt (#8486)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-21 01:20:29 -04:00
ruodil
ab4b9966b2
[TRTLLM-7287][test] add multimodal chunked_prefill cases (#8011)
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Larry Xu <197874197+LarryXFly@users.noreply.github.com>
2025-10-20 22:43:47 -04:00
mpikulski
87eb5086fb
[None][fix] restore list[list[list[int]]] in add_token (#8502)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-20 22:34:57 -04:00
Suyog Gupta
7050b1ea49
[#8272][feat] Enable chunked prefill for SSMs in AutoDeploy (#8477)
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-10-20 15:31:52 -07:00
Venky
3e681e2a80
[None] [chore] Add architecture-specific ATTRIBUTIONS files (#8468)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-10-20 16:29:15 -04:00
Lucas Liebenwein
55c468b218
[#8461][feat] AutoDeploy: trtllm-serve bug fix + unit test (#8462)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-20 16:06:39 -04:00
dongfengy
9b289d5230
[https://nvbugs/5568676][fix] Remove test waive (#8437)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-10-20 12:03:50 -07:00
HuiGao-NV
d0663e16e0
[https://nvbugs/5492250][fix] Remove isolated cases and unwaive cases (#8492)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-10-20 07:40:07 -04:00
Pamela Peng
b818a912d7
[https://nvbugs/5540752][fix] Support quantized Phi4 MM models (#8190)
Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
2025-10-20 06:36:09 -04:00
mpikulski
97ce0ecefe
[TRTLLM-8436][feat] batched sampling and top-k logprobs improvements (#8398)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-20 11:15:41 +02:00
QI JUN
d05079ba4b
[None][ci] move some test cases from H100 to A10 (#8449)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-20 01:58:34 -04:00
Yi Zhang
3c2b3bd4d4
[TRTLLM-7255][feat] Add iteration log parser script for benchmark log (#6942)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-10-20 01:34:52 -04:00
ChristinaZ
c8b9998acb
[TRTLLM-8637][feat] Optimize the routing kernel for DeepseekV3 (MoE CUTLASS backend); Add support for KimiK2 and Qwen-next (MoE TRTLLM backend) (#7761)
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-10-20 10:08:31 +08:00
xiweny
f7722e2b65
[TRTLLM-4866] [test] Support waiving unit tests by waives.txt (#8359)
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-10-20 09:52:51 +08:00
xinhe-nv
9aa086d3bb
[None][chore] update test duration (#8377)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-19 20:45:51 -04:00
Emma Qiao
796891ba2a
[None][infra] Skip a failed case in pre-merge for main on 10/19 (#8479)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-19 22:19:00 +08:00
Bo Deng
dd25595ae8
[TRTLLM-7964][infra] Set nixl to default cache transceiver backend (#7926)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-10-19 19:24:43 +08:00
Emma Qiao
e185173240
[None][infra] Waive test for main branch on 10/18 (#8472)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-19 04:36:42 -04:00
brb-nv
7cc65a6296
[None][chore] Waive failing transceiver test (#8473)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-10-18 17:22:10 -04:00
Lucas Liebenwein
41169fb20c
[None][feat] AutoDeploy: chunked prefill support (#8158)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-18 00:47:35 -07:00