QI JUN | 6ee1c87595 | 2025-10-24 08:55:49 +08:00
[TRTLLM-8817][chore] Set default value of KvCacheConfig.free_gpu_memory_fraction explicitly (#8561)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

h-guo18 | 23920223ab | 2025-10-23 18:02:04 -04:00
[#4585][feat] Replace unified attention before export (#8303)
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>

Aurelien Chartier | 32e1ad68e1 | 2025-10-23 12:36:31 -07:00
[None][chore] Cleanup GDS code (#8475)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>

QI JUN | cc81028547 | 2025-10-23 10:32:09 -04:00
[TRTLLM-8812][chore] Limit the scope of pybind based CacheTransceiverConfig (#8558)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

Emma Qiao | ee21ea3e91 | 2025-10-23 10:24:05 -04:00
[None][infra] Disable rtxpro6000 stages due to nodes will be offline (#8613)
Signed-off-by: qqiao <qqiao@nvidia.com>

Emma Qiao | 7c1bca4563 | 2025-10-23 09:46:00 -04:00
[None][infra] Fix slurm exitcode (#8585)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>

Robin Kobus | 3a5845e293 | 2025-10-23 10:27:56 +02:00
[TRTLLM-8714][fix] update create_input_processor to handle custom checkpoint format (#7811)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

Shijie | 928247a3f9 | 2025-10-23 15:55:10 +08:00
[https://nvbugs/5451205][feat] Add cuBLASLt NVFP4 GEMM backend support (#7943)
Signed-off-by: Shijie Wang <jaywan@nvidia.com>

xinhe-nv | 04e2b2752a | 2025-10-23 02:44:26 -04:00
[None][feat] add Nemotron-Ultra multi nodes eval tests (#8577)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>

Suyog Gupta | 2956978da3 | 2025-10-23 00:09:42 -04:00
[None][feat] Enable rms norm fusion for Nemotron MOE (#8563)
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>

dongxuy04 | a7c2c8c212 | 2025-10-23 10:25:04 +08:00
[None][fix] Allow multi-threaded copy for GDRCopy wrapper (#8535)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>

Lucas Liebenwein | 77fa5dfee9 | 2025-10-22 22:11:18 -04:00
[https://nvbugs/5604136][fix] AutoDeploy: correct import for mxfp4_moe unit test (#8593)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>

sunnyqgg | ea3e0eea51 | 2025-10-23 09:36:50 +08:00
[TRTLLM-7954][feat] Target model KV cache rellocation (#8421)
Signed-off-by: qgai <qgai@nvidia.com>

Anthony Chang | 8a3b870e09 | 2025-10-23 09:14:18 +08:00
[None][feat] Update TRTLLM MoE MxFP4 cubins; autotune tileN (#8156)
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>

Anish Shanbhag | 15de45d782 | 2025-10-22 20:53:08 -04:00
[TRTLLM-8682][chore] Remove auto_parallel module (#8329)
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>

Leslie Fang | e5865de518 | 2025-10-22 20:03:18 -04:00
[TRTLLM-8754][chore] Refine PyTorchModelEngine with llm args (#8493)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>

brb-nv | 00c2b81037 | 2025-10-22 16:19:22 -04:00
[None][chore] Skip failing import of mxfp4_moe (#8591)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>

dongxuy04 | df689f8fed | 2025-10-22 10:52:09 -04:00
[None][fix] Fix EPLB CPU thread NUMA binding (#8579)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>

Patrice Castonguay | 879039f6d5 | 2025-10-22 09:29:02 -04:00
[https://nvbugs/5429636][feat] Kv transfer timeout (#8459)
Signed-off-by: raayandhar <raayan.dhar@gmail.com>
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: raayandhar <raayan.dhar@gmail.com>

xinhe-nv | b8b2c9efb4 | 2025-10-22 09:21:54 -04:00
[None][chore] add precommit hook to remove redundant tab and white space (#8534)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>

Eran Geva | 910e6b9684 | 2025-10-22 07:47:41 -04:00
[None][fix] fixed cached model path in test (#8549)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>

mpikulski | 40a9c61a89 | 2025-10-22 06:23:08 -04:00
[None][fix] generate nanobind stubs for submodules (#8539)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>

Yan Chunwei | f81caf5491 | 2025-10-22 17:54:38 +08:00
[None][chore] replace print_colored_debug with logger_debug (#8417)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>

Eran Geva | d4b3bae5af | 2025-10-22 12:38:05 +03:00
[#8391][fix] check perf by device subtype (#8428)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>

Yan Chunwei | 3f9dbc76c0 | 2025-10-22 04:47:18 -04:00
[None][fix] fix rpc unique addr related issue (#8419)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

Ivy Zhang | 912cf4f603 | 2025-10-22 04:36:47 -04:00
[TRTLLM-8785][fix] fix conflicts between periodic-junit and store-durations (#8518)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

Emma Qiao | 92e99b6545 | 2025-10-22 04:21:56 -04:00
[None][infra] Waive failed cases for main branch 10/22 (#8573)
Signed-off-by: qqiao <qqiao@nvidia.com>

yunruis | 8c9fda4b85 | 2025-10-22 03:26:09 -04:00
[None][doc] Paragraph adjustment and fix statistic (#8568)
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

Yiqing Yan | b04e51291a | 2025-10-22 14:35:05 +08:00
[None][chore] Bump version to 1.2.0rc2 (#8562)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

Shi Xiaowei | 77940635bb | 2025-10-22 14:28:42 +08:00
[https://nvbugs/5451272][fix] unwaive the test (#8537)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>

qsang-nv | 07edac2818 | 2025-10-22 14:12:57 +08:00
[None][feat] Add vLLM KV Pool support for XQA mla kernel (#8560)
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>

xinhe-nv | 187cf12d8f | 2025-10-22 01:26:15 -04:00
[TRTLLM-8638][fix] Add failed cases into waives.txt (#8554)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

Emma Qiao | 2b4e812aea | 2025-10-22 00:07:35 -04:00
[None][infra] Let CI continue running other isolation tests when an isolation test get hanging (#8471)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>

chenfeiz0326 | 6cf1c3fba4 | 2025-10-22 10:17:22 +08:00
[TRTLLM-8260][feat] Add Server-Client Perf Test in pytest for B200 and B300 (#7985)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>

Shi Xiaowei | 50149ac2bd | 2025-10-22 10:08:55 +08:00
[None][doc] Fix the incorrect doc figure (#8536)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>

sunnyqgg | 90080e0e09 | 2025-10-22 09:58:22 +08:00
[https://nvbugs/5556020][fix] test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_eagle3 dimension mismatch (#8517)
Signed-off-by: qgai <qgai@nvidia.com>

Leslie Fang | 50d4e5bc06 | 2025-10-22 08:33:48 +08:00
[TRTLLM-8483][chore] Refine scheduler_config and peft_cache_config in create_py_executor (#8451)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>

Chenghao Zhang | bac9e8c2ad | 2025-10-21 15:32:01 -07:00
[None][feat] AutoDeploy: Add Nemotron MOE support for AutoDeploy (#8469)

Lizhi Zhou | 23d5280a90 | 2025-10-21 17:25:07 -04:00
[TRTLLM-7843][feat] implement disagg cluster auto-scaling (#8215)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>

Lucas Liebenwein | 9b54b3bfaf | 2025-10-21 17:07:06 -04:00
[None][chore] AutoDeploy: replace HF's deprecated keyword torch_dtype --> dtype (#8510)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>

YueWeng | 8dc4aac5b6 | 2025-10-21 11:11:04 -04:00
[TRTLLM-8160][feat] Add max_total_draft_tokens (#8366)
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>

Shi Xiaowei | a0024f4d34 | 2025-10-21 20:06:24 +08:00
[None][doc] Facilitates the integration of the transfer agent (#7867)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>

Emma Qiao | 653aa6b6dc | 2025-10-21 06:24:15 -04:00
[None][infra] Waive failed tests for main 10/21 (#8524)
Signed-off-by: qqiao <qqiao@nvidia.com>

Yan Chunwei | 9ba5959e8e | 2025-10-21 16:57:40 +08:00
[None][fix] the api_stability unify default values of None and inspect._empty (#8496)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>

Yueh-Ting (eop) Chen | 85088dce05 | 2025-10-21 04:41:44 -04:00
[None][chore] Update feature combination matrix for SWA kv cache reuse (#8529)
Signed-off-by: eopXD <yuehtingc@nvidia.com>

xinhe-nv | c566890624 | 2025-10-21 03:48:58 -04:00
[TRTLLM-8638][fix] Remove closed bugs (#8478)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>

Emma Qiao | c72f6d1dcc | 2025-10-21 02:56:22 -04:00
[None][infra] Add split algorithm for slurm (#8516)
Signed-off-by: qqiao <qqiao@nvidia.com>

Pengyun Lin | a4227cf1b0 | 2025-10-21 14:08:39 +08:00
[None][feat] Support Qwen3 reasoning parser (#8000)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>

QI JUN | 0acd10e3de | 2025-10-21 02:03:48 -04:00
[None][ci] rebalance H100 stages (#8491)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

xinhe-nv | 3264d605fb | 2025-10-21 01:20:29 -04:00
[TRTLLM-8638][fix] Add failed cases into waives.txt (#8486)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>