dddfcdd3bf  2025-11-04 19:32:59 +08:00  Cao Dong
    [None][fix] Fix bug of undefined py_topk_logprobs_vals (#8789)
    Signed-off-by: Dong Cao <docao@nvidia.com>

4de31bece2  2025-11-04 18:59:34 +08:00  Zhanrui Sun
    [TRTLLM-8994][infra] upgrade to DLFW 25.10 and pytorch 2.9.0 / triton 3.5.0 (#8838)
    Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
    Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
    Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>

4296c9553d  2025-11-04 18:10:36 +08:00  CarstyYou
    [TRTLLM-1234][feat] Add fp8 blockscaled Gemm for sm120 (#8844)
    Signed-off-by: CarstyYou <186021327+CarstyYou@users.noreply.github.com>

2b58dba0f6  2025-11-04 16:42:31 +08:00  danielafrimi
    [https://nvbugs/5524714][fix] Fix TP sharding of fused-QKV weight scales in W4A16 AWQ (#8432)
    Signed-off-by: Daniel Afrimi <dafrimi@nvidia.com>
    Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>

65c138108e  2025-11-04 16:42:31 +08:00  Patrice Castonguay
    [https://nvbugs/5552889][fix] fix: Prevent empty batch when using attention DP with disagg (#8372)
    Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
    Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>

fcac2022e2  2025-11-04 16:42:31 +08:00  xiweny
    [https://nvbugs/5565565] [fix] fp8 wideep support sm103 (#8228)
    Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
    Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>

67208f1512  2025-11-03 22:29:21 -08:00  Yechan Kim
    [None][fix] InputProcessor config naming convention fix (#8705)
    Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

97674c3114  2025-11-03 21:08:01 -08:00  HuiGao-NV
    [TRTLLM-8690][feat] add more tensors to share buffers (#8691)
    Signed-off-by: Hui Gao <huig@nvidia.com>

ed297d7c2e  2025-11-03 17:59:49 -08:00  Yan Chunwei
    [None][chore] Optimize perf for the RPC executor and add some profile utilities to llm-api (#8415)
    Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

d0f107e4dd  2025-11-04 09:06:58 +08:00  Matthias Jouanneaux
    [TRTLLM-5966][feat] Helix: add full MLA support for Helix (#8104)
    Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com>

89336fbf07  2025-11-03 22:55:45 +08:00  Li Min
    [None][fix] Fix cute dsl nvfp4 gemm autotune issue (#8761)
    Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
    Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>

f48968b6cc  2025-11-03 06:01:07 -08:00  Yechan Kim
    [TRTLLM-6928][fix] Refactor multimodal unittest (#8453)
    Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

00c0e6c440  2025-11-03 00:30:13 -08:00  Yechan Kim
    [https://nvbugs/5523315][fix] Fix serve benchmark test (#8255)
    Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

2ff772ef71  2025-11-03 16:05:50 +08:00  Cao Dong
    [None][feat] Add benchmark to DeepConf (#8776)
    Signed-off-by: Dong Cao <docao@nvidia.com>

b4d17d1a4c  2025-11-03 13:34:06 +08:00  yufeiwu-nv
    [TRTLLM-8991][test] Add Llama 3.3 70B model with different performance config (#8753)
    Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
    Co-authored-by: Larry Xu <197874197+LarryXFly@users.noreply.github.com>

f57dc01e6f  2025-11-02 17:44:08 -08:00  Chang Liu
    [https://nvbugs/5625380][chore] Remove multimodal related fields from decoder llm input (#8846)

f8778230e3  2025-11-02 15:30:39 +02:00  Eran Geva
    [#8781][fix] Cache the AllReduce wrapper to avoid re-allocating workspace which caused a hang (#8803)
    Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>

1551ed8e5f  2025-11-01 06:49:33 -07:00  Yan Chunwei
    [https://nvbugs/5437384][test] CHERRY-PICK: fix trtllm-llmapi-launch multi tests (#8567)
    Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

4c5a8f4ec6  2025-11-01 21:36:59 +08:00  Bo Li
    [None][fix] Rename: slot_count -> invalid_expert_id (#8783)
    Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

89e0117097  2025-11-01 05:26:06 -07:00  QI JUN
    [TRTLLM-8836][chore] Create ModelEngine from LlmArgs (#8600)
    Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

f0dc746738  2025-10-31 14:38:31 -07:00  Fanrong Li
    [TRTLLM-8541][feat] Add trtllm-gen sparse MLA kernels to support per-Tensor FP8 KV Cache (#8692)
    Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
    Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
    Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
    Co-authored-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
    Co-authored-by: Tracin <10434017+Tracin@users.noreply.github.com>

3d0e38e074  2025-10-31 04:57:10 -07:00  Suyog Gupta
    [None][perf] AutoDeploy optimize _get_unique_value (#8822)
    Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>

852e5060aa  2025-10-31 04:41:44 -07:00  Anthony Chang
    [https://nvbugs/5558117][fix] Allow per-layer quant config from hf_quant_config.json (#8617)
    Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>

1d4a186ace  2025-10-31 11:09:14 +08:00  Yukun He
    [https://nvbugs/5623960][fix] Compress the warning log of AutoTuner when encountering tactic failures. (#8793)
    Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>

025d2926df  2025-10-31 10:13:56 +08:00  Yuxian Qiu
    [https://nvbugs/5599515][fix] Fix PP bubbles. (#8687)
    Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>

f3224ccd32  2025-10-30 18:21:45 -07:00  Yilin Fan
    [None][feat] Add disagg relay time to time breakdown tool (#8465)
    Signed-off-by: nv-yilinf <206948969+nv-yilinf@users.noreply.github.com>

71c5576a44  2025-10-30 12:33:08 -07:00  Chenghao Zhang
    [TRTLLM-8734][feat] AutoDeploy: Enable the nvfp4 for Nemotron MOE (#8737)
    Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
    Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>

ec31363a86  2025-10-30 09:47:46 -07:00  Tailing Yuan
    [None][fix] Layer wise benchmarks: use local models, lint (#8799)
    Signed-off-by: Tailing Yuan <yuantailing@gmail.com>

f9c7786dc8  2025-10-30 20:29:34 +08:00  Tailing Yuan
    [None][feat] Add layer wise benchmarks (#8777)
    Signed-off-by: Tailing Yuan <yuantailing@gmail.com>

f666ad2f6b  2025-10-30 13:11:25 +01:00  Anthony Chang
    [None][feat] Autotuner can iterate through all tactics for test purposes (#8663)
    Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>

cc286687c4  2025-10-30 16:02:40 +08:00  WeiHaocheng
    [None][feat] Refactor scaffolding streaming feature and fix openai wo… (#8622)
    Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>

6b755fd9f8  2025-10-30 15:06:54 +08:00  Void
    [None][fix] fix runtime error that bf16 input is not quantized to nvfp4 when use bf16 dispatch (#8507)
    Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>

496b419791  2025-10-29 21:15:46 -07:00  Yi Zhang
    [None][doc] Add doc for torch.compile & piecewise cuda graph (#8527)
    Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com>

834a780655  2025-10-29 13:58:19 -07:00  Simeng Liu
    [https://nvbugs/5599086][fix] Fix FP8 Linear module for spark (#8707)
    Signed-off-by: Simeng Liu <simengl@nvidia.com>

ae6875fe10  2025-10-29 08:04:26 -07:00  Iman Tabrizian
    [TRTLLM-8976][feat] Move indexer-k-cache to KVCacheManager (#8699)
    Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>

451959c60d  2025-10-29 20:37:14 +08:00  Leslie Fang
    [TRTLLM-8763][chore] Deprecate pybind based GuidedDecodingConfig usage in torch backend (#8717)
    Signed-off-by: leslie-fang25 <leslief@nvidia.com>

a21697ead9  2025-10-29 05:17:16 -07:00  Fanrong Li
    [None][fix] fix config loading for DeepSeek-V3.2 in trtllm-bench (#8729)
    Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>

e2c5a38879  2025-10-29 19:28:43 +08:00  kris1025
    [https://nvbugs/5534574][fix] disable spec decoding forever once the request spec decoding is disabled (#8446)
    Signed-off-by: linquanh <linquanh@nvidia.com>

a69bd2a6fa  2025-10-29 18:12:58 +08:00  Yi Zhang
    [https://nvbugs/5550409][fix] Disable torch compile in piecewise attention part to Avoid host overhead (#8708)
    Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com>

2aade46d18  2025-10-29 15:48:29 +08:00  Pengyun Lin
    [TRTLLM-8214][feat] Support Qwen3 tool parser (#8216)
    Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>

5f737b8dbe  2025-10-29 12:45:09 +08:00  Chang Liu
    [None][perf] Use fp8 quant kernel in DS3.2 indexer module (#8701)
    Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>

15c293a90b  2025-10-29 12:39:03 +08:00  Cheng Hang
    [None][feat] Enable nvfp4 cuda core for sm120 (#8620)
    Signed-off-by: Cheng Hang <chang@nvidia.com>

bc26f4ce7c  2025-10-29 13:38:42 +09:00  Yechan Kim
    [https://nvbugs/5549829][fix] Qwen2.5-VL TP > 1 + Quantized weight load fix (#8680)
    Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

cf8a1d2ef9  2025-10-29 09:45:21 +09:00  Yechan Kim
    [https://nvbugs/5596377][fix] Fix mm dummy calculation (#8498)
    Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

24167d00eb  2025-10-28 17:04:53 -07:00  Lizhi Zhou
    [TRTLLM-8431][doc] update public doc and example, add etcd auto-scaling tests (#8602)
    Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>

227c288441  2025-10-29 07:56:48 +08:00  Kaiyu Xie
    [TRTLLM-8827] [feat] Enable low precision alltoall for Cutlass and TRTLLMGen backends (#8675)
    Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
    Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

00161b315f  2025-10-28 14:55:34 -07:00  Mike Iovine
    [https://nvbugs/5549111][fix] Fix 2-model overlap scheduler accuracy on very long prompts (#8076)
    Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
    Signed-off-by: Michael Iovine <miovine@nvidia.com>

0ee71d95ec  2025-10-28 10:52:43 -07:00  Lucas Liebenwein
    [https://nvbugs/5606166][fix] AutoDeploy: use tuples for cudagraph shape lookup (#8658)
    Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>

a09b38a862  2025-10-28 09:17:26 -07:00  Anish Shanbhag
    [TRTLLM-8684][chore] Migrate BuildConfig to Pydantic, add a Python wrapper for KVCacheType enum (#8330)
    Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>

cdc9e5e645  2025-10-28 08:59:42 -07:00  William Zhang
    [None][fix] Properly raise error for nemotron H models (#8697)
    Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>