Daniel Cámpora
|
205c97a4ae
|
[TRTLLM-5974][feat] Support disaggregated serving in TRTLLM Sampler (#5328)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
2025-06-25 17:41:36 +02:00 |
|
Kaiyu Xie
|
c5ae3272b9
|
feat: Make benchmark_serving part of the library (#5428)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
|
2025-06-25 23:13:56 +08:00 |
|
HuiGao-NV
|
314f15f0a7
|
Fix: fix nvbug 5356427 (#5464)
Signed-off-by: Hui Gao <huig@nvidia.com>
|
2025-06-25 22:24:26 +08:00 |
|
HuiGao-NV
|
cc3c2b3be2
|
Move 3 disaggregated cases from 4-GPU devices to 1-GPU device (#5457)
Signed-off-by: Hui Gao <huig@nvidia.com>
|
2025-06-25 21:38:14 +08:00 |
|
Kaiyu Xie
|
d6ada5ffce
|
[nvbug/5354956] fix: unexpected keyword argument 'streaming' (#5436)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
|
2025-06-25 20:37:24 +08:00 |
|
HuiGao-NV
|
b3a4c1f404
|
feat: Remove unused padding_idx in models (#5385)
Signed-off-by: Hui Gao <huig@nvidia.com>
|
2025-06-25 17:19:59 +08:00 |
|
QI JUN
|
2901c5a5bc
|
CI: waive test_ad_build_small_multi (#5471)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
|
2025-06-25 16:44:42 +08:00 |
|
Perkz Zheng
|
1f292ff2a0
|
[https://jirasw.nvidia.com/browse/TRTLLM-4645] support multiCtasKvMode for high-throughput MLA kernels (#5426)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
|
2025-06-25 16:31:10 +08:00 |
|
Yiqing Yan
|
f3cfe86dd1
|
chore: bump version to 1.0.0rc1 (#5460)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
|
2025-06-25 16:21:34 +08:00 |
|
Netanel Haber
|
3ca2f6ac51
|
start OAIServer with max_beam_width=1 for TorchSampler (#5427)
Signed-off-by: Netanel Haber <nhaber@nvidia.com>
|
2025-06-25 15:52:06 +08:00 |
|
QI JUN
|
478f668dcc
|
CI: update multi gpu test triggering file list (#5466)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
|
2025-06-25 15:51:02 +08:00 |
|
Enwei Zhu
|
fc7a81ceb0
|
test: Add LLGuidance test and refine guided decoding (#5348)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-06-25 14:12:56 +08:00 |
|
Enwei Zhu
|
76da7fed86
|
fix (NvBug 5354925): Fix static EPLB (#5411)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-06-25 13:14:40 +08:00 |
|
HuiGao-NV
|
da98e03747
|
tests: Set kv cache free memory fraction in test case (#5433)
Signed-off-by: Hui Gao <huig@nvidia.com>
|
2025-06-25 12:31:58 +08:00 |
|
Shunkangz
|
d5354897c0
|
feat: Dynamically remove servers in PD (#5270)
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
|
2025-06-25 09:50:04 +08:00 |
|
Lucas Liebenwein
|
5cffb7e0ec
|
[AutoDeploy] Merge feat/ad_2025_06_13 feature branch (#5454)
Signed-off-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Co-authored-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>
Co-authored-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Co-authored-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
|
2025-06-25 09:30:13 +08:00 |
|
bhsueh_NV
|
73ba4fc320
|
fix: fix bug of qwen3 + eagle3 + finalize_moe_fusion (#5369)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
|
2025-06-25 09:20:23 +08:00 |
|
QI JUN
|
241f921800
|
waive test_moe.py::test_moe_fp8[autotune] (#5455)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
|
2025-06-25 09:14:44 +08:00 |
|
dongxuy04
|
699520082b
|
Add MTP support for Online EPLB (#5213)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
|
2025-06-25 07:58:13 +08:00 |
|
Iman Tabrizian
|
846bbf1edc
|
Fix test Pytorch model engine (#5416)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
|
2025-06-24 11:09:27 -07:00 |
|
QI JUN
|
d93a5e04b5
|
Chore: remove unused variables (#5314)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
|
2025-06-24 22:27:32 +08:00 |
|
HuiGao-NV
|
35a92f6bab
|
Add debug hook to support dumping tensor data and adding new debug functions easily (#5182)
Signed-off-by: Hui Gao
|
2025-06-24 17:45:28 +08:00 |
|
Emma Qiao
|
475272046a
|
[Infra] - Waive failed tests in post-merge and increase some timeout setting (#5424)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-06-24 17:19:31 +08:00 |
|
Luis Vega
|
d26040e5d9
|
chore: delete mamba hybrid, since it is now called NemotronH (#5409)
Signed-off-by: Luis Vega <vegaluisjose@users.noreply.github.com>
|
2025-06-24 16:27:31 +08:00 |
|
xinhe-nv
|
658fb5b54e
|
tests: update benchmark test lists (#5365)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
|
2025-06-24 15:23:38 +08:00 |
|
Robin Kobus
|
e2a8cbc80b
|
refactor: manage cache indirection in decoder state (#5315)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
|
2025-06-24 09:15:59 +02:00 |
|
xinhe-nv
|
4b32a3f1a7
|
test: [CI] remove closed bugs (#5400)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
|
2025-06-24 13:39:57 +08:00 |
|
HuiGao-NV
|
e16c1bef6e
|
[fix] Add 1 and draft_token_num to seq_len when overlap scheduling is enabled during memory estimation (#5343)
Signed-off-by: Hui Gao <huig@nvidia.com>
|
2025-06-24 11:43:43 +08:00 |
|
Netanel Haber
|
58a8a8fd37
|
feature: unify new_tokens format sample state to trtllm sampler new_tokens format (#4401)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
|
2025-06-23 10:38:37 -07:00 |
|
Fanrong Li
|
ebadc13086
|
[doc] update mtp documents (#5387)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
|
2025-06-21 16:05:52 +08:00 |
|
Robin Kobus
|
b3045c44b9
|
refactor: remove TrtGptModelOptionalParams (#5165)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
|
2025-06-20 10:31:40 +02:00 |
|
dongxuy04
|
4f0f17ac8a
|
feat: Misc Opt for large scale EP (#5374)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
|
2025-06-20 13:11:31 +08:00 |
|
Fanrong Li
|
5d4ab47d5b
|
fix: refactor and fix mtp vanilla (#4762)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
|
2025-06-20 05:23:39 +08:00 |
|
Adamz-nvidia
|
b1878eabeb
|
Add Wechat_Group_QR_Code.png to docs/source/media and main page of TR… (#5142)
Signed-off-by: AdamZ
|
2025-06-20 03:28:00 +08:00 |
|
Yan Chunwei
|
9bd42ecf9b
|
[TRTLLM-5208][BREAKING CHANGE] chore: make pytorch LLM the default (#5312)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-06-20 03:01:10 +08:00 |
|
Kaiyu Xie
|
113f6fbadd
|
Fix: missing clientId when serialize and deserialize response (#5231)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
|
2025-06-19 23:05:11 +08:00 |
|
Kaiyu Xie
|
7246fd75d1
|
feat: Support stream_interval (#5284)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
|
2025-06-19 21:57:10 +08:00 |
|
Shi Xiaowei
|
1e35be5840
|
doc: subsequent modifications of blog 5 (#5366)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
|
2025-06-19 18:23:13 +08:00 |
|
Fanrong Li
|
c7af650d5a
|
Fix: fix the deterministic issue in the MTP Eagle path (#5285)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
|
2025-06-19 18:08:40 +08:00 |
|
Shi Xiaowei
|
9a53e58a58
|
blog: Disaggregated Serving in TensorRT-LLM (#5353)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
|
2025-06-19 18:02:15 +08:00 |
|
Frank
|
68687a9f56
|
[WAR][nvbug/5321947] Add an async sleep to unblock event loop. (#5342)
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
|
2025-06-19 17:25:18 +08:00 |
|
Enwei Zhu
|
bca758fce1
|
fix: Fix DS-R1 nvfp4 test case naming (#5361)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-06-19 15:50:43 +08:00 |
|
Emma Qiao
|
493f268b1c
|
[Infra] Fix l0_sanity_check.yml which also has gb202 and gb203 (#5360)
Signed-off-by: qqiao <qqiao@nvidia.com>
|
2025-06-19 15:05:57 +08:00 |
|
hlu1
|
b558232ce1
|
Refactor CutlassFusedMoE (#5344)
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
|
2025-06-19 00:04:07 -07:00 |
|
ruodil
|
e22e884b02
|
test: amend test case name in perf cluster test (#5356)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
|
2025-06-19 14:50:12 +08:00 |
|
ruodil
|
21ce9b6749
|
test: add qwen3 cases (#5302)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
|
2025-06-19 14:38:36 +08:00 |
|
amitz-nv
|
1753202b61
|
[TRTLLM-5825][fix] Fix torch LoRA TP (#5338)
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
|
2025-06-19 09:12:00 +03:00 |
|
Emma Qiao
|
7f68de3e3f
|
Refactor test timeout for individual long case (#4757)
Signed-off-by: qqiao <qqiao@nvidia.com>
|
2025-06-19 13:52:11 +08:00 |
|
yunruis
|
b3e886074e
|
Fix CI build time increase (#5337)
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
|
2025-06-19 13:49:42 +08:00 |
|
bhsueh_NV
|
dce8620013
|
chore: enable moe_backend on Qwen3 test (#5230)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
|
2025-06-19 13:40:45 +08:00 |
|