Kaiyu Xie
|
dce1dcc4f9
|
feat: Support post_proc for bench (#5122)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
|
2025-06-15 13:02:38 +08:00 |
|
Enwei Zhu
|
63bc62ddf4
|
feat: Enable EPLB to existing MoE models (#5203)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-06-15 11:48:06 +08:00 |
|
Yuan Tong
|
6bce7337a9
|
perf: avoid dynamic import overhead in is_llm_response with duck typing (#5110)
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
|
2025-06-15 07:45:02 +08:00 |
|
ixlmar
|
e055af1bc9
|
chore: improve disagg test failure detection (#4738)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
|
2025-06-15 01:28:26 +08:00 |
|
Aurelien Chartier
|
1389f5a4d3
|
feat: Add support for fp8 rowwise quantization (#4876)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
Co-authored-by: aikitoria <151776613+aikitoria@users.noreply.github.com>
|
2025-06-14 06:37:48 -07:00 |
|
2ez4bz
|
dc52b67492
|
linting(python): Enable ruff on more files (wave 1/N) (#5140)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
|
2025-06-14 19:19:34 +08:00 |
|
Tailing Yuan
|
0b60da2c45
|
feat: large-scale EP(part 7: DeepEP integration) (#4792)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
|
2025-06-14 19:12:38 +08:00 |
|
Robin Kobus
|
443b2eb51f
|
refactor: Speculative decoding buffers (#5091)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
|
2025-06-14 11:39:32 +02:00 |
|
yunruis
|
b99c5ce8c1
|
Feat/ds r1 min latency opt round3, add router gemm, fused a gemm, PDL (#4560)
Signed-off-by: yunruis <yunruis@nvidia.com>
Signed-off-by: kduan <176893526+Kefeng-Duan@users.noreply.github.com>
Signed-off-by: Kefeng-Duan <176893526+Kefeng-Duan@users.noreply.github.com>
Co-authored-by: kduan <176893526+Kefeng-Duan@users.noreply.github.com>
|
2025-06-14 17:36:22 +08:00 |
|
nv-guomingz
|
3b7b5a5ad5
|
refactor [BREAKING CHANGE]: enhance the llm args pytorch config part 3(torch_compile_config) (#5032)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-06-14 14:23:13 +08:00 |
|
dongxuy04
|
97657bfda2
|
optimize memset before alltoall communication (#5188)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
|
2025-06-14 10:49:47 +08:00 |
|
Aurelien Chartier
|
82e280f6f3
|
feat: add multi-node support for Triton with pytorch backend (#5172)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
|
2025-06-13 13:27:58 -07:00 |
|
Enwei Zhu
|
5f2785fb90
|
fix: Fix waive list (#5205)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-06-13 23:33:23 +08:00 |
|
Yilin Fan
|
06342ffb4d
|
[feat] Implement model-agnostic one-engine eagle3 (#4778)
Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com>
|
2025-06-13 08:11:41 -07:00 |
|
Mike Iovine
|
25aa3881d7
|
[nvbug/5319281][fix] Stop drafting when we hit the draft model's max seq len (#4879)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
|
2025-06-13 11:06:36 -04:00 |
|
Perkz Zheng
|
3d87770e15
|
[https://nvbugspro.nvidia.com/bug/5295470] support headDim 256 for blackwell fmha kernels (#5164)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
|
2025-06-13 23:01:01 +08:00 |
|
QI JUN
|
952f33dcad
|
CI: move all test cases of TensorRT backend into post merge (#5186)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
|
2025-06-13 20:48:48 +08:00 |
|
Chuang Zhu
|
8e9937081d
|
ucxx only use ucp_feature_tag to avoid some issues on some platform (#4994)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
|
2025-06-13 19:14:25 +08:00 |
|
yunruis
|
e5be3a95b3
|
fix: fix license bug (#5200)
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
|
2025-06-13 18:58:15 +08:00 |
|
yunruis
|
e96d6863d8
|
add doc for open-sourced cutlass kernels (#5194)
Signed-off-by: yunruis
|
2025-06-13 18:51:27 +08:00 |
|
brb-nv
|
089be8912a
|
feat: Basic skeleton for Gemma3 VLM (#5108)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
|
2025-06-13 17:27:04 +08:00 |
|
xinhe-nv
|
30d9d0fa71
|
test: [CI] Add failed cases into waives.txt (#5178)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
|
2025-06-13 16:38:51 +08:00 |
|
nv-guomingz
|
b959618579
|
refactor [BREAKING CHANGE]:: remove the redundant use_kv_cache field from PytorchConfig (#5031)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-06-13 16:34:24 +08:00 |
|
yunruis
|
30c5b4183a
|
refactoring: port customized kernels with public cutlass version (#5027)
Signed-off-by: yunruis
Merge this to unblock others since the full CI has been run through
|
2025-06-13 16:19:31 +08:00 |
|
Yao Yao
|
12e075eb70
|
[nvbug 5333996 ][fix] Unload XQA cubins early to avoid static lifetime (#5133)
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
|
2025-06-13 15:53:29 +08:00 |
|
Matthias Jouanneaux
|
514baf1287
|
[fix] Fix comment to pass guardwords check (#5191)
Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com>
|
2025-06-13 15:49:59 +08:00 |
|
Zheng Duan
|
4d0a5ad384
|
chore: gracefully exit disagg process in tests; better startup and logging (#5109)
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
|
2025-06-13 14:03:55 +08:00 |
|
Ivy Zhang
|
28cd536bd6
|
[test] Update timeout params in QA test list (#5124)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
|
2025-06-13 13:40:03 +08:00 |
|
Iman Tabrizian
|
01bd4c00b4
|
Add two MTP disaggregated test (#4546)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
|
2025-06-13 12:17:45 +08:00 |
|
Daniel Cámpora
|
dec326ba7d
|
[fix] Reenable test return logits (#5160)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
|
2025-06-13 06:07:22 +02:00 |
|
Yibin Li
|
b79eb34bfe
|
[fix]: Fall back to HMAC to Avoid IPC Serialization Churn (#5074)
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
|
2025-06-13 11:37:50 +08:00 |
|
xinhe-nv
|
d9be419f45
|
tests: update tests for b200 (#5180)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
|
2025-06-13 11:25:33 +08:00 |
|
ruodil
|
fa582cbe9a
|
test: add more cases for rtx_pro_6000_se and add option kv_cache_dtype in perf test (#5083)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
|
2025-06-13 11:09:15 +08:00 |
|
zhhuang-nv
|
a891013e3c
|
[feat] Optimize KV Cache Reuse for MLA (#4869)
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
|
2025-06-13 11:03:05 +08:00 |
|
Yuxian Qiu
|
4ae46b6714
|
fix: [nvbugs/5324229] Fix broken WInt4AFP8FusedMoEMethod since FusedMoE refactor. (#4930)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
|
2025-06-13 10:21:32 +08:00 |
|
Fanrong Li
|
38a907aaca
|
[TRTLLM-5278][feat] Add attention dp support to MTP relaxed acceptance (#5119)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
|
2025-06-13 08:58:44 +08:00 |
|
Matthias Jouanneaux
|
a0b6c635b1
|
[feat] trtllmGen MoE routing: added support for top groups and top K bounds (#4063)
Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com>
Co-authored-by: hlu1 <14827759+hlu1@users.noreply.github.com>
Co-authored-by: Nikita Korobov <14355239+nekorobov@users.noreply.github.com>
|
2025-06-13 06:00:02 +08:00 |
|
Xiaodong (Vincent) Huang
|
cc2a1344be
|
None: fix OOM because of unnecessary mha workspace (#5056)
Signed-off-by: Vincent Huang <vincenth@nvidia.com>
|
2025-06-12 21:56:05 +02:00 |
|
pcastonguay
|
3a04c9fa7b
|
chore: Include prompt_token_ids only for context-only disagg requests (#5055)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
|
2025-06-12 15:00:08 -04:00 |
|
Omer Ullman Argov
|
655bce0b19
|
[fix][test] report individual unittests results to jenkins (#5116)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
|
2025-06-13 01:52:09 +08:00 |
|
Mike Iovine
|
690873ba1a
|
[nvbug/5334370][fix] Fix one model EAGLE3 (#5134)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
|
2025-06-12 10:28:14 -04:00 |
|
HuiGao-NV
|
dfeeaf6746
|
Move allreduce_strategy from committed api to reference (#5147)
Signed-off-by: Hui Gao <huig@nvidia.com>
|
2025-06-12 21:00:20 +08:00 |
|
brb-nv
|
8cfb567182
|
fix: Updates to yarn implementation (#5105)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
|
2025-06-12 20:45:34 +08:00 |
|
nv-guomingz
|
cf35a079f9
|
fix:https://nvbugs/5298661 (#5022)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-06-12 20:41:44 +08:00 |
|
nv-guomingz
|
58d4ca2385
|
fix:remove duplicated trust_remote_code knob from trtllm-serve (#5143)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-06-12 19:48:24 +08:00 |
|
Daniel Cámpora
|
22281cfc55
|
doc: Added documentation for enable_trtllm_sampler. (#4990)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>
Co-authored-by: Abigail McCarthy <20771501+a-mccarthy@users.noreply.github.com>
|
2025-06-12 18:34:15 +08:00 |
|
Venky
|
59c9588e9a
|
enh(doc): Add ci-overview in docs/source/reference/ (#5137)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
|
2025-06-12 17:48:13 +08:00 |
|
Shi Xiaowei
|
88cba5f354
|
test: waive the NIXL related tests (#5153)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
|
2025-06-12 17:02:27 +08:00 |
|
nv-guomingz
|
b563696dee
|
doc:fix invalid links for trtllm-serve doc (#5145)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-06-12 16:17:32 +08:00 |
|
Zhanrui Sun
|
a97f4581d2
|
infra: upload imageTag info to artifactory and add ngc_staging to save ngc image (#4764)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
|
2025-06-12 15:38:47 +08:00 |
|