danielafrimi
|
5300a99bd8
|
W4A8 GEMM (#6005)
Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
|
2025-07-20 17:34:57 +03:00 |
|
amitz-nv
|
98428f330e
|
[TRTLLM-5826][feat] Support pytorch LoRA adapter eviction (#5616)
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
|
2025-07-20 08:00:14 +03:00 |
|
Martin Marciniszyn Mehringer
|
943fd418dd
|
fix: Ensure mlx5 library is installed for deep_ep and remove deprecated python bindings (#6189)
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
|
2025-07-20 10:38:51 +08:00 |
|
bhsueh_NV
|
2e14c8f443
|
[Fix][Chore][Qwen3] fix bug of using fp4 on sm120 (#6065)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
|
2025-07-20 10:25:25 +08:00 |
|
Void
|
118307c224
|
DeepEP LL support variable hidden size and tokens num (#6141)
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
|
2025-07-20 09:32:41 +08:00 |
|
Pengyun Lin
|
69e9f6d489
|
[fix]: Skip prompt length checking for generation only requests (#6146)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
|
2025-07-19 21:26:37 +08:00 |
|
Ziyi Xiong
|
66030ef815
|
[TRTLLM-6452][feat]: Two-model engine KV cache reuse support (#6133)
Signed-off-by: ziyixiong-nv <fxiong@nvidia.com>
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
|
2025-07-19 13:17:15 +08:00 |
|
wili
|
82d3587bb8
|
[refactor] Unify name of NGram speculative decoding (#5937)
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
|
2025-07-19 12:59:57 +08:00 |
|
Rashid Kaleem
|
152e2df43b
|
[Disaggregated] Add retry knobs and handling (#5808)
Signed-off-by: Rashid Kaleem <4079439+arekay@users.noreply.github.com>
Signed-off-by: Shi Xiaowei <39303645+Shixiaowei02@users.noreply.github.com>
Co-authored-by: Shi Xiaowei <39303645+Shixiaowei02@users.noreply.github.com>
|
2025-07-19 07:27:59 +08:00 |
|
John Calderon
|
fc8b29c4ff
|
[Issue 5927][fix] Avoid memory calls during broadcast for single GPU (#6010)
Signed-off-by: John Calderon <johncalesp@gmail.com>
|
2025-07-18 14:21:03 -07:00 |
|
Bo Deng
|
0388ff9083
|
[https://nvbugs/5393961][fix] record kv-cache size in MLACacheFormatter (#6181)
Signed-off-by: Bo Deng <deemod@nvidia.com>
|
2025-07-19 05:06:45 +08:00 |
|
Netanel Haber
|
d9a3530048
|
[nvbug/5393888][nvbug/5393042] Always use py_seq_slot (#6147)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
|
2025-07-18 22:45:16 +03:00 |
|
Stefan Niebler
|
d475c97c82
|
[nvbugs/5354884][fix] Update beam search workspace estimation to new upper bound (#5926)
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
|
2025-07-19 01:54:51 +08:00 |
|
Stefan Niebler
|
6d7874a467
|
[nvbugs/5369799] fix: Update disaggregation handling in sampler (#5762)
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
|
2025-07-19 01:40:46 +08:00 |
|
xiaoqi
|
28858c8711
|
feat(eagle3):support qwen3 dense model (#5879)
Signed-off-by: xq25478 <xq25478@qq.com>
|
2025-07-19 01:24:32 +08:00 |
|
Venky
|
22d4a8c48a
|
enh: Add script to map tests <-> jenkins stages & vice-versa (#5177)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-07-19 00:50:40 +08:00 |
|
Bo Deng
|
2c6fa145ee
|
[TRTLLM-6471] Infra: unwaive nixl tests and some disagg-serve tests (#6095)
Signed-off-by: Bo Deng <deemod@nvidia.com>
|
2025-07-19 00:48:44 +08:00 |
|
Bo Li
|
07e8813984
|
feat: Remove padding in attention DP. (#6064)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
|
2025-07-18 23:30:34 +08:00 |
|
Stefan Niebler
|
fd6ce7f20e
|
[ci] Speedup beam search unit tests with fixtures for LLM (#5843)
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
|
2025-07-18 22:54:49 +08:00 |
|
Zhanrui Sun
|
8454640ee1
|
infra: fix single-GPU stage failed will not raise error (#6165)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
|
2025-07-18 22:39:32 +08:00 |
|
Erin
|
9522cde464
|
fix: NVBug 5385576 py_batch_idx issue (#6153)
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
|
2025-07-18 22:36:43 +08:00 |
|
Leslie Fang
|
44040edbf0
|
update broken link of PyTorchModelEngine in arch_overview (#6171)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
|
2025-07-18 19:53:38 +08:00 |
|
Robin Kobus
|
ec2b953e7e
|
refactor: Enhanced handling of decoder requests and logits within the batch manager (#6055)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
|
2025-07-18 12:12:08 +02:00 |
|
Emma Qiao
|
77acb4f753
|
[Infra] - Waive failed tests in post-merge (#6176)
Signed-off-by: qqiao <qqiao@nvidia.com>
|
2025-07-18 17:34:34 +08:00 |
|
QI JUN
|
a95f31e72a
|
chore: add more log in FmhaDispatcher (#6170)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
|
2025-07-18 16:53:02 +08:00 |
|
Yiteng Niu
|
519a2116b5
|
[None][infra] Update the allow list of CI trigger (#6168)
Signed-off-by: tensorrt-cicd <90828364+tensorrt-cicd@users.noreply.github.com>
Co-authored-by: tensorrt-cicd <90828364+tensorrt-cicd@users.noreply.github.com>
|
2025-07-18 15:38:38 +08:00 |
|
Yiqing Yan
|
f32169269a
|
[TRTLLM-5179] - Update bot help messages (#5277)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
|
2025-07-18 15:25:05 +08:00 |
|
Chuang Zhu
|
c0e416535e
|
fix single_disagg_test (#6166)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
|
2025-07-18 13:18:37 +08:00 |
|
Aurelien Chartier
|
812243bdd6
|
feat: add support for Modelopt fp8_pb_wo quantization scheme (#6106)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
|
2025-07-18 10:35:12 +08:00 |
|
Zhenhuan Chen
|
992b273045
|
[https://nvbugs/5387375] fix(scaffolding): fix scaffolding aime test in test_e2e (#6140)
Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>
|
2025-07-18 10:34:37 +08:00 |
|
xavier-nvidia
|
200ea9ee81
|
fix TMA error with GEMM+AR on TP=2 (#6075)
Signed-off-by: Xavier Simmons <xsimmons@nvidia.com>
|
2025-07-18 10:26:08 +08:00 |
|
yifeizhang-c
|
0155e7a3a1
|
[TRTLLM-6368] Update deepep dispatch API (#6037)
Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>
|
2025-07-18 10:13:31 +08:00 |
|
Iman Tabrizian
|
b75e53ab69
|
Revert "feat: nanobind bindings (#5961)" (#6160)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
|
2025-07-18 10:12:54 +08:00 |
|
Daniel Stokes
|
ae28b3a664
|
feat: Add support for benchmarking individual gemms in MOE benchmark (#6080)
Signed-off-by: Daniel Stokes <40156487+djns99@users.noreply.github.com>
|
2025-07-18 09:00:12 +12:00 |
|
qixiang-99
|
2c90203c36
|
Refactor KVCacheManager: Simplify token availability calculation and … (#6134)
Signed-off-by: qixiang-99 <203170375+qixiang-99@users.noreply.github.com>
|
2025-07-17 13:33:33 -07:00 |
|
Frank
|
161490f039
|
[fix] Fixes KV Cache overrides in trtllm-bench (#6103)
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
|
2025-07-18 03:44:44 +08:00 |
|
2ez4bz
|
8480c120b1
|
[fix] Fix Mistral3VLM weight-loading & enable in pre-merge (#6105)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
|
2025-07-17 11:04:17 -07:00 |
|
Iman Tabrizian
|
10dbf4f0f4
|
[fix] Remove duplicated KVCache transmission check (#6022)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
|
2025-07-17 12:02:19 -04:00 |
|
ixlmar
|
d71c6fe526
|
[fix] Update jenkins container images (#6094)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
|
2025-07-17 16:22:25 +01:00 |
|
Linda
|
5bff317abf
|
feat: nanobind bindings (#5961)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
|
2025-07-17 22:42:52 +08:00 |
|
Ziyi Xiong
|
58d22a72f1
|
[TRTLLM-6352][feat] Migrate EAGLE3 and draft/target speculation to Drafter (#6007)
Signed-off-by: ziyixiong-nv <fxiong@nvidia.com>
|
2025-07-17 21:15:01 +08:00 |
|
Stanley Sun
|
9518e14f69
|
test: fix PytestUnknownMarkWarning: Unknown pytest.mark.timeout (#6115)
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
|
2025-07-17 20:55:04 +10:00 |
|
Yi Zhang
|
a718486900
|
fix: Fix DeepSeek R1 CI (#6129)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
|
2025-07-17 18:24:49 +08:00 |
|
nv-guomingz
|
9b45499caa
|
test: update max_beam_width to 1 due to torchsampler changes. (#6101)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-07-17 18:05:45 +08:00 |
|
Erin
|
de60ae47e3
|
chores: unwaive a few tests for v1.0 (#6107)
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
|
2025-07-17 17:59:51 +08:00 |
|
Enwei Zhu
|
21efb50068
|
[TRTLLM-6406] feat: Enable guided decoding with overlap scheduler (#6000)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-07-17 17:46:10 +08:00 |
|
Chuang Zhu
|
44c70c88f9
|
chore:[BREAKING CHANGE] use cacheTransceiverConfig as knobs for disagg service (#5234)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
|
2025-07-17 17:42:07 +08:00 |
|
Emma Qiao
|
1cc49494fe
|
[Infra] - Add waive list for pytest when using slurm (#6130)
Signed-off-by: qqiao <qqiao@nvidia.com>
|
2025-07-17 16:53:15 +08:00 |
|
Zhenhuan Chen
|
8c1c9ef7aa
|
fix: convert venv_prefix to str before comparison with base_prefix (#6121)
Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>
|
2025-07-17 15:04:54 +08:00 |
|
QI JUN
|
e821c68611
|
CI: update multi gpu test trigger file list (#6131)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
|
2025-07-17 14:48:23 +08:00 |
|