Yechan Kim
60073a7ad9
[None][feat] Support SharedTensor on MultimodalParams ( #6254 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-08-10 17:48:24 -07:00
shaharmor98
b6baa9ed9b
[TRTLLM-6823][doc] Add checkpoint refactor docs ( #6592 )
...
Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>
2025-08-10 19:47:39 -04:00
shaharmor98
14b36e07d7
[TRTLLM-6174][feat] Enable FP32 mamba ssm cache ( #6574 )
...
Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>
2025-08-10 16:27:51 -04:00
Ziyi Xiong
de472828b9
[TRTLLM-6637][feat] Resolve KV cache divergence issue ( #6628 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-08-09 23:15:04 +08:00
Ye Zhang
bcf5ec0c9a
[None][feat] Core Metrics Implementation ( #5785 )
...
Signed-off-by: Ye Zhang <zhysishu@gmail.com>
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-08-09 02:48:53 -04:00
Mike Iovine
90145cf557
[None][feat] Optimize CUDA graph memory usage for spec decode cases ( #6718 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-08-08 13:56:53 -04:00
Stefan Niebler
b8f036f264
[TRTLLM-6650][fix] Enhance CUDA graph + Beam search to correctly handle padding ( #6665 )
...
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
2025-08-08 14:00:33 +02:00
Liao Lanyu
32ad7f3c12
[None][fix] Remove lock related typo in py_executor ( #6653 )
...
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
2025-08-08 17:48:57 +08:00
Enwei Zhu
aee828d98a
[TRTLLM-6854][feat] Enable guided decoding with disagg serving ( #6704 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-08-08 12:10:36 +08:00
zhanghaotong
1cf669496a
[None][fix] Fix unnecessary GPU synchronization in torch sampler caused by incorrect tensor reference ( #6626 )
...
Signed-off-by: 皓聪 <zhanghaotong.zht@alibaba-inc.com>
Co-authored-by: 皓聪 <zhanghaotong.zht@alibaba-inc.com>
2025-08-07 23:44:47 -04:00
Daniel Cámpora
efca359b66
[TRTLLM-6785][feat] BREAKING CHANGE Enable TRTLLM sampler by default ( #6216 )
...
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-08-07 22:19:37 -04:00
Iman Tabrizian
82276167e6
[None][feat] Add NCCL Symmetric Integration for All Reduce ( #4500 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-08-07 17:28:14 -07:00
Yuan Tong
db8dc97b7b
[None][fix] Migrate to new cuda binding package name ( #6700 )
...
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-08-07 16:29:55 -04:00
Mike Iovine
e968f98b43
[None][feat] Clean up ngram auto mode, add max_concurrency to configs ( #6676 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-08-07 12:51:47 -04:00
Emma Qiao
3c44b44e45
[None][infra] Fix guardwords ( #6711 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-07 21:06:47 +08:00
pcastonguay
453a06e6ab
[TRTLLM-6881][feat] Include attention dp rank info with KV cache events ( #6563 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-08-07 14:17:07 +02:00
Enwei Zhu
1b9781e8e7
[TRTLLM-6409][feat] Enable guided decoding with speculative decoding (part 1: two-model engine) ( #6300 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-08-07 05:53:48 -04:00
hlu1
8207d5fd39
[None] [feat] Add model gpt-oss ( #6645 )
...
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
2025-08-07 03:04:18 -04:00
amitz-nv
85af62184b
[TRTLLM-6683][feat] Support LoRA reload CPU cache evicted adapter ( #6510 )
...
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-08-07 09:05:36 +03:00
Netanel Haber
83ee91e17b
[None][fix] Fix 6522 mpi.pkl5.intracomm.Request has wait not Wait ( #6646 )
...
Signed-off-by: Netanel Haber <nhaber@nvidia.com>
2025-08-06 14:18:09 +08:00
yunruis
3ff4f503ad
[None][opt] ADP schedule balance optimization ( #6061 )
...
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
2025-08-06 09:38:02 +08:00
ixlmar
1ebceb790d
[TRTLLM-5508][feat] check input tokens + improve error handling ( #5170 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-08-05 18:27:43 +01:00
liji-nv
dcbfa7e509
[ https://nvbugs/5252313 ][fix] Fix torch compile + MTP ( #6554 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-08-05 10:31:29 -04:00
Venky
61da2daeb4
[TRTLLM-6761][refactor] Replace LogitBiasLogitsProcessor with embedding bias tensor system ( #6464 )
...
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-08-05 07:14:24 -07:00
danielafrimi
ed801ff74b
[None][fix] Remove expand configuration from mamba2 mixer ( #6521 )
...
Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
2025-08-05 04:18:25 -04:00
Olya Kozlova
13cc1c4878
[TRTLLM-5271][feat] best_of/n for pytorch workflow ( #5997 )
...
Signed-off-by: Olya Kozlova <okozlova@nvidia.com>
2025-08-04 14:08:06 +02:00
Chuang Zhu
542f552d0b
use cudaSetDevice to create context ,fix nvbug 5394497 ( #6403 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-08-03 13:32:55 -04:00
Shunkangz
67a3fd858b
[None][feat] Add support of scheduling attention dp request ( #6246 )
...
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-08-01 20:38:01 -04:00
liji-nv
1daa8c3232
[ https://nvbugs/5340941 ][ https://nvbugs/5375785 ] - fix: Wrap attentio… ( #6355 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-08-01 07:38:06 -04:00
Robin Kobus
d3c14682f0
refactor: Remove unused buffers and bindings from sampler ( #6484 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-08-01 00:43:03 -04:00
Jaedeok Kim
fbee279909
fix: remove duplicate layer multiplication in KV cache size calculation ( #6481 )
...
Signed-off-by: Jaedeok Kim <jaedeokk@nvidia.com>
2025-07-31 22:34:34 -04:00
Zongfei Jing
7bb0a78631
Deepseek R1 FP8 Support on Blackwell ( #6486 )
...
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-08-01 10:26:28 +08:00
Yukun He
00059de380
chore: Improve the AutoTuner log information. ( #6368 )
...
* Change the fallback alert from DEBUG to WARNING level and only do it once.
* Add debug information for profiling cache right after the warmup phase.
* Change the level of exception message during tactic profiling from ERROR to WARNING level. All exception details are pushed to the DEBUG level.
* Other trivial refinements and cleanups.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-08-01 09:19:52 +08:00
Simeng Liu
8cf3faa26a
[feat] Auto-enable ngram with concurrency <= 32. ( #6232 )
...
Signed-off-by: Simeng Liu <simengl@nvidia.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
Signed-off-by: Mike Iovine <mike.iovine7@gmail.com>
Co-authored-by: Mike Iovine <miovine@nvidia.com>
Co-authored-by: Mike Iovine <mike.iovine7@gmail.com>
2025-07-31 18:45:51 -04:00
Ziyi Xiong
8062e0fe7c
[TRTLLM-6392][feat] Support turning on/off spec decoding dynamically ( #6363 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-07-31 15:31:39 -04:00
shaharmor98
0c42f54a39
Bugfix/fix nemotron nas lora support ( #6380 )
...
Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>
2025-07-31 13:39:35 -04:00
amitz-nv
1ee7a08d2b
[5830][feat] Improve LoRA cache memory control ( #6220 )
...
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-07-31 09:26:38 +03:00
Enwei Zhu
4b299cb77e
feat: Support structural tag in C++ runtime and upgrade xgrammar to 0.1.21 ( #6408 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-07-31 09:53:52 +08:00
Wanli Jiang
9632dba02e
feat: TRTLLM-6450 update long rope for phi3.5/phi4-mini/phi4-mm ( #6353 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-07-30 09:20:16 -07:00
pcastonguay
e7ae5e2824
feat: Add support for disaggregation with pp with pytorch backend ( #6369 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Signed-off-by: raayandhar <rdhar@nvidia.com>
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Signed-off-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: raayandhar <rdhar@nvidia.com>
Co-authored-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-07-30 09:42:13 -04:00
QI JUN
2fe9cc0889
chore: remove draft_model_engine from init parameter list of PyExecutor ( #6325 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-07-30 03:31:49 -04:00
QI JUN
1f39a11af0
chore: clean code of PyExecutor ( #6445 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-07-30 02:11:43 -04:00
QI JUN
13e24ab1cb
chore: remove unused code in PyExecutor ( #6351 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-07-29 16:24:26 +08:00
QI JUN
4efc6496b7
chore: add _prepare_and_schedule_batch function in PyExecutor ( #6365 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-07-28 05:50:27 -04:00
Ziyi Xiong
d853811190
[ https://nvbugs/5402719 ][fix]: Add cuda graph dummy requests to the spec_resource_manager ( #6258 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-07-26 20:32:39 -04:00
Mike Iovine
0f2f11f90b
[TRTLLM-6453][feat] Support chunked prefill on spec decode 2 model ( #6104 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-07-24 21:50:11 -04:00
Stefan Niebler
0df758ec9f
[TRTLLM-6650][feat] Enhance beam search support with CUDA graph integration ( #6217 )
...
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
2025-07-24 18:04:41 +02:00
Lizhi Zhou
a63a1ac7f9
[TRTLLM-6444] Add some UCX trouble shooting docs and print UCX related logs ( #6085 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-07-24 16:21:01 +08:00
QI JUN
428e34080f
chore: remove unused variables in pyexecutor ( #6280 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-07-24 13:16:15 +08:00
Stefan Niebler
2486eb778e
[TRTLLM-6651][feat] Enable Overlap scheduler + Beam Search in TRTLLM Sampler ( #6223 )
...
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
2025-07-23 12:30:50 +02:00