Patrice Castonguay
8a751a0e56
[None][chore] Remove is_disaggregated param in executor request queue ( #9049 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-11-12 13:37:15 -05:00
QI JUN
524754b6fd
[TRTLLM-8521][chore] remove circular dependency between model engine and cuda graph runner ( #7572 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-11 10:13:45 -08:00
mpikulski
20fd305bb6
[None][fix] type annotation ( #9071 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-11 07:20:20 -08:00
mpikulski
b151de4a8f
[TRTLLM-8377][test] unit tests for TorchSampler batched sampling ( #9012 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-11-11 07:16:42 -08:00
Guoming Zhang
b894dc2d70
[None][fix] Display the GPU memory information in GiB unit. ( #9070 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-11-11 06:24:59 -08:00
mpikulski
979b3ae9ce
[TRTLLM-7723][feat] sampling using FlashInfer.sampling ( #8581 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-11 03:21:19 -08:00
Yuxian Qiu
7aeac97e4e
[ https://nvbugs/5622938 ][fix] Use async send_requests_to_next_pp. ( #9041 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-11-11 14:19:44 +08:00
mpikulski
edc91ba819
[None][fix] Improve type annotations on ResourceManager.get_resource_manager ( #9013 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-10 15:06:16 +01:00
Patrice Castonguay
d8ea0b967f
[None][fix] Moving transfer timeout test to test_llm_pytorch, fixing broken kv transfer timeout ( #8892 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-11-07 07:33:51 -08:00
Stefan Niebler
326a201473
[ https://nvbugs/5508536 ][fix] Take Over ( #8627 ): Reintroduce: Move stop_criteria to sample_async ( #7041 ) ( #8794 )
...
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
2025-11-07 09:01:15 +01:00
QI JUN
1c6e490894
[TRTLLM-9065][chore] remove PyTorchConfig completely ( #8856 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-06 22:37:03 -08:00
Cao Dong
b53961e972
[None][feat] Return logprobs incrementally in torch backend ( #8785 )
...
Signed-off-by: Dong Cao <docao@nvidia.com>
2025-11-07 10:23:39 +08:00
jthomson04
fcae852cef
[None][fix] Fix KV cache clearing with KV Connector API ( #8750 )
...
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
2025-11-06 14:28:27 -08:00
shuyixiong
70e4d72ffa
[TRTLLM-8511][feat] Add update_weights and sleep_wakeup support for rl integration ( #8302 )
...
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
Co-authored-by: Liwei Ma <liweim@nvidia.com>
Co-authored-by: Jonas Yang CN <joyang@nvidia.com>
2025-11-04 10:19:24 -08:00
Cao Dong
dddfcdd3bf
[None][fix] Fix bug of undefined py_topk_logprobs_vals ( #8789 )
...
Signed-off-by: Dong Cao <docao@nvidia.com>
2025-11-04 19:32:59 +08:00
Patrice Castonguay
65c138108e
[ https://nvbugs/5552889 ][fix] fix: Prevent empty batch when using attention DP with disagg ( #8372 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-04 16:42:31 +08:00
Yan Chunwei
ed297d7c2e
[None][chore] Optimize perf for the RPC executor and add some profile utilities to llm-api ( #8415 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-11-03 17:59:49 -08:00
Yechan Kim
f48968b6cc
[TRTLLM-6928][fix] Refactor multimodal unittest ( #8453 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-11-03 06:01:07 -08:00
QI JUN
89e0117097
[TRTLLM-8836][chore] Create ModelEngine from LlmArgs ( #8600 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-01 05:26:06 -07:00
Yuxian Qiu
025d2926df
[ https://nvbugs/5599515 ][fix] Fix PP bubbles. ( #8687 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-10-31 10:13:56 +08:00
Iman Tabrizian
ae6875fe10
[TRTLLM-8976][feat] Move indexer-k-cache to KVCacheManager ( #8699 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-10-29 08:04:26 -07:00
Leslie Fang
451959c60d
[TRTLLM-8763][chore] Deprecate pybind based GuidedDecodingConfig usage in torch backend ( #8717 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-10-29 20:37:14 +08:00
Fanrong Li
a21697ead9
[None][fix] fix config loading for DeepSeek-V3.2 in trtllm-bench ( #8729 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-10-29 05:17:16 -07:00
kris1025
e2c5a38879
[ https://nvbugs/5534574 ][fix] disable spec decoding forever once the request spec decoding is disabled ( #8446 )
...
Signed-off-by: linquanh <linquanh@nvidia.com>
2025-10-29 19:28:43 +08:00
Yechan Kim
cf8a1d2ef9
[ https://nvbugs/5596377 ][fix] Fix mm dummy calculation ( #8498 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-10-29 09:45:21 +09:00
Mike Iovine
00161b315f
[ https://nvbugs/5549111 ][fix] Fix 2-model overlap scheduler accuracy on very long prompts ( #8076 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Michael Iovine <miovine@nvidia.com>
2025-10-28 14:55:34 -07:00
mpikulski
7c8ba71b49
[TRTLLM-8832][feat] fully async _select_generated_logits with tests ( #8628 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-27 16:15:32 +01:00
QI JUN
4fd58137a1
[TRTLLM-8933][chore] remove unused update_executor_config function ( #8678 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-27 10:00:47 -04:00
jthomson04
02081e2390
[None][feat] Support KV Connector with Disagg Prefill Worker ( #8246 )
...
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
2025-10-24 11:09:06 -07:00
Chang Liu
e47c787dd7
[TRTLLM-8535][feat] Support DeepSeek V3.2 with FP8 + BF16 KV cache/NVFP4 + BF16 KV cache ( #8405 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
2025-10-24 13:40:41 -04:00
Aurelien Chartier
cdf0403c64
[None][feat] Pass KvCacheRetentionConfig to torch LlmRequest ( #8634 )
...
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-10-24 06:44:34 -07:00
Chuang Zhu
2420918e5b
[TRTLLM-7078][chore] optimal kvcache transfer for VWSA ( #7952 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-10-24 08:58:16 -04:00
QI JUN
6ee1c87595
[TRTLLM-8817][chore] Set default value of KvCacheConfig.free_gpu_memory_fraction explicitly ( #8561 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-24 08:55:49 +08:00
QI JUN
cc81028547
[TRTLLM-8812][chore] Limit the scope of pybind based CacheTransceiverConfig ( #8558 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-23 10:32:09 -04:00
sunnyqgg
ea3e0eea51
[TRTLLM-7954][feat] Target model KV cache rellocation ( #8421 )
...
Signed-off-by: qgai <qgai@nvidia.com>
2025-10-23 09:36:50 +08:00
Leslie Fang
e5865de518
[TRTLLM-8754][chore] Refine PyTorchModelEngine with llm args ( #8493 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-10-22 20:03:18 -04:00
Patrice Castonguay
879039f6d5
[ https://nvbugs/5429636 ][feat] Kv transfer timeout ( #8459 )
...
Signed-off-by: raayandhar <raayan.dhar@gmail.com>
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: raayandhar <raayan.dhar@gmail.com>
2025-10-22 09:29:02 -04:00
Leslie Fang
50d4e5bc06
[TRTLLM-8483][chore] Refine scheduler_config and peft_cache_config in create_py_executor ( #8451 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-10-22 08:33:48 +08:00
YueWeng
8dc4aac5b6
[TRTLLM-8160][feat] Add max_total_draft_tokens ( #8366 )
...
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
2025-10-21 11:11:04 -04:00
mpikulski
87eb5086fb
[None][fix] restore list[list[list[int]]] in add_token ( #8502 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-20 22:34:57 -04:00
mpikulski
97ce0ecefe
[TRTLLM-8436][feat] batched sampling and top-k logprobs improvements ( #8398 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-20 11:15:41 +02:00
Bo Deng
dd25595ae8
[TRTLLM-7964][infra] Set nixl to default cache transceiver backend ( #7926 )
...
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-10-19 19:24:43 +08:00
jthomson04
852316886e
[None][fix] Fix KV event consumption ( #6346 )
...
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
2025-10-18 15:41:26 -07:00
QI JUN
4a8ac8dd62
[TRTLLM-8480][chore] clean create_py_executor API ( #8412 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-17 23:52:02 -04:00
Kyle McGill
136e0e6882
[None][feat] Enable CUDA graph support for KvConnectorWorker API ( #8275 )
...
Signed-off-by: Kyle McGill <kmcgill@nvidia.com>
Signed-off-by: Kyle McGill <101670481+nv-kmcgill53@users.noreply.github.com>
2025-10-17 18:09:03 -04:00
John Calderon
46ee7acb33
[TRTLLM-6780][fix] Add multimodal data to dummy requests during memory profiling ( #7539 )
...
Signed-off-by: John Calderon <johncalesp@gmail.com>
Signed-off-by: John Calderon <jcalderon@nvidia.com>
Signed-off-by: john calderon <jcalderon@nvidia.com>
Signed-off-by: John Calderon <jcalderon@nvidia>
2025-10-16 17:49:22 +02:00
Wangjue Yao
9865d3d770
[None][feat] Support cached tokens for Openai server ( #7637 )
...
Signed-off-by: wjueyao <wyao123@terpmail.umd.edu>
Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-10-16 20:51:37 +08:00
Chuang Zhu
40d129a415
[None][fix] Fix cache buffer size for window ( #8320 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-10-16 09:01:11 +08:00
HuiGao-NV
e265eb5fe9
[None][feat] reuse cudagraph memory pool in normal forward flow ( #8095 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-10-16 07:08:44 +08:00
QI JUN
65ec01b257
[TRTLLM-8532][chore] clean warmup method of ModelEngine ( #8264 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-15 08:40:58 -07:00