Leslie Fang
e5865de518
[TRTLLM-8754][chore] Refine PyTorchModelEngine with llm args (#8493)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-10-22 20:03:18 -04:00

Leslie Fang
50d4e5bc06
[TRTLLM-8483][chore] Refine scheduler_config and peft_cache_config in create_py_executor (#8451)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-10-22 08:33:48 +08:00

Kyle McGill
136e0e6882
[None][feat] Enable CUDA graph support for KvConnectorWorker API (#8275)
Signed-off-by: Kyle McGill <kmcgill@nvidia.com>
Signed-off-by: Kyle McGill <101670481+nv-kmcgill53@users.noreply.github.com>
2025-10-17 18:09:03 -04:00

Leslie Fang
8d1b068b1a
[TRTLLM-8477][chore] Replace KvCacheConfigCpp with KvCacheConfig inside PyExecutor (#8259)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-10-13 14:55:36 +08:00

Jonas Yang CN
88ea2c4ee9
[TRTLLM-7349][feat] Adding new orchestrator type -- ray (#7520)
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-10-04 08:12:24 +08:00

brb-nv
bd3d0ad233
[TRTLLM-7733][feat] Executor changes to support helix parallelism (#7972)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-10-01 22:13:03 -04:00

Yibin Li
d7581bb551
[TRTLLM-8031][feat] Add chunked return_generation_logits logic (#7831)
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
2025-10-01 12:47:07 -04:00

QI JUN
1529a6f22d
[None][chore] extract weights loading related logic to model loader (#7579)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-25 10:19:22 -07:00

Liao Lanyu
18095a7cb8
[https://nvbugs/5503440][fix] Fix potential hang due to wrong type of ZMQ socket and protocol for worker_init_status_queue (#7646)
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
2025-09-19 18:13:33 +08:00

Mike Iovine
b3c57a7042
[TRTLLM-7353][feat] Implement capturable drafting loops for speculation (#7100)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-09-01 14:37:44 -04:00

Shunkangz
ff4047414b
[None][opt] Balance the request based on number of tokens in AttentionDP (#7183)
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-08-27 11:16:12 +08:00

qixiang-99
b165f8bc97
fix/improve kvcache allocation in PyTorch runtime (#5933)
Signed-off-by: qixiang-99 <203170375+qixiang-99@users.noreply.github.com>
2025-08-26 12:40:22 +08:00

QI JUN
bea5e07fb7
[None][refactor] refactor the CUDA graph runner to manage all CUDA graphs (#6846)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-25 20:52:05 +08:00

Robin Kobus
b95cab2a7c
[None][ci] move unittests to sub-directories (#6635)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-08-20 05:42:22 -04:00