Shi Xiaowei
a0024f4d34
[None][doc] Facilitates the integration of the transfer agent ( #7867 )
...
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-10-21 20:06:24 +08:00
Patrice Castonguay
b7602f7bd4
[ https://nvbugs/5534837 ][fix] Fix KV cache split on long context ( #8247 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-16 22:46:19 +08:00
Jonas Yang CN
88ea2c4ee9
[TRTLLM-7349][feat] Adding new orchestrator type -- ray ( #7520 )
...
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-10-04 08:12:24 +08:00
Patrice Castonguay
fefa7d8fa3
[None][feat] Support for cancelling requests with disaggregation ( #8114 )
...
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-10-02 11:04:26 -07:00
Iman Tabrizian
33282351a2
[TRTLLM-6106][feat] Add support for KVCache transfer from KVCache reuse path ( #6348 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-09-27 19:29:30 -04:00
Bo Deng
8cf95681e6
[TRTLLM-7989][infra] Bundle UCX and NIXL libs in the TRTLLM python package ( #7766 )
...
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-09-22 16:43:35 +08:00
brb-nv
e10a027a03
[TRTLLM-7731][feat] KV cache transmission in disagg with CP on gen side ( #7624 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-09-20 06:15:26 -07:00
Chuang Zhu
c98b9468af
[None][fix] get Local IP by connect remote ( #7719 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-09-19 10:01:03 +08:00
Iman Tabrizian
6ce0624208
[TRTLLM-8044][refactor] Rename data -> cache for cacheTransceiver ( #7659 )
2025-09-16 08:43:56 -04:00
Chuang Zhu
f412f5c4b0
[None][fix] UCX zmq ip support ipv6 ( #7530 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-09-10 10:24:41 +08:00
Tomer Shmilovich
ecc0e687c6
[None][feat] Nixl support for GDS ( #5488 )
...
Signed-off-by: Tomer Shmilovich <tshmilovich@nvidia.com>
Signed-off-by: Guy Lev <glev@nvidia.com>
Co-authored-by: Guy Lev <glev@nvidia.com>
2025-09-09 13:00:38 +08:00
Chuang Zhu
77657a1c12
[TRTLLM-7361][feat] KV cache transfer for uneven pp ( #7117 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-09-08 13:37:46 -04:00
Shunkangz
bddf183e15
[None][feat] Add Request specific exception ( #6931 )
...
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-09-04 18:43:42 -04:00
brb-nv
43cb50f788
[None][feat] Update TargetInfo to accommodate CP in disagg ( #7224 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-29 15:56:20 -04:00
Chuang Zhu
4d040b50b7
[None][chore] ucx establish connection with zmq ( #6090 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-08-05 02:50:45 -04:00
Chuang Zhu
ffc0b8f5da
Cache transceiver support VSWA ( #5505 )
...
Signed-off-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Co-authored-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-07-05 01:18:42 +09:00
Chuang Zhu
1d2b0d3d80
use file lock to avoid port conflict ( #5123 )
2025-06-16 14:15:37 +08:00
Chuang Zhu
8e9937081d
ucxx only use ucp_feature_tag to avoid some issues on some platforms ( #4994 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-06-13 19:14:25 +08:00
Chuang Zhu
9a874760c1
Kv cache transfer support duplicate heads ( #4929 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-06-09 14:11:19 +08:00
Chuang Zhu
44cfd757b2
Agent interface impl for NIXL ( #4125 )
...
* agentConnection
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
recv
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
agentState
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
NIXL interfaces
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
update cmakelists
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
nixl improve
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
remove cppzmq
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
fix
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
transferAgent remove register
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
work for cache Test
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
reduce sleep time
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
fix test
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
integrate
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
nixl env
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
fix rebase error
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
cpp test
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
stash for send metaData
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
loadRemoteMD after fetchRemoteMD
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
workaround for mixed gen and context
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
test_env
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
avoid port conflict in test
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* format
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* use std::string
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* typo
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* fix transferAgentTest
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
---------
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-05-22 09:09:41 +08:00
Shi Xiaowei
df2798e0c3
feat: NIXL interface integration ( #3934 )
...
NIXL interfaces
Signed-off-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-05-19 18:18:22 +08:00
Chuang Zhu
09a28becae
fix cache buffer ( #3942 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-05-07 09:49:44 +08:00
Robin Kobus
9f9edd783c
refactor: Introduce MpiTag enumeration and update MPI function signatures ( #3893 )
...
* refactor: Move executor recv functions into classes
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* refactor: Enhance MPI logging and error handling
- Updated MPI logging to include destination and tag information for better traceability during send and receive operations.
- Added error checking for MPI_Wait and MPI_Cancel calls to ensure proper handling of multi-device requests.
- Improved code structure for clarity and maintainability.
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* refactor: Introduce MpiTag enumeration and update MPI function signatures
- Added a new header file `mpiTags.h` to define an enumeration for MPI tags, improving code readability and maintainability.
- Updated function signatures in `mpiUtils.h` and `mpiUtils.cpp` to use the new `MpiTag` type instead of raw integers for tags.
- Refactored various MPI calls across the codebase to utilize the new `MpiTag` enumeration, enhancing type safety and clarity.
- Removed redundant MPI tag constants from several classes, streamlining the code.
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* fixup! refactor: Introduce MpiTag enumeration and update MPI function signatures
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* refactor: Rename tags for consistency
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
---------
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-05-04 13:24:29 +02:00
Chuang Zhu
e2318756ed
cacheTransceiver buffer manager ( #3798 )
...
* cacheTransceiver buffer manager
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* fix args
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* cpp kvCacheManager
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* format
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
---------
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-04-27 11:48:15 +08:00
Iman Tabrizian
af04b6f6aa
bug: Fix hang bug when context server doesn't have enough capacity for KV Cache ( #3095 )
...
* Fix hang bug when KV cache is low
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
* Review comments
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
* Fix attentiondp typo
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
* Add CI test for this case
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
* fix: Fix the insertion order for responder futures
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
* fix: Fix disagg CPP
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
---------
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-04-21 15:16:55 +08:00
Chuang Zhu
6ee021a90d
chore: exchange connection id with tagSend/tagRecv ( #3320 )
...
* exchange connection id with tagSend/tagRecv
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* unwaive
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* tag recv/send
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
---------
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-04-14 09:30:34 +08:00
Yuan Tong
a139eae425
chore: Stabilize ABI boundary for internal kernel library ( #3117 )
...
chore: Stabilize ABI boundary for internal kernel library
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-04-11 15:07:50 +08:00
Chuang Zhu
5aeef6d4c7
ucx interface ( #3306 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-04-07 08:44:34 +08:00
Chuang Zhu
bc5811da65
chore: Ucx ip port remove mpi depend ( #3101 )
...
* initial ucx support
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* fixes to support dynloading and ucx connection establishment - not stable yet
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* update
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* more connection bringup fixes - failing on connection vector build
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* executor test pass
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* update
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* passed full benchmark
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* changing to TLLM_THROW and removing cout
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* stopping progress thread at ucxComm destructor
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* fixing build with ENABLE_UCX=0 to not build ucx target at all and removing includes for ucxConnection for cache transceiver, also delete commented cold code
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* fix copyrights
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* adding ucx flavor to cache transceiver test and inserting it into the CI pipeline
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* allowing sending non ib interfaces IPs
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* setting UCX port reuse for the tests in pipeline
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* code review fixes
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* querying ep after GID message is sent to avoid UCX Errors
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* fixing more CR issues
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* querying ep to not fail if ep_not_connected yet
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* remove mpi dependency and debug
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* debug to info
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* mpirun n 2
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* remove mpi comm split when disaggOrchestrator mode
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* waive disagg_mtp test
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* use future instead of thread
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* use future_promise instead of cv wait
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* connectionId type
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* improve test
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* improve test 2
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* gtest_skip
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
---------
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Co-authored-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
2025-04-02 09:42:29 +08:00
Kaiyu Xie
9b931c0f63
Update TensorRT-LLM ( #2873 )
2025-03-11 21:13:42 +08:00
Kaiyu Xie
77d7fe1eb2
Update TensorRT-LLM ( #2849 )
...
* Update TensorRT-LLM
---------
Co-authored-by: aotman <chenhangatm@gmail.com>
2025-03-04 18:44:00 +08:00
Kaiyu Xie
ab5b19e027
Update TensorRT-LLM ( #2820 )
2025-02-25 21:21:49 +08:00