TensorRT-LLMs/cpp/tensorrt_llm/executor/cache_transmission
Iman Tabrizian af04b6f6aa
bug: Fix hang bug when context server doesn't have enough capacity for KV Cache (#3095)
* Fix hang bug when KV cache is low

Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>

* Review comments

Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>

* Fix attentiondp typo

Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>

* Add CI test for this case

Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>

* fix: Fix the insertion order for responder futures

Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>

* fix: Fix disagg CPP

Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>

---------

Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-04-21 15:16:55 +08:00
mpi_utils            chore: Ucx ip port remove mpi depend (#3101)  2025-04-02 09:42:29 +08:00
ucx_utils            chore: exchange connection id with tagSend/tagRecv (#3320)  2025-04-14 09:30:34 +08:00
cacheConcatenate.cu  bug: Fix hang bug when context server doesn't have enough capacity for KV Cache (#3095)  2025-04-21 15:16:55 +08:00
cacheConcatenate.h   Update TensorRT-LLM (#2873)  2025-03-11 21:13:42 +08:00