TensorRT-LLMs

Mirror of https://github.com/NVIDIA/TensorRT-LLM.git, synced 2026-02-04 10:11:47 +08:00

Zheng Duan	c9ed1ab436	[TRTLLM-6549] chore: record delay introduced by disaggregated serving in kv cache measure (#6135) Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>	2025-07-30 10:39:40 +08:00
batch_manager	[TRTLLM-6549] chore: record delay introduced by disaggregated serving in kv cache measure (#6135)	2025-07-30 10:39:40 +08:00
common	[https://nvbugs/5387771] fix deadlocks due to insufficient numSemaphores (#6262)	2025-07-23 11:20:55 +08:00
cutlass_extensions/include/cutlass_extensions
deep_ep	DeepEP LL dispatch FP4 (#6296)	2025-07-28 11:25:42 +08:00
executor	[nvbug/5374773] chore: Add a runtime flag to enable fail fast when attn window is too large to fit at least one sequence in KV cache (#5974)	2025-07-25 18:10:40 -04:00
executor_worker
kernels	[TRTLLM-6445] feat: Enable AllReduce-associated fusion patterns in Llama3/4. (#6205)	2025-07-28 09:36:26 +08:00
layers
nanobind	fix: integration tests with nanobind (#6326)	2025-07-25 09:23:20 +08:00
plugins
pybind	[nvbug/5374773] chore: Add a runtime flag to enable fail fast when attn window is too large to fit at least one sequence in KV cache (#5974)	2025-07-25 18:10:40 -04:00
runtime
testing
thop	[https://nvbugs/5340941] - fix: Correct custom ops used by Qwen3 Moe … (#6285)	2025-07-25 14:49:45 +08:00
CMakeLists.txt