TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-05 02:31:33 +08:00

Jinyang Yuan 0a0f93d4a8 [None][fix] Fix the performance issue of FP8 blockwise grouped GEMM when using attention DP (#8501) Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>	2025-10-27 10:18:19 +08:00
batch_manager	[TRTLLM-7078][chore] optimal kvcache transfer for VWSA (#7952)	2025-10-24 08:58:16 -04:00
common	chore: remove usernames from comments (#3291)	2025-04-05 13:44:28 +08:00
executor	[TRTLLM-7078][chore] optimal kvcache transfer for VWSA (#7952)	2025-10-24 08:58:16 -04:00
kernels	[None][fix] Fix the performance issue of FP8 blockwise grouped GEMM when using attention DP (#8501)	2025-10-27 10:18:19 +08:00
layers	[None] [ci] Reorganize CMake and Python integration test infrastructure for C++ tests (#6754)	2025-08-24 20:53:17 +02:00
multi_gpu	[TRTLLM-7078][chore] optimal kvcache transfer for VWSA (#7952)	2025-10-24 08:58:16 -04:00
runtime	[None] [ci] Reorganize CMake and Python integration test infrastructure for C++ tests (#6754)	2025-08-24 20:53:17 +02:00
thop	Update TensorRT-LLM (#2755)	2025-02-11 03:01:00 +00:00
utils	Update TensorRT-LLM (#2755)	2025-02-11 03:01:00 +00:00
CMakeLists.txt	[None] [ci] Reorganize CMake and Python integration test infrastructure for C++ tests (#6754)	2025-08-24 20:53:17 +02:00