TensorRT-LLM/cpp
Latest commit b618e1f55b by Jinyang Yuan (2025-05-17 13:30:55 +08:00):
perf: Eliminate the need for attention DP padding when possible (#3439)
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
Co-authored-by: raccoonliukai <raccoonliu@tencent.com>
Name                  Last commit                                                              Date
cmake                 fix: support TensorRT 10.11+ in FindTensorRT.cmake (#4353)               2025-05-16 14:04:56 +08:00
include/tensorrt_llm  refactor: Copy sequence lengths once in decoder setup (#4102)            2025-05-16 22:03:55 +08:00
kernels               infra: open source fmha v2 kernels (#4185)                               2025-05-15 10:56:34 +08:00
micro_benchmarks      feat: support add internal cutlass kernels as subproject (#3658)         2025-05-06 11:35:07 +08:00
tensorrt_llm          perf: Eliminate the need for attention DP padding when possible (#3439)  2025-05-17 13:30:55 +08:00
tests                 feat: support kv cache reuse for MLA (#3571)                             2025-05-15 15:22:21 +08:00
CMakeLists.txt        fix: better method to help torch find nvtx3 (#4110)                      2025-05-15 16:42:30 +08:00
conandata.yml         infra: add conan (#3744)                                                 2025-04-30 11:53:14 -07:00
conanfile.py          infra: add conan (#3744)                                                 2025-04-30 11:53:14 -07:00