File | Last commit | Date
assert.cpp | Update TensorRT-LLM (#1725) | 2024-06-04 20:26:32 +08:00
attentionOp.cpp | [feat] Support XQA-based MLA on SM120 (#4858) | 2025-06-06 22:32:49 +08:00
attentionOp.h | [feat] Enable NVFP4 output for TRTLLM attention kernels (#4737) | 2025-06-03 10:00:17 +08:00
CMakeLists.txt | Update TensorRT-LLM (#2792) | 2025-02-18 21:27:39 +08:00
cublasMMWrapper.cpp | Update TensorRT-LLM (#2873) | 2025-03-11 21:13:42 +08:00
cublasMMWrapper.h | Update TensorRT-LLM (#2582) | 2024-12-16 21:50:47 -08:00
cublasVersionCheck.h | Initial commit | 2023-09-20 00:29:41 -07:00
cudaBf16Fallbacks.cuh | Update TensorRT-LLM (20240116) (#891) | 2024-01-16 20:03:11 +08:00
cudaBufferUtils.cuh | Update TensorRT-LLM (#2783) | 2025-02-13 18:40:22 +08:00
cudaDriverWrapper.cpp | feat: add CGA reduction fmha kernels on Blackwell. (#3763) | 2025-04-29 10:43:54 +08:00
cudaDriverWrapper.h | feat: add CGA reduction fmha kernels on Blackwell. (#3763) | 2025-04-29 10:43:54 +08:00
cudaFp8Utils.cu | Add Llama 4 (#3302) | 2025-04-09 03:35:21 +08:00
cudaProfilerUtils.cpp | Update TensorRT-LLM (#1954) | 2024-07-16 15:30:25 +08:00
cudaTypeUtils.cuh | Update TensorRT-LLM (#2008) | 2024-07-23 23:05:09 +08:00
customAllReduceUtils.h | Update TensorRT-LLM (#2755) | 2025-02-11 03:01:00 +00:00
envUtils.cpp | feat: cache reuse support (selective cache transfer) in mla cache formatter (#4749) | 2025-06-04 09:56:31 +08:00
envUtils.h | feat: cache reuse support (selective cache transfer) in mla cache formatter (#4749) | 2025-06-04 09:56:31 +08:00
jsonSerializeOptional.h | Update TensorRT-LLM (#2436) | 2024-11-12 15:27:49 +08:00
logger.cpp | chore: improve log-level setting UX (#4352) | 2025-05-16 09:47:44 +01:00
mathUtils.h | Update TensorRT-LLM (#2094) | 2024-08-07 16:44:43 +08:00
mcastDevMemUtils.cpp | Adding two-shot allreduce kernel and mnnvl multicasting buffer (#4216) | 2025-05-22 03:42:36 +08:00
mcastDevMemUtils.h | Adding two-shot allreduce kernel and mnnvl multicasting buffer (#4216) | 2025-05-22 03:42:36 +08:00
memoryUtils.cu | feat: Add group_rms_norm kernel to normalize multiple inputs in a single operator. (#3438) | 2025-05-02 13:25:30 +08:00
memoryUtils.h | Update TensorRT-LLM (#2873) | 2025-03-11 21:13:42 +08:00
nvtxUtils.h | Update TensorRT-LLM (#2755) | 2025-02-11 03:01:00 +00:00
opUtils.cpp | feat: forward exceptions to Python and catch OOMs (#4497) | 2025-05-28 11:58:10 +02:00
opUtils.h | feat: forward exceptions to Python and catch OOMs (#4497) | 2025-05-28 11:58:10 +02:00
quantTypeUtils.cuh | Update TensorRT-LLM (#2008) | 2024-07-23 23:05:09 +08:00
reduceKernelUtils.cuh | Update TensorRT-LLM (#2783) | 2025-02-13 18:40:22 +08:00
safetensors.cpp | Update TensorRT-LLM (#2792) | 2025-02-18 21:27:39 +08:00
safetensors.h | Update TensorRT-LLM (#2110) | 2024-08-13 22:34:33 +08:00
stlUtils.h | Update TensorRT-LLM (#1763) | 2024-06-11 16:59:02 +08:00
stringUtils.cpp | chore: Stabilize ABI boundary for internal kernel library (#3117) | 2025-04-11 15:07:50 +08:00
timestampUtils.cpp | Update TensorRT-LLM (#1954) | 2024-07-16 15:30:25 +08:00
timestampUtils.h | Update TensorRT-LLM (#1954) | 2024-07-16 15:30:25 +08:00
tllmException.cpp | feat: forward exceptions to Python and catch OOMs (#4497) | 2025-05-28 11:58:10 +02:00
workspace.h | Update TensorRT-LLM (#2184) | 2024-09-03 12:14:23 +02:00