| Name | Last commit | Last commit date |
|------|-------------|------------------|
| allReduce | refactoring: port customized kernels with public cutlass version (#5027) | 2025-06-13 16:19:31 +08:00 |
| cudaCoreGemm | Update TensorRT-LLM (#2755) | 2025-02-11 03:01:00 +00:00 |
| fused_gated_gemm | Update TensorRT-LLM (#2755) | 2025-02-11 03:01:00 +00:00 |
| routing | Refactor the topk parallelization part for the routing kernels (#5567) | 2025-07-07 15:53:25 +08:00 |
| sampling | test: Test OOB access issue in penaltyKernel for endId=-1 (#4035) | 2025-05-05 10:24:28 -07:00 |
| smoothQuant | Mxfp8xmxfp4 quant mode (#4978) | 2025-06-10 22:01:37 +08:00 |
| weightOnly | [feat] Optimizations on weight-only batched gemv kernel (#5420) | 2025-06-30 10:20:16 +08:00 |
| banRepeatNGramsKernelsTest.cpp | chore: remove usernames from comments (#3291) | 2025-04-05 13:44:28 +08:00 |
| CMakeLists.txt | opensource: Opensource MOE MXFP8-MXFP4 implementation (#5222) | 2025-06-26 12:18:19 +08:00 |
| decodingKernelTest.cpp | refactor: Clean up DecodingInput and DecodingOutput (#5617) | 2025-07-01 14:31:42 +02:00 |
| logitsBitmaskTest.cpp | Update TensorRT-LLM (#2755) | 2025-02-11 03:01:00 +00:00 |
| mixtureOfExpertsTest.cu | feat: Add support for MXFP8xMXFP4 in pytorch (#5535) | 2025-07-06 15:32:06 -07:00 |
| mlaChunkedPrefillTest.cu | [TRTLLM-3602][feat] support nvfp4 model and fp8 kv cache for MLA chunked prefill (Blackwell) (#5475) | 2025-06-26 22:18:08 +08:00 |
| mlaPreprocessTest.cu | [feat] Optimize KV Cache Reuse for MLA (#4869) | 2025-06-13 11:03:05 +08:00 |
| ropeTest.cu | feat: Add FP8 support for SM 120 (#3248) | 2025-04-14 16:05:41 -07:00 |
| shiftKCacheKernelTest.cu | Update TensorRT-LLM (#2755) | 2025-02-11 03:01:00 +00:00 |
| stopCriteriaKernelsTest.cpp | chore: remove usernames from comments (#3291) | 2025-04-05 13:44:28 +08:00 |