| Name | Last commit message | Last commit date |
| --- | --- | --- |
| allReduce | refactoring: port customized kernels with public cutlass version (#5027) | 2025-06-13 16:19:31 +08:00 |
| cudaCoreGemm | Update TensorRT-LLM (#2755) | 2025-02-11 03:01:00 +00:00 |
| fused_gated_gemm | Update TensorRT-LLM (#2755) | 2025-02-11 03:01:00 +00:00 |
| routing | Fix mPtrExpertCounts allocation in MoE TRT-LLM backend (nvfp4) (#5519) | 2025-06-27 20:17:40 +08:00 |
| sampling | test: Test OOB access issue in penaltyKernel for endId=-1 (#4035) | 2025-05-05 10:24:28 -07:00 |
| smoothQuant | Mxfp8xmxfp4 quant mode (#4978) | 2025-06-10 22:01:37 +08:00 |
| weightOnly | chore: Mass integration of release/0.20. (#4871) | 2025-06-04 14:12:27 +08:00 |
| banRepeatNGramsKernelsTest.cpp | chore: remove usernames from comments (#3291) | 2025-04-05 13:44:28 +08:00 |
| CMakeLists.txt | opensource: Opensource MOE MXFP8-MXFP4 implementation (#5222) | 2025-06-26 12:18:19 +08:00 |
| decodingKernelTest.cpp | chore: remove usernames from comments (#3291) | 2025-04-05 13:44:28 +08:00 |
| logitsBitmaskTest.cpp | Update TensorRT-LLM (#2755) | 2025-02-11 03:01:00 +00:00 |
| mixtureOfExpertsTest.cu | opensource: Opensource MOE MXFP8-MXFP4 implementation (#5222) | 2025-06-26 12:18:19 +08:00 |
| mlaChunkedPrefillTest.cu | [TRTLLM-3602][feat] support nvfp4 model and fp8 kv cache for MLA chunked prefill (Blackwell) (#5475) | 2025-06-26 22:18:08 +08:00 |
| mlaPreprocessTest.cu | [feat] Optimize KV Cache Reuse for MLA (#4869) | 2025-06-13 11:03:05 +08:00 |
| ropeTest.cu | feat: Add FP8 support for SM 120 (#3248) | 2025-04-14 16:05:41 -07:00 |
| shiftKCacheKernelTest.cu | Update TensorRT-LLM (#2755) | 2025-02-11 03:01:00 +00:00 |
| stopCriteriaKernelsTest.cpp | chore: remove usernames from comments (#3291) | 2025-04-05 13:44:28 +08:00 |