TensorRT-LLMs/cpp/include/tensorrt_llm/deep_gemm
Jinyang Yuan 5339d367ce
[perf] Reduce the workspace size of FP4 activation scales for MoE (#4303)
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
2025-05-30 09:03:52 +08:00
File               Last commit                                                                Last updated
compiler.cuh       Feat: add deep_gemm swapab Kernel (#4430)                                  2025-05-21 10:48:43 +08:00
fp8_gemm_impl.cuh  Feat: add deep_gemm swapab Kernel (#4430)                                  2025-05-21 10:48:43 +08:00
fp8_gemm.cuh       [perf] Reduce the workspace size of FP4 activation scales for MoE (#4303)  2025-05-30 09:03:52 +08:00
jit_utils.cuh      Feat: add deep_gemm swapab Kernel (#4430)                                  2025-05-21 10:48:43 +08:00
mma_utils.cuh      Feat: add deep_gemm swapab Kernel (#4430)                                  2025-05-21 10:48:43 +08:00
nvrtc_cutlass.cuh  feat: use NVRTC for DeepGEMM JIT compilation (#3239)                       2025-04-07 20:29:23 +08:00
nvrtc_std.cuh      feat: use NVRTC for DeepGEMM JIT compilation (#3239)                       2025-04-07 20:29:23 +08:00
runtime.cuh        feat: use NVRTC for DeepGEMM JIT compilation (#3239)                       2025-04-07 20:29:23 +08:00
scheduler.cuh      [perf] Reduce the workspace size of FP4 activation scales for MoE (#4303)  2025-05-30 09:03:52 +08:00
tma_utils.cuh      Feat: add deep_gemm swapab Kernel (#4430)                                  2025-05-21 10:48:43 +08:00
utils.cuh          feat: use NVRTC for DeepGEMM JIT compilation (#3239)                       2025-04-07 20:29:23 +08:00