Anthony Chang
4f9fa9f21d
feat: MoE trtllm backend kernel update ( #5183 )
...
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-06-16 14:46:13 +08:00
Dom Brown
9c012d5bf8
[TRTLLM-5589] feat: Integrate TRT-LLM Gen FP8 Batched GEMM with Pytorch workflow kernel autotuner ( #4872 )
...
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-06-09 11:02:48 +01:00
Nikita Korobov
8043d7a03c
feat: update DeepSeek FP8 TRT-LLM Gen cubins ( #4643 )
...
Signed-off-by: Nikita Korobov <nkorobov@nvidia.com>
2025-06-03 14:07:54 -07:00
Perkz Zheng
4d711be8f4
Feat: add sliding-window-attention generation-phase kernels on Blackwell ( #4564 )
...
* move cubins to LFS
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
* update cubins
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
* add sliding-window-attention generation-phase kernels on Blackwell
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
* address comments
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
---------
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-05-26 09:06:33 +08:00
Nikita Korobov
e1b42be3d1
fix: TRT-LLM Gen dtype declaration ( #4503 )
...
Signed-off-by: Nikita Korobov <nkorobov@nvidia.com>
2025-05-21 23:56:37 +02:00
Nikita Korobov
fa3879629e
feat: TRT-LLM Gen integration for BMM and MoE refactoring ( #4280 )
...
- Adds BatchedGemm cubins and the respective call interface from TensorRT-LLM Generator.
- Refactors TRT-LLM Gen MoE runner to call to BMM interface
- The accuracy is verified for DeepSeek R1 FP4
Signed-off-by: Nikita Korobov <nkorobov@nvidia.com>
2025-05-16 13:31:53 +02:00
Olya Kozlova
b3e6723dbc
feat: Adding FP8 BMM from Codegen ( #3541 )
...
* Adding FP8 BMM from Codegen
Signed-off-by: Olya Kozlova <okozlova@s4124-0110.nvidia.com>
* Fixed licenses
Signed-off-by: Olya Kozlova <okozlova@s4124-0062.nvidia.com>
---------
Signed-off-by: Olya Kozlova <okozlova@s4124-0110.nvidia.com>
Signed-off-by: Olya Kozlova <okozlova@s4124-0062.nvidia.com>
Co-authored-by: Olya Kozlova <okozlova@6u1g-0018.nvidia.com>
Co-authored-by: Olya Kozlova <okozlova@s4124-0062.nvidia.com>
2025-04-16 10:37:15 +02:00