Commit Graph

7 Commits

Author SHA1 Message Date
Anthony Chang
4f9fa9f21d
feat: MoE trtllm backend kernel update (#5183)
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-06-16 14:46:13 +08:00
Dom Brown
9c012d5bf8
[TRTLLM-5589] feat: Integrate TRT-LLM Gen FP8 Batched GEMM with Pytorch workflow kernel autotuner (#4872)
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-06-09 11:02:48 +01:00
Nikita Korobov
8043d7a03c
feat: update DeepSeek FP8 TRT-LLM Gen cubins (#4643)
Signed-off-by: Nikita Korobov <nkorobov@nvidia.com>
2025-06-03 14:07:54 -07:00
Perkz Zheng
4d711be8f4
Feat: add sliding-window-attention generation-phase kernels on Blackwell (#4564)
* move cubins to LFS

Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>

* update cubins

Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>

* add sliding-window-attention generation-phase kernels on Blackwell

Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>

* address comments

Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>

---------

Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-05-26 09:06:33 +08:00
Nikita Korobov
e1b42be3d1
fix: TRT-LLM Gen dtype declaration (#4503)
Signed-off-by: Nikita Korobov <nkorobov@nvidia.com>
2025-05-21 23:56:37 +02:00
Nikita Korobov
fa3879629e
feat: TRT-LLM Gen integration for BMM and MoE refactoring (#4280)
- Adds BatchedGemm cubins and the respective call interface from TensorRT-LLM Generator. 
- Refactors TRT-LLM Gen MoE runner to call to BMM interface
- The accuracy is verified for DeepSeek R1 FP4 

Signed-off-by: Nikita Korobov <nkorobov@nvidia.com>
2025-05-16 13:31:53 +02:00
Olya Kozlova
b3e6723dbc
feat: Adding FP8 BMM from Codegen (#3541)
* Adding FP8 BMM from Codegen

Signed-off-by: Olya Kozlova <okozlova@s4124-0110.nvidia.com>

* Fixed licenses

Signed-off-by: Olya Kozlova <okozlova@s4124-0062.nvidia.com>

---------

Signed-off-by: Olya Kozlova <okozlova@s4124-0110.nvidia.com>
Signed-off-by: Olya Kozlova <okozlova@s4124-0062.nvidia.com>
Co-authored-by: Olya Kozlova <okozlova@6u1g-0018.nvidia.com>
Co-authored-by: Olya Kozlova <okozlova@s4124-0062.nvidia.com>
2025-04-16 10:37:15 +02:00