Commit Graph

12 Commits

Author SHA1 Message Date
Xiwen Yu
2c3f4cbeee
Merge remote-tracking branch 'origin/main' into feat/b300_cu13
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-05 15:53:43 +08:00
sychen52
98a1bffb7c
[OMNIML-2336][feat] Add NVFP4 x FP8 (#6809)
Signed-off-by: Shiyang Chen <shiychen@nvidia.com>
2025-09-04 09:03:38 -07:00
Xiwen Yu
345c2bceaa
update trtllm-gen sm100f cubins of gemm kernels
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-08-06 14:25:12 +08:00
Nikita Korobov
8043d7a03c
feat: update DeepSeek FP8 TRT-LLM Gen cubins (#4643)
Signed-off-by: Nikita Korobov <nkorobov@nvidia.com>
2025-06-03 14:07:54 -07:00
CarstyYou
ef280e687e
[feat] support fp8 blockscale gemm on sm89 (#4481)
* [feat] integrate ada blockwise gemm

Signed-off-by: CarstyYou <xiy@nvidia.com>

* [fix] align scale M

Signed-off-by: CarstyYou <xiy@nvidia.com>

* [feat] swizzle mma output

Signed-off-by: CarstyYou <xiy@nvidia.com>

* [test] add ut for sm89

Signed-off-by: CarstyYou <xiy@nvidia.com>

* [delete] remove useless comments

Signed-off-by: CarstyYou <xiy@nvidia.com>

* [chore] codestyle

Signed-off-by: CarstyYou <xiy@nvidia.com>

* [fix] fix review comments

Signed-off-by: CarstyYou <xiy@nvidia.com>

* [chore] fix license

Signed-off-by: CarstyYou <xiy@nvidia.com>

* [chore] fix license

Signed-off-by: CarstyYou <xiy@nvidia.com>

---------

Signed-off-by: CarstyYou <xiy@nvidia.com>
Co-authored-by: bhsueh_NV <11360707+byshiue@users.noreply.github.com>
2025-05-23 10:39:10 +08:00
Gabriel Wu
05b50b297f
[feat] open source fp8_blockscale_gemm (#3071)
Signed-off-by: Zihua Wu <zihuaw@nvidia.com>
2025-04-02 12:12:52 +08:00
Chang Liu
1d3a5d38af
fix: Update FP8 sf layout for Blackwell and relax blockwise GEMM assertions (#3144)
* Update fp8 sf layout for blackwell and enable fp8 gemm e2e

* Add test case when m needs to be padded

* Better comment

Signed-off-by: Chang Liu <liuc@nvidia.com>

* Add TODO for fp8 quant kernel

Signed-off-by: Chang Liu <liuc@nvidia.com>

* Enable DCO check

Signed-off-by: Chang Liu <liuc@nvidia.com>

* Fix lint

---------

Signed-off-by: Chang Liu <liuc@nvidia.com>
2025-04-01 13:08:29 -07:00
Kaiyu Xie
3aa6b11d13
Update TensorRT-LLM (#2936)
* Update TensorRT-LLM

---------

Co-authored-by: changcui <cuichang147@gmail.com>
2025-03-18 21:25:19 +08:00
Kaiyu Xie
77d7fe1eb2
Update TensorRT-LLM (#2849)
* Update TensorRT-LLM

---------

Co-authored-by: aotman <chenhangatm@gmail.com>
2025-03-04 18:44:00 +08:00
Kaiyu Xie
ab5b19e027
Update TensorRT-LLM (#2820)
2025-02-25 21:21:49 +08:00
Kaiyu Xie
2ea17cdad2
Update TensorRT-LLM (#2792)
* Update TensorRT-LLM

---------

Co-authored-by: jlee <jungmoolee@clika.io>
2025-02-18 21:27:39 +08:00
Kaiyu Xie
e88da961c5
Update TensorRT-LLM (#2783)
2025-02-13 18:40:22 +08:00