6 Commits

Author SHA1 Message Date
Gabriel Wu
05b50b297f
[feat] open source fp8_blockscale_gemm (#3071)
Signed-off-by: Zihua Wu <zihuaw@nvidia.com>
2025-04-02 12:12:52 +08:00
Chang Liu
1d3a5d38af
fix: Update FP8 sf layout for Blackwell and relax blockwise GEMM assertions (#3144)
* Update fp8 sf layout for blackwell and enable fp8 gemm e2e

* Add test case when m needs to be padded

* Better comment

Signed-off-by: Chang Liu <liuc@nvidia.com>

* Add TODO for fp8 quant kernel

Signed-off-by: Chang Liu <liuc@nvidia.com>

* Enable DCO check

Signed-off-by: Chang Liu <liuc@nvidia.com>

* Fix lint

---------

Signed-off-by: Chang Liu <liuc@nvidia.com>
2025-04-01 13:08:29 -07:00
Kaiyu Xie
9b931c0f63
Update TensorRT-LLM (#2873)
2025-03-11 21:13:42 +08:00
Kaiyu Xie
77d7fe1eb2
Update TensorRT-LLM (#2849)
* Update TensorRT-LLM

---------

Co-authored-by: aotman <chenhangatm@gmail.com>
2025-03-04 18:44:00 +08:00
Kaiyu Xie
ab5b19e027
Update TensorRT-LLM (#2820)
2025-02-25 21:21:49 +08:00
Kaiyu Xie
2ea17cdad2
Update TensorRT-LLM (#2792)
* Update TensorRT-LLM

---------

Co-authored-by: jlee <jungmoolee@clika.io>
2025-02-18 21:27:39 +08:00