Xiwen Yu
2c3f4cbeee
Merge remote-tracking branch 'origin/main' into feat/b300_cu13
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-05 15:53:43 +08:00
sychen52
98a1bffb7c
[OMNIML-2336][feat] Add NVFP4 x FP8 ( #6809 )
...
Signed-off-by: Shiyang Chen <shiychen@nvidia.com>
2025-09-04 09:03:38 -07:00
Xiwen Yu
345c2bceaa
update trtllm-gen sm100f cubins of gemm kernels
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-08-06 14:25:12 +08:00
Nikita Korobov
8043d7a03c
feat: update DeepSeek FP8 TRT-LLM Gen cubins ( #4643 )
...
Signed-off-by: Nikita Korobov <nkorobov@nvidia.com>
2025-06-03 14:07:54 -07:00
CarstyYou
ef280e687e
[feat] support fp8 blockscale gemm on sm89 ( #4481 )
...
* [feat] integrate ada blockwise gemm
Signed-off-by: CarstyYou <xiy@nvidia.com>
* [fix] align scale M
Signed-off-by: CarstyYou <xiy@nvidia.com>
* [feat] swizzle mma output
Signed-off-by: CarstyYou <xiy@nvidia.com>
* [test] add ut for sm89
Signed-off-by: CarstyYou <xiy@nvidia.com>
* [delete] remove useless comments
Signed-off-by: CarstyYou <xiy@nvidia.com>
* [chore] codestyle
Signed-off-by: CarstyYou <xiy@nvidia.com>
* [fix] fix review comments
Signed-off-by: CarstyYou <xiy@nvidia.com>
* [chore] fix license
Signed-off-by: CarstyYou <xiy@nvidia.com>
* [chore] fix license
Signed-off-by: CarstyYou <xiy@nvidia.com>
---------
Signed-off-by: CarstyYou <xiy@nvidia.com>
Co-authored-by: bhsueh_NV <11360707+byshiue@users.noreply.github.com>
2025-05-23 10:39:10 +08:00
Gabriel Wu
05b50b297f
[feat] open source fp8_blockscale_gemm ( #3071 )
...
Signed-off-by: Zihua Wu <zihuaw@nvidia.com>
2025-04-02 12:12:52 +08:00
Chang Liu
1d3a5d38af
fix: Update FP8 sf layout for Blackwell and relax blockwise GEMM assertions ( #3144 )
...
* Update fp8 sf layout for blackwell and enable fp8 gemm e2e
* Add test case when m needs to be padded
* Better comment
Signed-off-by: Chang Liu <liuc@nvidia.com>
* Add TODO for fp8 quant kernel
Signed-off-by: Chang Liu <liuc@nvidia.com>
* Enable DCO check
Signed-off-by: Chang Liu <liuc@nvidia.com>
* Fix lint
---------
Signed-off-by: Chang Liu <liuc@nvidia.com>
2025-04-01 13:08:29 -07:00
Kaiyu Xie
3aa6b11d13
Update TensorRT-LLM ( #2936 )
...
* Update TensorRT-LLM
---------
Co-authored-by: changcui <cuichang147@gmail.com>
2025-03-18 21:25:19 +08:00
Kaiyu Xie
77d7fe1eb2
Update TensorRT-LLM ( #2849 )
...
* Update TensorRT-LLM
---------
Co-authored-by: aotman <chenhangatm@gmail.com>
2025-03-04 18:44:00 +08:00
Kaiyu Xie
ab5b19e027
Update TensorRT-LLM ( #2820 )
2025-02-25 21:21:49 +08:00
Kaiyu Xie
2ea17cdad2
Update TensorRT-LLM ( #2792 )
...
* Update TensorRT-LLM
---------
Co-authored-by: jlee <jungmoolee@clika.io>
2025-02-18 21:27:39 +08:00
Kaiyu Xie
e88da961c5
Update TensorRT-LLM ( #2783 )
2025-02-13 18:40:22 +08:00