Xiwen Yu
|
2c3f4cbeee
|
Merge remote-tracking branch 'origin/main' into feat/b300_cu13
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
|
2025-09-05 15:53:43 +08:00 |
|
sychen52
|
98a1bffb7c
|
[OMNIML-2336][feat] Add NVFP4 x FP8 (#6809)
Signed-off-by: Shiyang Chen <shiychen@nvidia.com>
|
2025-09-04 09:03:38 -07:00 |
|
Xiwen Yu
|
345c2bceaa
|
update trtllm-gen sm100f cubins of gemm kernels
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
|
2025-08-06 14:25:12 +08:00 |
|
Nikita Korobov
|
8043d7a03c
|
feat: update DeepSeek FP8 TRT-LLM Gen cubins (#4643)
Signed-off-by: Nikita Korobov <nkorobov@nvidia.com>
|
2025-06-03 14:07:54 -07:00 |
|
chenfeiz0326
|
7f5716ef83
|
Cherry-pick trtllm-gen from feat/llama4 to main (#4086)
* feat: TRT-LLM Gen FP8 MoE Llama4
Signed-off-by: Nikita Korobov <nkorobov@nvidia.com>
* feat: TRT-LLM Gen llama4 MoE Top1 routing
Signed-off-by: Jiqun Tu <jtu@nvidia.com>
* feat: add per tensor FP8 TRT-LLM Gen GEMMs
Signed-off-by: Nikita Korobov <nkorobov@nvidia.com>
* Update
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
* Update
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
* Add license for cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/gemmCubins
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
* Add guard for routingIndicesClusterKernel
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
* Guard sm90+ for routingkernels
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
* Guard sm90+ for routingkernels
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
---------
Signed-off-by: Nikita Korobov <nkorobov@nvidia.com>
Signed-off-by: Jiqun Tu <jtu@nvidia.com>
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
Co-authored-by: Nikita Korobov <nkorobov@nvidia.com>
Co-authored-by: Jiqun Tu <jtu@nvidia.com>
|
2025-05-08 14:13:01 -07:00 |
|