| Author | Commit | Message | Date |
|---|---|---|---|
| Jhao-Ting Chen | 220dc01372 | [None][feat] support JIT mha.cu for SPEC_DEC in runtime (#6078); Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com> | 2025-09-23 14:56:17 -07:00 |
| Yao Yao | 942e080415 | [fix] Fix missing fields in xqa kernel cache key (#6282); Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com> | 2025-08-01 10:41:26 +08:00 |
| Jhao-Ting Chen | 54f68287fc | fix precompiled multi_query_token kernel not having is_fp8_out hash key (#6279); Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com> | 2025-07-25 20:45:53 -04:00 |
| Jhao-Ting Chen | e4c777df7d | Add is_fp8_output key to XQA kernel cubin hashing (solves Eagle3-one-engine Hopper fp8 bug) (#5813); Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com> | 2025-07-09 09:26:27 +08:00 |
| Yao Yao | 3545d59635 | Support speculative decoding with Hopper XQA (#3269); Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com> | 2025-04-07 17:14:34 +08:00 |
| 石晓伟 | 2a115dae84 | Update TensorRT-LLM (#1793); Co-authored-by: DreamGenX <x@dreamgen.com>; Co-authored-by: Ace-RR <78812427+Ace-RR@users.noreply.github.com>; Co-authored-by: bprus <39293131+bprus@users.noreply.github.com>; Co-authored-by: janpetrov <janpetrov@icloud.com> | 2024-06-18 18:18:23 +08:00 |