Matthias Jouanneaux
|
1be7faef37
|
[TRTLLM-5966][feat] Helix: add custom position ids to MLA kernels (#6904)
Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
|
2025-09-19 20:55:32 +08:00 |
|
jmydurant
|
7deefb3d2b
|
[TRTLLM-7192][feat] optimize MLA chunked prefill && support fp8 mla chunked prefill (#7477)
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
|
2025-09-15 21:43:49 +08:00 |
|
Enwei Zhu
|
5ff3a65b23
|
[TRTLLM-7028][feat] Enable guided decoding with speculative decoding (part 2: one-model engine) (#6948)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-09-03 15:16:11 -07:00 |
|
zhhuang-nv
|
7e135d2ea7
|
[None][feat] Use Separate QKV Input Layout for Context MLA (#6538)
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
|
2025-08-19 22:04:48 +08:00 |
|
Liao Lanyu
|
f7c13a4aa7
|
[TRTLLM-6906][chore] Using pybind to bind functions in thop/attentionOp (#6745)
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
|
2025-08-12 16:45:16 +08:00 |
|