TensorRT-LLM/tensorrt_llm/_torch/custom_ops
Mike Iovine · 9c0de251db · 2025-05-22 17:10:57 -04:00
[feat] Integrate Hopper chunked attention kernels (#4330)

* Integrate chunked attention kernels
* Fix cache key
* Fix lint

Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
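For context on the commit above, below is a minimal pure-PyTorch reference for one common "chunked attention" pattern, in which each query attends causally only to keys inside its own fixed-size chunk. This is an illustrative sketch: the function name, the chunk semantics, and any correspondence to the Hopper kernels integrated in #4330 are assumptions, not taken from this repository.

```python
import torch


def chunked_causal_attention(q: torch.Tensor, k: torch.Tensor,
                             v: torch.Tensor, chunk_size: int) -> torch.Tensor:
    """Unfused reference: query i attends only to keys j with j <= i that
    fall in the same chunk_size-wide chunk. q, k, v: (seq_len, head_dim).
    Hypothetical illustration, not the kernel's actual algorithm."""
    seq_len, head_dim = q.shape
    scores = (q @ k.T) / head_dim ** 0.5
    idx = torch.arange(seq_len)
    # Mask out keys in a different chunk or in the causal future.
    same_chunk = (idx[:, None] // chunk_size) == (idx[None, :] // chunk_size)
    causal = idx[:, None] >= idx[None, :]
    scores = scores.masked_fill(~(same_chunk & causal), float("-inf"))
    return torch.softmax(scores, dim=-1) @ v
```

Since every query always attends at least to itself, the softmax denominator is never empty; a fused kernel would apply the same masking without materializing the full seq_len x seq_len score matrix.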
Name                       Last commit message                                                                           Last commit date
__init__.py                feat: Add group_rms_norm kernel to normalize multiple inputs in a single operator. (#3438)   2025-05-02 13:25:30 +08:00
cpp_custom_ops.py          feat: large-scale EP(part 2: MoE Load Balancer - core utilities) (#4384)                     2025-05-20 17:53:48 +08:00
flashinfer_custom_ops.py   feat: Add group_rms_norm kernel to normalize multiple inputs in a single operator. (#3438)   2025-05-02 13:25:30 +08:00
torch_custom_ops.py        [feat] Integrate Hopper chunked attention kernels (#4330)                                    2025-05-22 17:10:57 -04:00
userbuffers_custom_ops.py  feat: Introduce UB allocator for pytorch flow (#3257)                                        2025-04-08 18:39:49 +08:00
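The files listed above register backend kernels as PyTorch custom ops. As a minimal sketch of the general registration pattern, the example below uses torch.library.custom_op (available in PyTorch >= 2.4); the op name example::scale is hypothetical and does not correspond to an actual op defined in these files.

```python
import torch


# Hypothetical op; real ops in this directory live under their own namespace
# and wrap fused CUDA/FlashInfer kernels rather than plain PyTorch math.
@torch.library.custom_op("example::scale", mutates_args=())
def scale(x: torch.Tensor, factor: float) -> torch.Tensor:
    # Eager implementation (a stand-in for a fused kernel).
    return x * factor


@scale.register_fake
def _(x: torch.Tensor, factor: float) -> torch.Tensor:
    # Fake/meta implementation: lets torch.compile trace output shapes and
    # dtypes without executing the kernel.
    return torch.empty_like(x)


out = torch.ops.example.scale(torch.ones(4), 2.0)  # tensor([2., 2., 2., 2.])
```

Registering a fake implementation alongside the real one is what allows compiled graphs to reason about the op's outputs while the heavy lifting stays in the backend kernel.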