TensorRT-LLM/tensorrt_llm/_torch/attention_backend
Zeyu WANG 2681b26e48
[TRTLLM-2795] feat: Add yarn support for other models in trt-flow (#3840)
Add YaRN support for general models (e.g. Llama, Qwen) other than DeepSeek in trt-flow.

Signed-off-by: Zeyu Wang <zeyuw@nvidia.com>
2025-05-15 11:03:57 +08:00
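
The commit above adds YaRN rotary-embedding scaling for Llama/Qwen-style models. For orientation only, below is a minimal sketch of the YaRN inverse-frequency blending and attention scaling described in the YaRN paper; the parameter names (`beta_fast`, `beta_slow`) and defaults are assumptions following common conventions, and the actual trt-flow implementation in interface.py may differ.

```python
# Minimal sketch of YaRN-style RoPE frequency scaling (Peng et al., "YaRN").
# Illustration only -- not the trt-flow implementation; names/defaults assumed.
import math
import torch


def yarn_inv_freq(dim: int,
                  base: float = 10000.0,
                  factor: float = 4.0,            # new_max_len / original_max_len
                  original_max_pos: int = 4096,
                  beta_fast: float = 32.0,        # common defaults, assumed
                  beta_slow: float = 1.0):
    # Standard RoPE inverse frequencies and their position-interpolated version.
    pos_freqs = base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim)
    extrapolation = 1.0 / pos_freqs
    interpolation = 1.0 / (factor * pos_freqs)

    # Dimension index whose wavelength completes `num_rot` rotations
    # within the original context window.
    def correction_dim(num_rot):
        return (dim * math.log(original_max_pos / (num_rot * 2 * math.pi))) / (2 * math.log(base))

    low = max(math.floor(correction_dim(beta_fast)), 0)
    high = min(math.ceil(correction_dim(beta_slow)), dim - 1)

    # Linear ramp in [0, 1] blending extrapolation (high-frequency dims)
    # into interpolation (low-frequency dims).
    ramp = torch.clamp(
        (torch.arange(dim // 2, dtype=torch.float32) - low) / max(high - low, 1e-3),
        0.0, 1.0)
    extrapolation_factor = 1.0 - ramp

    inv_freq = interpolation * (1.0 - extrapolation_factor) + extrapolation * extrapolation_factor
    # Attention temperature ("mscale") applied to q/k, per the paper.
    attn_scale = 0.1 * math.log(factor) + 1.0
    return inv_freq, attn_scale


inv_freq, attn_scale = yarn_inv_freq(dim=128, factor=4.0, original_max_pos=4096)
print(inv_freq.shape, attn_scale)  # torch.Size([64]) ~1.139
```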
File | Last commit | Last update
__init__.py | feat: Add group_rms_norm kernel to normalize multiple inputs in a single operator. (#3438) | 2025-05-02 13:25:30 +08:00
flashinfer.py | [feat] Enable chunked context for flashinfer (#4132) | 2025-05-15 10:59:38 +08:00
interface.py | [TRTLLM-2795] feat: Add yarn support for other models in trt-flow (#3840) | 2025-05-15 11:03:57 +08:00
star_flashinfer.py | Remove dummy forward path (#3669) | 2025-04-18 16:17:50 +08:00
trtllm.py | feat: Support Gemma3-1b-it in Pytorch workflow (#3999) | 2025-05-14 14:02:44 +08:00
utils.py | feat: Add group_rms_norm kernel to normalize multiple inputs in a single operator. (#3438) | 2025-05-02 13:25:30 +08:00
vanilla.py | Remove dummy forward path (#3669) | 2025-04-18 16:17:50 +08:00