Mirror of https://github.com/NVIDIA/TensorRT-LLM.git
* add test to map flashinfer rope op with triton custom rope ops and pytorch rope in fused_mha
* add rope matcher and unit tests
* capture cos and sin from graph
* revert fuse_mha op change
* minor update to address comment and remove redundant unit test
* move view and transpose into graph nodes and update unit test to test custom op directly
* move view into custom op, update bfs with bound, update custom op return type to be half precision
* custom op update to support 3D input
* handle bnsd and bsnd format, update tests, handle 3D cos/sin input to the custom op
* add llama4 rope test, update custom op with is_neox flag
* add llama4 style rope to matcher and update unit test
* separate into two transformations
* fix when num_head != num_kv_head; add support for cached position_ids and cos_sin_cache in graph; update unit tests
* minor update, cache locally and propagate meta info of qk nodes
* minor: fix cos_sin_cache not float
* minor: move cache into matcher

---------

Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
| Name |
|---|
| attention_backend/ |
| auto_deploy/ |
| compilation/ |
| custom_ops/ |
| models/ |
| modules/ |
| peft/ |
| pyexecutor/ |
| speculative/ |
| __init__.py |
| autotuner.py |
| distributed.py |
| llm.py |
| metadata.py |
| model_config.py |
| pipeline_interface.py |
| utils.py |