Mirror of https://github.com/NVIDIA/TensorRT-LLM.git, synced 2026-02-01 08:41:13 +08:00.
* add docstring summarizing current RoPE support
* minor: replace call_method; adjust the insertion order of the cos_sin_cache calculation node
* add unit tests for Triton RoPE and DeepSeek RoPE
* update the RoPE matcher to match DeepSeek RoPE, add a custom op for reference, add a unit test case
* cache cos[pos_idx].unsqueeze and sin[pos_idx].unsqueeze
* minor doc update
* separate pattern matching and optimization for explicit and complex RoPE, plus minor updates
* clean up the RoPE implementations in the repo
* replace fused_flattened_mla_with_cache's RoPE impl with torch_apply_rope_with_qk_interleaving (see the sketch after this list); update the unit test
* minor
* separate layout inference and transpose into a new transformation
* update rope_with_explicit_freqs and rope_with_input_interleaved to expose unsqueeze_dim and support match_rope_layout; add unit tests
* resolve the merge conflict in transform.py; optimize_rope still needs fixing under CUDA graph capture
* minor cleanup after rebase
* fix pre-commit
* support mapping to the bnsd layout and inferring unsqueeze_dim from the op
* fix the issue of cos/sin not being the same across prompts in the same batch when mapping to the flashinfer op
* fix a unit test
* fix the custom op input/output node ordering issue for the DeepSeek V3 RoPE
* clean up code
* minor
* move the flattening of cos_sin_cache into the graph; update the flashinfer op docstring and test
* debug a transform unit test failure

---------

Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Ubuntu <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
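Several of these commits concern how RoPE is applied to interleaved Q/K channels and how the cos/sin tables broadcast across the bnsd vs. bsnd layouts. As background only, here is a minimal sketch of that technique; the helper names are hypothetical and this is not the repo's torch_apply_rope_with_qk_interleaving custom op. It assumes cos/sin are precomputed per position (e.g., gathered from a cos_sin_cache by position ids) with each frequency already repeated twice so entries line up with the interleaved channel pairs.

```python
import torch


def rotate_interleaved(x: torch.Tensor) -> torch.Tensor:
    # Interleaved layout pairs adjacent channels: [x0, x1, x2, x3, ...].
    # Rotating each pair sends (x0, x1) -> (-x1, x0) for the sin term.
    x_even = x[..., 0::2]
    x_odd = x[..., 1::2]
    return torch.stack((-x_odd, x_even), dim=-1).flatten(-2)


def apply_rope_interleaved(
    q: torch.Tensor,    # bnsd: [batch, num_heads, seq_len, head_dim]
    k: torch.Tensor,
    cos: torch.Tensor,  # [batch, seq_len, head_dim]
    sin: torch.Tensor,
    unsqueeze_dim: int = 1,  # 1 for bnsd, 2 for bsnd
):
    # unsqueeze inserts a singleton head axis so the same cos/sin
    # broadcast over all attention heads in either layout.
    cos = cos.unsqueeze(unsqueeze_dim)
    sin = sin.unsqueeze(unsqueeze_dim)
    q_rot = q * cos + rotate_interleaved(q) * sin
    k_rot = k * cos + rotate_interleaved(k) * sin
    return q_rot, k_rot
```

Here bnsd means [batch, num_heads, seq_len, head_dim] and bsnd means [batch, seq_len, num_heads, head_dim]; choosing unsqueeze_dim = 1 or 2 is what lets identical cos/sin tensors broadcast in either layout, which is presumably what "inferring unsqueeze_dim from the op" refers to above.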
| Name |
|---|
| compile |
| custom_ops |
| distributed |
| models |
| shim |
| transformations |
| utils |
| __init__.py |