TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-01 08:41:13 +08:00

Fridah-nv bce281d592 feat: [AutoDeploy] update rope matcher with minor variants (Deepseek) (#3638)	2025-05-16 09:55:32 -04:00

* add docstring to summarize current rope support
* minor: replace call_method, adjust inserting order of cos_sin_cache calculation node
* add unit test for triton rope and ds rope
* update rope matcher to match DS RoPE, add custom op for reference, add unit test case
* cache cos[pos_idx].unsqueeze and sin[pos_idxs].unsqueeze
* minor doc update
* separate pattern matching and optimization for explicit and complex rope + minor updates
* clean rope impl in repo
* replace fused_flattened_mla_with_cache's rope impl with torch_apply_rope_with_qk_interleaving, update unit test
* minor
* separate layout infer and transpose to a new transformation
* update rope_with_explicit_freqs and rope_with_input_interleaved to expose unsqueeze_dim and support match_rope_layout, add unit tests
* solve merge conflict in transform.py, need to fix optimize_rope with cuda graph capture
* minor clean up after rebase
* fix pre-commit
* support map to bnsd layout and infer unsqueeze_dim from op
* fix cos/sin not the same across prompts in the same batch issue when mapping to flashinfer op
* fix for unit test
* fix custom op input/output node ordering issue for DeepSeek V3 rope
* clean code
* minor
* move flattening of cos_sin_cache to the graph, update flashinfer op docstring and test
* debug transform unit test failure

---------

Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Ubuntu <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>

compile	[AutoDeploy]feat: Add an AutoDeploy compile backend that only calls torch.compile (#4240)	2025-05-16 08:38:15 +08:00
custom_ops	feat: [AutoDeploy] update rope matcher with minor variants (Deepseek) (#3638)	2025-05-16 09:55:32 -04:00
distributed	[AutoDeploy] Make all ranks agree on kv-cache size (#4007)	2025-05-02 04:07:28 +08:00
models	feat:[AutoDeploy] Update MoE pattern matcher to drop expert selection logic (#3283)	2025-05-15 13:53:09 +08:00
shim	[AutoDeploy] fix: disable overlap scheduler until supported (#4365)	2025-05-15 16:19:30 -07:00
transformations	feat: [AutoDeploy] update rope matcher with minor variants (Deepseek) (#3638)	2025-05-16 09:55:32 -04:00
utils	feat: [AutoDeploy] update rope matcher with minor variants (Deepseek) (#3638)	2025-05-16 09:55:32 -04:00
__init__.py	Update TensorRT-LLM (#2820)	2025-02-25 21:21:49 +08:00