TensorRT-LLM/tensorrt_llm/_torch/modules
Latest commit: 97674c3114 by HuiGao-NV (2025-11-03 21:08:01 -08:00)
[TRTLLM-8690][feat] add more tensors to share buffers (#8691)
Signed-off-by: Hui Gao <huig@nvidia.com>
fla/
fused_moe/ [TRTLLM-8690][feat] add more tensors to share buffers (#8691) 2025-11-03 21:08:01 -08:00
mamba/
__init__.py
attention.py [TRTLLM-5966][feat] Helix: add full MLA support for Helix (#8104) 2025-11-04 09:06:58 +08:00
decoder_layer.py
embedding.py
gated_mlp.py
layer_norm.py [TRTLLM-8535][feat] Support DeepSeek V3.2 with FP8 + BF16 KV cache/NVFP4 + BF16 KV cache (#8405) 2025-10-24 13:40:41 -04:00
linear.py [https://nvbugs/5599086][fix] Fix FP8 Linear module for spark (#8707) 2025-10-29 13:58:19 -07:00
logits_processor.py
mlp.py
multi_stream_utils.py
qk_norm_attention.py [None][fix] Avoid unnecessary concat in attn_output_gate case. (#8094) 2025-10-13 12:59:40 -07:00
rms_norm.py
rotary_embedding.py
swiglu.py
triton_linear.py
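
For context, a minimal sketch of the standard RMSNorm computation that a module such as rms_norm.py typically implements. The class name, constructor signature, and use of plain PyTorch here are illustrative assumptions, not this repository's actual API:

    import torch
    from torch import nn

    class RMSNorm(nn.Module):
        # Hypothetical sketch: RMS normalization scales by the root mean
        # square of the last dimension, with no mean subtraction.
        def __init__(self, hidden_size: int, eps: float = 1e-6):
            super().__init__()
            self.weight = nn.Parameter(torch.ones(hidden_size))  # learned scale
            self.eps = eps

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Accumulate in fp32 for numerical stability, then cast back.
            dtype = x.dtype
            x = x.float()
            variance = x.pow(2).mean(dim=-1, keepdim=True)
            x = x * torch.rsqrt(variance + self.eps)
            return self.weight * x.to(dtype)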