TensorRT-LLM/tensorrt_llm/_torch/models
Latest commit 7ebb770dce by 2ez4bz
[None][fix] Fix batching bug in Mistral3 model (#6841)
Prior to this commit, if multiple requests with images were in the same
batch, the batching logic for the images would fail.

This commit fixes the bug and adds unit tests that were verified to
fail prior to the fix.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-14 02:15:44 -04:00
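This class of bug typically arises when per-request image lists are stacked naively, which breaks as soon as requests in a batch carry different numbers of images. As a rough, hypothetical sketch (not TensorRT-LLM's actual implementation), the usual fix is to flatten all images across requests before the vision encoder and record per-request counts so the resulting embeddings can be routed back to the right request:

```python
def batch_images(per_request_images):
    """Flatten variable-length per-request image lists into one batch.

    per_request_images: one list of images per request. Requests may
    carry different numbers of images, which is exactly the case the
    naive stacking approach mishandles.
    Returns the flat batch plus per-request counts for un-batching.
    """
    counts = [len(imgs) for imgs in per_request_images]
    flat = [img for imgs in per_request_images for img in imgs]
    return flat, counts


def split_embeddings(flat_embeddings, counts):
    """Route encoder outputs back to their originating requests."""
    out, start = [], 0
    for n in counts:
        out.append(flat_embeddings[start:start + n])
        start += n
    return out
```

With real tensors the same bookkeeping is done with `torch.stack` on the flat list and `torch.split(embeddings, counts)` on the encoder output; the list version above just makes the flatten/split invariant explicit.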
checkpoints [None][fix] Remove expand configuration from mamba2 mixer (#6521) 2025-08-05 04:18:25 -04:00
__init__.py [fix] Fix Mistral3VLM weight-loading & enable in pre-merge (#6105) 2025-07-17 11:04:17 -07:00
modeling_auto.py [feat] Implement model-agnostic one-engine eagle3 (#4778) 2025-06-13 08:11:41 -07:00
modeling_bert.py feat: Remove not used padding_idx in models (#5385) 2025-06-25 17:19:59 +08:00
modeling_clip.py [feat] Enable TP and batching for PixtralVisionModel / Mistral3VLM (#6152) 2025-07-22 11:06:41 -07:00
modeling_deepseekv3.py [fix] Fix DeepSeek w4a8 weight loading (#6498) 2025-08-04 10:12:06 +08:00
modeling_exaone4.py chore: add EXAONE4 accuracy test (#6397) 2025-08-04 10:14:16 +08:00
modeling_gemma3.py [TRTLLM-6674][feat] (Breaking Change) Hopper SWA non-cyclic kernels + KV reuse + Spec Dec (#6379) 2025-08-05 07:47:41 +00:00
modeling_gemma3vl.py [TRTLLM-6656][chore] Validate FP8 support for Gemma3 (#6678) 2025-08-07 13:14:04 -04:00
modeling_hyperclovax.py fix: support mixture of text & multimodal prompts (#6345) 2025-07-30 08:52:31 +08:00
modeling_llama_min_latency.py [Model load] Fix llama min-latency model load (#5883) 2025-07-15 09:29:19 +08:00
modeling_llama.py [TRTLLM-6674][feat] (Breaking Change) Hopper SWA non-cyclic kernels + KV reuse + Spec Dec (#6379) 2025-08-05 07:47:41 +00:00
modeling_llava_next.py [None][feat] Refactor Llava-Next (#6478) 2025-08-05 17:53:53 -07:00
modeling_mistral.py [None][fix] Fix batching bug in Mistral3 model (#6841) 2025-08-14 02:15:44 -04:00
modeling_mixtral.py feat: Remove padding in attention DP. (#6064) 2025-07-18 23:30:34 +08:00
modeling_mllama.py feat : support duplicate_kv_weight for qwen3 blockwise scale (#5459) 2025-06-30 11:49:22 +08:00
modeling_multimodal_encoder.py Update TensorRT-LLM (#2873) 2025-03-11 21:13:42 +08:00
modeling_multimodal_utils.py [TRTLLM-5059][feat] Add KV cache reuse support for multimodal models (#5444) 2025-07-21 16:11:58 -07:00
modeling_nemotron_h.py [None][fix] Remove expand configuration from mamba2 mixer (#6521) 2025-08-05 04:18:25 -04:00
modeling_nemotron_nas.py [Perf]: Add residual, norm for nemotron_nas models (#6455) 2025-07-30 09:10:38 -07:00
modeling_nemotron.py feat: Remove not used padding_idx in models (#5385) 2025-06-25 17:19:59 +08:00
modeling_phi3.py feat: TRTLLM-6450 update long rope for phi3.5/phi4-mini/phi4-mm (#6353) 2025-07-30 09:20:16 -07:00
modeling_phi4mm.py [TRTLLM-6308][feat] Support Aggregate mode for phi4-mm (#6820) 2025-08-13 21:45:22 -07:00
modeling_pixtral.py [feat] Enable TP and batching for PixtralVisionModel / Mistral3VLM (#6152) 2025-07-22 11:06:41 -07:00
modeling_qwen2vl.py [PERF] Move calculation Qwen2-VL's rotary_cos_sin to LLM worker process (#6004) 2025-07-31 09:35:24 +09:00
modeling_qwen3_moe.py Qwen3: Fix eagle hidden states (#6199) 2025-08-06 17:05:18 -04:00
modeling_qwen3.py feat(eagle3):support qwen3 dense model (#5879) 2025-07-19 01:24:32 +08:00
modeling_qwen_moe.py [TRTLLM-5493] Add core infrastructure to enable loading of custom checkpoint formats (#5372) 2025-07-17 00:50:30 +08:00
modeling_qwen.py feat: Remove not used padding_idx in models (#5385) 2025-06-25 17:19:59 +08:00
modeling_siglip.py feat: Update Gemma3 Vision Encoder (#5973) 2025-07-14 22:38:10 +08:00
modeling_speculative.py Mtp optimizations round1 (#5689) 2025-07-25 13:48:27 -04:00
modeling_utils.py [fix] Fix DeepSeek w4a8 weight loading (#6498) 2025-08-04 10:12:06 +08:00
modeling_vila.py fix: support mixture of text & multimodal prompts (#6345) 2025-07-30 08:52:31 +08:00