TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

2ez4bz cf0c47ca2d [None][fix] Fix batching bug in Mistral3 model (#6841)	2025-09-01 11:02:31 +08:00

Prior to this commit, if multiple requests with images were in the same batch, the batching logic for the images would fail. This commit fixes it, and adds unit tests for it that were verified to fail prior to the fix.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>

checkpoints	[None][feat] Skip prefetching consolidated safetensors when appropriate (#7013)	2025-08-25 23:56:21 -04:00
__init__.py	[None] [feat] Add Tencent HunYuanMoEV1 model support (#5521)	2025-08-15 06:56:44 +08:00
modeling_auto.py	[TRTLLM-7326][feat] Add standalone multimodal encoder (#6743)	2025-08-19 21:42:50 -07:00
modeling_bert.py	feat: Remove not used padding_idx in models (#5385)	2025-06-25 17:19:59 +08:00
modeling_clip.py	[feat] Enable TP and batching for PixtralVisionModel / Mistral3VLM (#6152)	2025-07-22 11:06:41 -07:00
modeling_deepseekv3.py	[TRTLLM-6747][feat] Merge add sparse exp and shared exp into local reduction (#7369)	2025-08-31 21:20:00 -04:00
modeling_exaone4.py	chore: add EXAONE4 accuracy test (#6397)	2025-08-04 10:14:16 +08:00
modeling_gemma3.py	[TRTLLM-6674][feat] (Breaking Change) Hopper SWA non-cyclic kernels + KV reuse + Spec Dec (#6379)	2025-08-05 07:47:41 +00:00
modeling_gemma3vl.py	[None][chore] Mass integration of release/1.0 (#6864)	2025-08-22 09:25:15 +08:00
modeling_gpt_oss.py	[None][fix] Fix perfect router. (#6797)	2025-08-14 20:09:08 -07:00
modeling_hunyuan_moe.py	[None] [feat] Add Tencent HunYuanMoEV1 model support (#5521)	2025-08-15 06:56:44 +08:00
modeling_hyperclovax.py	[None][fix] Refactoring input prep to allow out-of-tree models (#6497)	2025-08-12 20:29:10 -04:00
modeling_llama_min_latency.py	[Model load] Fix llama min-latency model load (#5883)	2025-07-15 09:29:19 +08:00
modeling_llama.py	[None][feat] Refactor llama4 for multimodal encoder IFB (#6844)	2025-08-28 13:22:19 -07:00
modeling_llava_next.py	[TRTLLM-7326][feat] Add standalone multimodal encoder (#6743)	2025-08-19 21:42:50 -07:00
modeling_mistral.py	[None][fix] Fix batching bug in Mistral3 model (#6841)	2025-09-01 11:02:31 +08:00
modeling_mixtral.py	[None][perf] Improve the performance of online EPLB on Hopper by better overlapping (#6624)	2025-08-12 09:25:13 +08:00
modeling_mllama.py	feat : support duplicate_kv_weight for qwen3 blockwise scale (#5459)	2025-06-30 11:49:22 +08:00
modeling_multimodal_encoder.py	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
modeling_multimodal_utils.py	[TRTLLM-5059][feat] Add KV cache reuse support for multimodal models (#5444)	2025-07-21 16:11:58 -07:00
modeling_nemotron_h.py	[TRTLLM-4921][feat] Enable chunked prefill for Nemotron-H (#6334)	2025-08-22 12:15:20 -04:00
modeling_nemotron_nas.py	[Perf]: Add residual, norm for nemotron_nas models (#6455)	2025-07-30 09:10:38 -07:00
modeling_nemotron.py	feat: Remove not used padding_idx in models (#5385)	2025-06-25 17:19:59 +08:00
modeling_phi3.py	[None][fix] Accommodate Phi3/4 to work with ModelOpt's FP8 ckpts in Torch (#6761)	2025-08-19 09:22:46 -07:00
modeling_phi4mm.py	[TRTLLM-6825][fix] Update lora for phi4-mm (#6817)	2025-08-21 22:00:04 -04:00
modeling_pixtral.py	[feat] Enable TP and batching for PixtralVisionModel / Mistral3VLM (#6152)	2025-07-22 11:06:41 -07:00
modeling_qwen2vl.py	[TRTLLM-7326][feat] Add standalone multimodal encoder (#6743)	2025-08-19 21:42:50 -07:00
modeling_qwen3_moe.py	[None][perf] Improve the performance of online EPLB on Hopper by better overlapping (#6624)	2025-08-12 09:25:13 +08:00
modeling_qwen3.py	[None][feat] Support Yarn on Qwen3 (#6785)	2025-08-17 07:21:29 +08:00
modeling_qwen_moe.py	[None][perf] Improve the performance of online EPLB on Hopper by better overlapping (#6624)	2025-08-12 09:25:13 +08:00
modeling_qwen.py	feat: Remove not used padding_idx in models (#5385)	2025-06-25 17:19:59 +08:00
modeling_siglip.py	feat: Update Gemma3 Vision Encoder (#5973)	2025-07-14 22:38:10 +08:00
modeling_speculative.py	[TRTLLM-6633][feat] Padding for piecewise cudagraph (#6750)	2025-08-26 18:31:33 -04:00
modeling_utils.py	[TRTLLM-7326][feat] Add standalone multimodal encoder (#6743)	2025-08-19 21:42:50 -07:00
modeling_vila.py	[None][fix] Refactoring input prep to allow out-of-tree models (#6497)	2025-08-12 20:29:10 -04:00