TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

2ez4bz cf0c47ca2d [None][fix] Fix batching bug in Mistral3 model (#6841)	2025-09-01 11:02:31 +08:00

Prior to this commit, if multiple requests with images were in the same batch, the batching logic for the images would fail. This commit fixes it, and adds unit tests for it that were verified to fail prior to the fix.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
test_modeling_bert.py	feat: no-cache attention in PyTorch workflow (#3085)	2025-04-05 01:54:32 +08:00
test_modeling_clip.py	feat: add Pytorch support of Vision Encoder for multimodal models (#3791)	2025-05-03 05:13:47 +08:00
test_modeling_deepseek.py	[TRTLLM-5530][BREAKING CHANGE] refactor: unify KvCacheConfig in LLM class for pytorch backend (#5752)	2025-07-16 16:42:59 +08:00
test_modeling_exaone4.py	[None][refactor] refactor the CUDA graph runner to manage all CUDA graphs (#6846)	2025-08-25 20:52:05 +08:00
test_modeling_gemma3.py	[None][chore] Update Gemma3 closeness check to mitigate flakiness (#6591)	2025-08-04 10:10:58 -04:00
test_modeling_gpt_oss.py	[None] [feat] Add model gpt-oss (#6645)	2025-08-07 03:04:18 -04:00
test_modeling_llama_min_latency.py	[None][refactor] refactor the CUDA graph runner to manage all CUDA graphs (#6846)	2025-08-25 20:52:05 +08:00
test_modeling_llama.py	[None][refactor] refactor the CUDA graph runner to manage all CUDA graphs (#6846)	2025-08-25 20:52:05 +08:00
test_modeling_mistral.py	[None][fix] Fix batching bug in Mistral3 model (#6841)	2025-09-01 11:02:31 +08:00
test_modeling_mixtral.py	[None][refactor] refactor the CUDA graph runner to manage all CUDA graphs (#6846)	2025-08-25 20:52:05 +08:00
test_modeling_mllama.py	[None][refactor] refactor the CUDA graph runner to manage all CUDA graphs (#6846)	2025-08-25 20:52:05 +08:00
test_modeling_nemotron_h.py	[TRTLLM-4921][feat] Enable chunked prefill for Nemotron-H (#6334)	2025-08-22 12:15:20 -04:00
test_modeling_nemotron_nas.py	[None] [feat] Add model gpt-oss (#6645)	2025-08-07 03:04:18 -04:00
test_modeling_nemotron.py	[None][refactor] refactor the CUDA graph runner to manage all CUDA graphs (#6846)	2025-08-25 20:52:05 +08:00
test_modeling_out_of_tree.py	chores: merge examples for v1.0 doc (#5736)	2025-07-08 21:00:42 -07:00
test_modeling_phi3.py	[None][refactor] refactor the CUDA graph runner to manage all CUDA graphs (#6846)	2025-08-25 20:52:05 +08:00
test_modeling_pixtral.py	[TRTLLM-5252][fix] Propagate mapping to intermediate layers (#6611)	2025-08-08 01:50:36 -04:00
test_modeling_qwen_moe.py	[None][refactor] refactor the CUDA graph runner to manage all CUDA graphs (#6846)	2025-08-25 20:52:05 +08:00
test_modeling_qwen.py	[None][refactor] refactor the CUDA graph runner to manage all CUDA graphs (#6846)	2025-08-25 20:52:05 +08:00
test_modeling_siglip.py	feat: Update Gemma3 Vision Encoder (#5973)	2025-07-14 22:38:10 +08:00
test_modeling_vila.py	feat: llama4 input processor (#3383)	2025-04-25 16:47:14 -07:00