TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-26 05:32:57 +08:00

2ez4bz 7ebb770dce [None][fix] Fix batching bug in Mistral3 model (#6841)	2025-08-14 02:15:44 -04:00

Prior to this commit, if multiple requests with images were in the same batch, the batching logic for the images would fail. This commit fixes it, and adds unit tests for it that were verified to fail prior to the fix.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>

auto_deploy	[AutoDeploy] merge feat/ad-2025-07-22 (#6520)	2025-08-01 08:51:08 -07:00
compilation	[TRTLLM-3105][feat] Add Piecewise CUDA Graph Support (#3804)	2025-05-09 11:04:01 +08:00
debugger	Fix: fix nvbug 5356427 (#5464)	2025-06-25 22:24:26 +08:00
modeling	[None][fix] Fix batching bug in Mistral3 model (#6841)	2025-08-14 02:15:44 -04:00
modules	[None][feat] Add support for fused gate_up_proj scales for FP8 blockwise (#6496)	2025-08-05 11:22:32 -07:00
multi_gpu	[fix][nvbugs/5399355] Fix Lamport buffer clear issue for MNNVL TwoShot Allreduce and add FP16 support. (#6237)	2025-07-25 08:01:40 +08:00
multi_gpu_modeling	[TRTLLM-5530][BREAKING CHANGE] refactor: unify KvCacheConfig in LLM class for pytorch backend (#5752)	2025-07-16 16:42:59 +08:00
multimodal	[TRTLLM-6654][feat] Add support for external multimodal embeddings (#6263)	2025-07-30 10:00:15 -04:00
speculative	[TRTLLM-6392][feat] Support turning on/off spec decoding dynamically (#6363)	2025-07-31 15:31:39 -04:00
thop	[https://nvbugs/5433581][fix] DeepGEMM installation on SBSA (#6588)	2025-08-06 16:44:21 +08:00
helpers.py	Deepseek R1 FP8 Support on Blackwell (#6486)	2025-08-01 10:26:28 +08:00
pattern_watcher.py	[TRTLLM-3105][feat] Add Piecewise CUDA Graph Support (#3804)	2025-05-09 11:04:01 +08:00
test_attention_mla.py	fix mla test (#5240)	2025-06-17 15:26:25 +08:00
test_attention_no_cache.py	refactor(test): remove random context sequence lengths and set seed for reproducibility in attention tests (#3919)	2025-04-29 10:08:04 +08:00
test_attention.py	reduce num layers in attention test (#3509)	2025-04-14 12:43:59 +08:00
test_autotuner.py	feat: Enhance AutoTuner inference path and code readability (#4466)	2025-06-04 10:53:11 +08:00
test_beam_search.py	[TRTLLM-5061] chore: add status tags to LLM API reference (#5707)	2025-07-28 15:57:07 +08:00
test_best_of_n.py	[TRTLLM-5271][feat] best_of/n for pytorch workflow (#5997)	2025-08-04 14:08:06 +02:00
test_custom_ops.py	[TRTLLM-4279] fix: Add a protection test for checking trtllm custom ops (#6515)	2025-08-01 15:59:09 +08:00
test_executor_request_queue.py	[TRTLLM-5271][feat] best_of/n for pytorch workflow (#5997)	2025-08-04 14:08:06 +02:00
test_flashinfer_attention.py	Add thread leak check and fix thread/memory leak issues. (#3270)	2025-04-08 19:03:18 +08:00
test_flashinfer_star_attn.py	Add thread leak check and fix thread/memory leak issues. (#3270)	2025-04-08 19:03:18 +08:00
test_fp8_per_tensor_scale_tllmg_gemm.py	fix: [5328141] increase tolerance for test_fp8_block_scale_gemm (#5849)	2025-07-22 12:48:00 +08:00
test_group_rmn_norm.py	feat: Add heuristic for GroupRMSNorm kernel selection. (#4047)	2025-05-13 08:52:53 +08:00
test_mnnvl_memory.py	feat: Add MNNVL MoE A2A support (#3504)	2025-04-25 17:29:08 +08:00
test_overlap_scheduler_input.json	refactor: Unify request order in TRT and PyTorch workflow (#4096)	2025-05-20 18:49:27 +02:00
test_overlap_scheduler.py	[ci] parallelize torch unittests (#5714)	2025-07-09 11:05:57 +03:00
test_pytorch_model_engine.py	[TRTLLM-5493] Add core infrastructure to enable loading of custom checkpoint formats (#5372)	2025-07-17 00:50:30 +08:00
test_resource_manager.py	[TRTLLM-6683][feat] Support LoRA reload CPU cache evicted adapter (#6786)	2025-08-11 14:31:39 -04:00
test_return_logits.py	[TRTLLM-5208][BREAKING CHANGE] chore: make pytorch LLM the default (#5312)	2025-06-20 03:01:10 +08:00
test_share_tensor.py	[1/N][TRTLLM-5195][feat] Share PyTorch tensor between processes (#5396)	2025-07-10 05:12:53 +09:00
test_trtllm_sampler.py	[fix] Add detokenization-based stop word logic to LLM API (#5948)	2025-07-29 10:16:59 -07:00
test_vanilla_attention.py	Add thread leak check and fix thread/memory leak issues. (#3270)	2025-04-08 19:03:18 +08:00
test_virtual_memory.py	[TRTLLM-4406][feat] LLM sleep & wakeup Part 1: virtual device memory (#5034)	2025-08-04 13:51:01 +08:00