TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-08 04:01:51 +08:00


Chang Liu 47e37755a3 [TRTLLM-6903][feat] Support chunked prefill for multimodal models (#6843) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>		2025-09-14 20:10:10 -07:00
attention	[https://nvbugs/5453806][unwaive] Unwaive fp8 kvcache attention test (#7243)	2025-09-05 12:13:57 -04:00
auto_deploy	[#5861][autodeploy] Refactor: Quantization Transforms with Inheritance (#7227)	2025-09-10 13:00:06 +08:00
compilation
debugger
executor
misc
modeling	[None][ci] remove unnecessary test_modeling_deepseek.py (#7542)	2025-09-04 20:05:27 -07:00
models/checkpoints/hf
modules	[TRTLLM-7408][feat] Wrap MOE with custom op. (#7277)	2025-09-09 12:18:56 -04:00
multi_gpu	[None][ci] add DGX_H100-2_GPUs-PyTorch-Others-1 pipeline (#7629)	2025-09-09 11:06:32 -04:00
multi_gpu_modeling	[None][chore] Mass integration of release/1.0 - 3rd (#7519)	2025-09-08 14:03:04 +08:00
multimodal	[TRTLLM-6903][feat] Support chunked prefill for multimodal models (#6843)	2025-09-14 20:10:10 -07:00
sampler
speculative	[None][feat] Make the should_use_spec_decode logic a bit smarter (#7112)	2025-09-10 12:53:59 +08:00
thop	[OMNIML-2336][feat] Add NVFP4 x FP8 (#6809)	2025-09-04 09:03:38 -07:00
helpers.py	[None][chore] share input_ids buffers among different cuda graphs (#7236)	2025-09-06 17:49:42 -04:00
pattern_watcher.py
test_connector.py
test_torch_sampler.py	[TRTLLM-7153] [feat] Move stop_criteria to sample_async (#7041)	2025-09-07 17:36:49 +03:00