TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-06 03:01:50 +08:00

Jin Li ef268e2062 [TRTLLM-9904][feat] Changes for future KVCacheV2 MTP support (#11029) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>		2026-01-30 01:49:17 -05:00
__init__.py
_util.py
config_utils.py
cuda_graph_runner.py
executor_request_queue.py	[TRTLLM-10264][feat] Support attention DP + Helix CP (#10477)	2026-01-29 02:57:13 -05:00
finish_reason.py
grammar_matcher.py
guided_decoder.py
handle_additional_outputs.py
handle_logits.py
hang_detector.py
kv_cache_connector.py
kv_cache_transceiver.py
layerwise_nvtx_marker.py
llm_request.py
make_decoding_batch_input_output.py
mamba_cache_manager.py
model_engine.py	[TRTLLM-9904][feat] Changes for future KVCacheV2 MTP support (#11029)	2026-01-30 01:49:17 -05:00
model_loader.py
py_executor_creator.py	[None][feat] Add performance alignment to layer-wise benchmarks (#11018)	2026-01-29 14:01:51 +08:00
py_executor.py	[None][chore] Consolidate duplicate kv cache reuse variables. (#10935)	2026-01-29 11:03:27 -08:00
resource_manager.py
sampler.py	[TRTLLM-10312][perf] Improve performance of _write_finish_reasons in TorchSampler (#10459)	2026-01-29 11:06:09 -05:00
sampling_utils_flashinfer.py
sampling_utils.py
scheduler.py
seq_slot_manager.py