TensorRT-LLMs/tensorrt_llm/_torch

Latest commit: 0306c0f12c by Yi Zhang
[TRTLLM-9766][feat] Integration of the KVCacheManager V2 to TRTLLM Runtime (#10659)
Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com>
2026-02-02 14:29:02 +08:00
attention_backend [TRTLLM-9766][feat] Integration of the KVCacheManager V2 to TRTLLM Runtime (#10659) 2026-02-02 14:29:02 +08:00
auto_deploy [#8242][feat] Add int4 GPTQ support for AutoDeploy (#8248) 2026-01-30 23:07:24 -08:00
compilation [TRTLLM-8821][feat] Apply AutoTuner to AllReduce Op for strategy tuning. (#8531) 2026-01-05 15:44:37 +08:00
configs
custom_ops [TRTLLM-10398][feat] Enable TRTLLM moe backend for Nemotron Super (#10791) 2026-01-31 13:48:25 +08:00
cute_dsl_kernels [TRTLLM-9831][perf] Use TMA.RED to improve effective memory bandwidth (#10987) 2026-01-27 16:15:32 +08:00
debug
disaggregation [TRTLLM-9527][feat] Python transceiver components (step 2) (#10494) 2026-01-22 10:14:50 -08:00
distributed [TRTLLM-10264][feat] Support attention DP + Helix CP (#10477) 2026-01-29 02:57:13 -05:00
models [None][feat] Perfect routing for Deepseek models (#11127) 2026-01-30 23:46:35 -05:00
modules [TRTLLM-10398][feat] Enable TRTLLM moe backend for Nemotron Super (#10791) 2026-01-31 13:48:25 +08:00
peft [https://nvbugs/5322131][feat] Multi-LoRA serving with CUDA Graph (#8279) 2026-01-22 14:01:18 +01:00
pyexecutor [TRTLLM-9766][feat] Integration of the KVCacheManager V2 to TRTLLM Runtime (#10659) 2026-02-02 14:29:02 +08:00
shared_tensor
speculative [TRTLLM-10312][perf] Improve performance of _write_finish_reasons in TorchSampler (#10459) 2026-01-29 11:06:09 -05:00
__init__.py
async_llm.py [TRTLLM-9736][feat] AsyncLLM and verl integ (#9353) 2025-12-11 09:33:25 -08:00
autotuner.py [TRTLLM-10264][feat] Support attention DP + Helix CP (#10477) 2026-01-29 02:57:13 -05:00
cublaslt_utils.py
cute_dsl_utils.py
device_mesh.py [TRTLLM-9465][fix] Swap TP-CP grouping order (#10350) 2026-01-05 20:08:03 +08:00
expert_statistic.py
flashinfer_utils.py [TRTLLM-9578][feat] make PDL enabled by default (#9695) 2025-12-25 07:15:24 -05:00
hostfunc.py
llm.py
memory_buffer_utils.py [https://nvbugs/5811697][fix] Fix buffer reuse. (#10716) 2026-01-25 18:12:21 +08:00
metadata.py
model_config.py [TRTLLM-9771][feat] Allow overriding quantization configs (#11062) 2026-01-31 10:48:51 -05:00
utils.py [TRTLLM-9771][feat] Support partial update weight for fp8 (#10456) 2026-01-22 14:46:05 +08:00
virtual_memory.py [TRTLLM-9736][feat] AsyncLLM and verl integ (#9353) 2025-12-11 09:33:25 -08:00