TensorRT-LLM/tensorrt_llm
Faraz 49c45ebef1
[None][fix] change logging for weight loading on unified memory (#9177)
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
Signed-off-by: Simeng Liu <109828133+SimengLiu-nv@users.noreply.github.com>
Co-authored-by: Simeng Liu <109828133+SimengLiu-nv@users.noreply.github.com>
2025-11-19 14:31:19 -05:00
_tensorrt_engine
_torch [None][fix] change logging for weight loading on unified memory (#9177) 2025-11-19 14:31:19 -05:00
bench [#9237][feat] enable iter stats in autodeploy (#9278) 2025-11-19 19:29:29 +01:00
commands [None][chore] local imports for AutoDeploy in serve and bench (#9199) 2025-11-18 08:14:32 +08:00
evaluate [TRTLLM-8119][feat] Update doc/tests/chat_template for nano-v2-vlm (#8840) 2025-11-11 07:48:23 -08:00
executor [TRTLLM-8988][feat] Unify MPI & Ray's req/response handling with RPC Client/Server (#8765) 2025-11-13 17:21:24 -08:00
inputs [TRTLLM-8119][feat] Update doc/tests/chat_template for nano-v2-vlm (#8840) 2025-11-11 07:48:23 -08:00
layers
llmapi [None][feat] Have ability to cancel disagg request if KV cache resources are exhausted (#9155) 2025-11-18 20:59:17 -05:00
metrics [None][feat] Add trtllm_ prefix for exposed metrics (#8845) 2025-11-06 15:27:18 +08:00
models [TRTLLM-8684][chore] Migrate BuildConfig to Pydantic, add a Python wrapper for KVCacheType enum (#8330) 2025-10-28 09:17:26 -07:00
plugin [TRTLLM-8683][chore] Migrate PluginConfig to Pydantic (#8277) 2025-10-17 16:13:22 -04:00
quantization [None][perf] Use fp8 quant kernel in DS3.2 indexer module (#8701) 2025-10-29 12:45:09 +08:00
runtime [TRTLLM-8684][chore] Migrate BuildConfig to Pydantic, add a Python wrapper for KVCacheType enum (#8330) 2025-10-28 09:17:26 -07:00
scaffolding [None][feat] Deep Research Implemented with Scaffolding (#8452) 2025-11-06 10:33:28 +08:00
serve [None][chore] Support json_schema in response_format (#8934) 2025-11-14 09:43:13 +08:00
tools [None][feat] Add Qwen3-Next to layer-wise benchmarks (#9065) 2025-11-14 10:03:00 +08:00
__init__.py [None] [fix] Disable UCC as WAR to MPI allgather issue before NGC PyTorch 25.12 upgrade (#9126) 2025-11-13 02:25:30 -08:00
_common.py [None][chore] Rename TensorRT-LLM to TensorRT LLM for source code. (#7851) 2025-09-25 21:02:35 +08:00
_dlpack_utils.py
_ipc_utils.py [TRTLLM-7349][feat] Adding new orchestrator type -- ray (#7520) 2025-10-04 08:12:24 +08:00
_mnnvl_utils.py
_ray_utils.py [TRTLLM-8511][feat] Add update_weights and sleep_wakeup support for rl integration (#8302) 2025-11-04 10:19:24 -08:00
_utils.py [TRTLLM-8988][feat] Unify MPI & Ray's req/response handling with RPC Client/Server (#8765) 2025-11-13 17:21:24 -08:00
builder.py [TRTLLM-8684][chore] Migrate BuildConfig to Pydantic, add a Python wrapper for KVCacheType enum (#8330) 2025-10-28 09:17:26 -07:00
disaggregated_params.py [TRTLLM-7328][feat] E-PD Disagg Support via llmapi (3/N) (#7577) 2025-09-22 19:07:18 -07:00
functional.py [None][chore] Rename TensorRT-LLM to TensorRT LLM for source code. (#7851) 2025-09-25 21:02:35 +08:00
graph_rewriting.py
logger.py [None][chore] Mass integration of release/1.0 - 3rd (#7519) 2025-09-08 14:03:04 +08:00
lora_helper.py [TRTLLM-8682][chore] Remove auto_parallel module (#8329) 2025-10-22 20:53:08 -04:00
lora_manager.py [https://nvbugs/5510879][fix] Fix pytorch & TRT-python flows fused LoRA adapter modules weight split with TP>1 (#8063) 2025-10-12 12:29:52 -07:00
mapping.py [TRTLLM-9179][feat] add pp_partition to customize each rank's layer number (#9003) 2025-11-13 10:34:17 +08:00
math_utils.py
module.py [None][chore] Rename TensorRT-LLM to TensorRT LLM for source code. (#7851) 2025-09-25 21:02:35 +08:00
network.py [TRTLLM-8682][chore] Remove auto_parallel module (#8329) 2025-10-22 20:53:08 -04:00
parameter.py
profiler.py
prompt_adapter_manager.py
python_plugin.py
ray_stub.py [TRTLLM-8507][fix] Fix ray resource cleanup and error handling in LoRA test (#8175) 2025-10-14 23:46:30 +08:00
sampling_params.py [None][feat] Support ignored prompt length for penalties via new sampling config parameter (#8127) 2025-10-27 13:12:31 -04:00
scheduling_params.py
serialization.py [TRTLLM-8682][chore] Remove auto_parallel module (#8329) 2025-10-22 20:53:08 -04:00
top_model_mixin.py [TRTLLM-8683][chore] Migrate PluginConfig to Pydantic (#8277) 2025-10-17 16:13:22 -04:00
version.py [None][chore] Bump version to 1.2.0rc3 (#9004) 2025-11-07 01:24:32 -08:00