TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

William Zhang a6a88985cf [TRTLLM-9409][feat] Pass MRoPE tensors for EPD disagg (#9758)	2025-12-22 06:32:49 -05:00

* Why? Certain VLMs, such as the Qwen family, need more than the multimodal embeddings in the language model: they also need the MRoPE position IDs and deltas. Prior to this commit, only the embeddings could be communicated from the encoder worker to the prefill worker.
* What? This commit extends `DisaggregatedParams` to include the MRoPE information. It also adjusts several pieces of code required to communicate that information between the E, P and D workers.

Closes TRTLLM-9409.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>

_tensorrt_engine	[TRTLLM-5208][BREAKING CHANGE] chore: make pytorch LLM the default (#5312)	2025-06-20 03:01:10 +08:00
_torch	[TRTLLM-9409][feat] Pass MRoPE tensors for EPD disagg (#9758)	2025-12-22 06:32:49 -05:00
bench	[TRTLLM-9089][chore] Port prepare_dataset into trtllm-bench (#9250)	2025-12-08 10:37:40 -08:00
commands	[TRTLLM-9654][feat] Support DeepSeek-V32 chat template (#9814)	2025-12-19 17:05:38 +08:00
evaluate	[TRTLLM-9805][feat] Skip Softmax Attention. (#9821)	2025-12-21 02:52:42 -05:00
executor	[TRTLLM-9409][feat] Pass MRoPE tensors for EPD disagg (#9758)	2025-12-22 06:32:49 -05:00
inputs	[TRTLLM-9654][feat] Support DeepSeek-V32 chat template (#9814)	2025-12-19 17:05:38 +08:00
layers	[#9236][feature] Make sharing of activation_type across SW layers more robust (#9238)	2025-11-20 16:06:58 +08:00
llmapi	[TRTLLM-9409][feat] Pass MRoPE tensors for EPD disagg (#9758)	2025-12-22 06:32:49 -05:00
metrics	[None][feat] Add `trtllm_` prefix for exposed metrics (#8845)	2025-11-06 15:27:18 +08:00
models	[#2730][fix] Fix circular import bug in medusa/weight.py (#9866)	2025-12-11 13:51:08 +08:00
plugin	[None][chore] Revert "[None][fix] change allreduce workspace dtype to torch.int64 t… (#9538)	2025-11-28 16:45:23 +08:00
quantization	[https://nvbugs/5456493][feat] Add fp8 bmm on sm120 (#9687)	2025-12-18 22:57:20 +08:00
runtime	[#6425][fix] address CUDA stream sync issue in ModelRunnerCPP (#6426)	2025-12-12 13:33:22 +08:00
scaffolding	[None][feat] Deep Research Implemented with Scaffolding (#8452)	2025-11-06 10:33:28 +08:00
serve	[TRTLLM-9604][feat] DS R1 & V3.1 tool parser (#10010)	2025-12-19 17:20:03 +08:00
tokenizer	[https://nvbugs/5684820][fix] fix the detokenizer issue for DeepSeek-v3.2 (#10106)	2025-12-22 10:56:33 +08:00
tools	[TRTLLM-8310][feat] Add Qwen3-VL-MoE (#9689)	2025-12-15 20:05:20 -08:00
__init__.py	[TRTLLM-9736][feat] AsyncLLM and verl integ (#9353)	2025-12-11 09:33:25 -08:00
_common.py	[None][chroe] Rename TensorRT-LLM to TensorRT LLM for source code. (#7851)	2025-09-25 21:02:35 +08:00
_dlpack_utils.py	linting(python): Enable ruff on more files (wave 1/N) (#5140)	2025-06-14 19:19:34 +08:00
_ipc_utils.py	[None][chore] Modify python ipc_util to align with C++ path (#9894)	2025-12-12 15:55:22 +08:00
_mnnvl_utils.py	[https://nvbugs/5477730][fix] Fix the alltoall case when tp_size larger than ep_size (#7331)	2025-09-04 08:10:03 -04:00
_ray_utils.py	[TRTLLM-8511][feat] Add update_weights and sleep_wakeup support for rl integration (#8302)	2025-11-04 10:19:24 -08:00
_utils.py	[TRTLLM-9181][feat] improve disagg-server prometheus metrics; synchronize workers' clocks when workers are dynamic (#9726)	2025-12-16 05:16:32 -08:00
builder.py	[TRTLLM-8684][chore] Migrate BuildConfig to Pydantic, add a Python wrapper for KVCacheType enum (#8330)	2025-10-28 09:17:26 -07:00
disaggregated_params.py	[TRTLLM-9409][feat] Pass MRoPE tensors for EPD disagg (#9758)	2025-12-22 06:32:49 -05:00
functional.py	[#8921][feat] Added symetric memory AllReduce strategy (#8919)	2025-12-08 13:12:56 -08:00
graph_rewriting.py	linting(python): Enable ruff on more files (wave 1/N) (#5140)	2025-06-14 19:19:34 +08:00
logger.py	[None][chore] Mass integration of release/1.0 - 3rd (#7519)	2025-09-08 14:03:04 +08:00
lora_helper.py	[TRTLLM-8682][chore] Remove auto_parallel module (#8329)	2025-10-22 20:53:08 -04:00
lora_manager.py	[https://nvbugs/5510879][fix] Fix pytorch & TRT-python flows fused LoRA adapter modules weight split with TP>1 (#8063)	2025-10-12 12:29:52 -07:00
mapping.py	[None][feat] Async pp send for PPCommTorch. (#9976)	2025-12-15 14:03:46 +08:00
math_utils.py	perf: Optimize swizzle_sf, unswizzle_sf, reswizzle_sf (#5318)	2025-06-26 14:03:56 +08:00
module.py	[None][chroe] Rename TensorRT-LLM to TensorRT LLM for source code. (#7851)	2025-09-25 21:02:35 +08:00
network.py	[TRTLLM-8682][chore] Remove auto_parallel module (#8329)	2025-10-22 20:53:08 -04:00
parameter.py	fix:https://nvbugs/5234033 enable starcoder trt-flow with transforme… (#3909)	2025-05-15 11:16:45 +08:00
profiler.py	linting(python): Enable ruff on more files (wave 1/N) (#5140)	2025-06-14 19:19:34 +08:00
prompt_adapter_manager.py	linting(python): Enable ruff on more files (wave 1/N) (#5140)	2025-06-14 19:19:34 +08:00
python_plugin.py	linting(python): Enable ruff on more files (wave 1/N) (#5140)	2025-06-14 19:19:34 +08:00
ray_stub.py	[TRTLLM-8507][fix] Fix ray resource cleanup and error handling in LoRA test (#8175)	2025-10-14 23:46:30 +08:00
sampling_params.py	[None] [fix] Revert "[None] [feat] add eos_token_id in generation_config to sampling params" (#10002)	2025-12-15 08:52:52 -08:00
scheduling_params.py	[None][feat] Add support of scheduling attention dp request (#6246)	2025-08-01 20:38:01 -04:00
serialization.py	[TRTLLM-8682][chore] Remove auto_parallel module (#8329)	2025-10-22 20:53:08 -04:00
top_model_mixin.py	[TRTLLM-8683][chore] Migrate PluginConfig to Pydantic (#8277)	2025-10-17 16:13:22 -04:00
version.py	[None][chore] bump version to 1.2.0rc6 (#9874)	2025-12-10 04:53:26 -08:00