TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
William Zhang	a6a88985cf	[TRTLLM-9409][feat] Pass MRoPE tensors for EPD disagg (#9758 ) * Why? Certain VLMs like the Qwen family need more than just the multimodal embeddings in the language model, and need MRoPE position IDs and deltas. Prior to this commit, only the embeddings could be communicated from the encoder worker to the prefill worker. * What? This commit extends the `DisaggregatedParams` to include the MRoPE information. It also adjusts several pieces of code required to communicate that between E, P and D workers. Closes TRTLLM-9409. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-12-22 06:32:49 -05:00
Chang Liu	998857bcde	[TRTLLM-7328][feat] E-PD Disagg Support via llmapi (3/N) (#7577 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-09-22 19:07:18 -07:00
rakib-hasan	7ab8112450	[None][fix] Refactoring to avoid circular import when importing torch models (#6720 ) Signed-off-by: Rakib Hasan <rhasan@nvidia.com>	2025-08-11 18:00:42 -04:00
Pengyun Lin	69e9f6d489	[fix]: Skip prompt length checking for generation only requests (#6146 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-07-19 21:26:37 +08:00
2ez4bz	dc52b67492	linting(python): Enable ruff on more files (wave 1/N) (#5140 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-06-14 19:19:34 +08:00
Kaiyu Xie	3aa6b11d13	Update TensorRT-LLM (#2936 ) * Update TensorRT-LLM --------- Co-authored-by: changcui <cuichang147@gmail.com>	2025-03-18 21:25:19 +08:00
Kaiyu Xie	ab5b19e027	Update TensorRT-LLM (#2820 )	2025-02-25 21:21:49 +08:00

7 Commits