TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-13 22:18:36 +08:00

Author	SHA1	Message	Date
Wanli Jiang	11da7e3605	[None][fix] Solve pillow version conflict (#10537 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2026-01-12 04:05:54 -05:00
Yechan Kim	8e0d20d901	[TRTLLM-10195][feat] K-EXAONE support (#10355 ) Signed-off-by: Jaedeok Kim <jaedeokk@nvidia.com> Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> Co-authored-by: Jaedeok Kim <jaedeokk@nvidia.com>	2026-01-12 00:29:51 +09:00
bhsueh_NV	bea61bb17d	[None][fix] Mistral large 3 few code refine (#10405 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2026-01-08 06:38:49 -05:00
Venky	aa1fe931de	[None][docs] Add `--config` preference over `--extra_llm_api_options` in CODING_GUIDELINES.md (#10426 ) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2026-01-05 22:05:47 -05:00
Fanrong Li	4931c5eb3a	[None][feat] update deepgemm to the DeepGEMM/nv_dev branch (#9898 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2026-01-05 16:43:42 +08:00
bhsueh_NV	db3430f589	[None][feat] Support VLM part for Mistral Large 3 (#10188 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-12-25 11:20:58 -05:00
Gabriel Wu	1d01214ff0	[None][feat] Drop non-deepgemm fp8 block scale gemm (#10256 ) Signed-off-by: Zihua Wu <13583761+lucifer1004@users.noreply.github.com>	2025-12-25 14:52:52 +08:00
bhsueh_NV	cd4b4f43fa	[None][feat] Support Eagle3 on Mistral Large3 (#9971 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-12-21 10:25:45 -05:00
Venky	dfa11d810e	[TRTC-102][docs] `--extra_llm_api_options`->`--config` in docs/examples/tests (#10005 )	2025-12-19 13:48:43 -05:00
William Zhang	28b02b4f5a	[None][docs] Add README for Nemotron Nano v3 (#10017 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com> Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com> Co-authored-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-12-15 22:17:24 -08:00
bhsueh_NV	e49c70f6df	[None][feat] Support Mistral Large3 LLM part (#9820 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-12-13 11:44:27 +08:00
Venky	fd1270b9ab	[TRTC-43] [feat] Add config db and docs (#9420 ) Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com> Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> Co-authored-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>	2025-12-12 04:00:03 +08:00
Frank	f6df9eb2a6	[TRTLLM-9089][chore] Port prepare_dataset into trtllm-bench (#9250 )	2025-12-08 10:37:40 -08:00
Chenjie Luo	d252101a76	[OMNIML-3036][doc] Re-branding TensorRT-Model-Optimizer as Nvidia Model-Optimizer (#9679 ) Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>	2025-12-07 07:14:05 -08:00
Iman Tabrizian	356a52edf5	[None][feat] Add support for KVCache reuse for DSv32 (#9383 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-12-02 11:14:30 +08:00
dominicshanshan	6345074686	[None][chore] Weekly mass integration of release/1.1 -- rebase (#9522 ) Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com> Signed-off-by: Mike Iovine <miovine@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com> Signed-off-by: qgai <qgai@nvidia.com> Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com> Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com> Signed-off-by: Simeng Liu <simengl@nvidia.com> Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com> Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Vincent Zhang <vinczhang@nvidia.com> Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com> Signed-off-by: Michal Guzek <mguzek@nvidia.com> Signed-off-by: Michal Guzek <moraxu@users.noreply.github.com> Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com> Signed-off-by: leslie-fang25 <leslief@nvidia.com> Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co> Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> Co-authored-by: yunruis <205571022+yunruis@users.noreply.github.com> Co-authored-by: sunnyqgg <159101675+sunnyqgg@users.noreply.github.com> Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com> Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Co-authored-by: JunyiXu-nv <219237550+JunyiXu-nv@users.noreply.github.com> Co-authored-by: Simeng Liu <109828133+SimengLiu-nv@users.noreply.github.com> Co-authored-by: Guoming Zhang <137257613+nv-guomingz@users.noreply.github.com> Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com> Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Co-authored-by: Vincent Zhang <vcheungyi@163.com> Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com> Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com> Co-authored-by: Chang Liu <9713593+chang-l@users.noreply.github.com> Co-authored-by: Leslie Fang <leslief@nvidia.com> Co-authored-by: Shunkangz <182541032+Shunkangz@users.noreply.github.com> Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-11-29 21:48:48 +08:00
Wanli Jiang	d100599ea7	[TRTLLM-9264][fix] Add accuracy/unit tests/doc for phi4mm (#9246 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-11-26 11:12:35 +08:00
jiahanc	255e4ea9f0	[None][doc] Update DS-R1 example doc (#9231 ) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>	2025-11-18 21:10:02 -08:00
Fanrong Li	25bd2e6917	[None][doc] Add DeepSeek-V3.2-Exp document (#9141 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-11-13 22:01:58 -08:00
Wanli Jiang	ebdd1cc8e0	[TRTLLM-8119][feat] Update doc/tests/chat_template for nano-v2-vlm (#8840 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-11-11 07:48:23 -08:00
jiahanc	de6088e363	[None][doc] update llama and llama4 example doc (#9048 ) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>	2025-11-10 22:04:26 -08:00
JadoTu	6bbb43f2b9	[None][feat] Add qwen3-next nvfp4 support (#8526 ) Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>	2025-11-06 09:45:44 +08:00
Anish Shanbhag	6a6317727b	[TRTLLM-8680][doc] Add table with one-line deployment commands to docs (#8173 ) Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>	2025-11-03 17:42:41 -08:00
Robin Kobus	1b3ad7259d	[None][feat] Use ruff for formatting and linting new files by default (#8629 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-11-01 16:11:40 +01:00
Anish Shanbhag	a09b38a862	[TRTLLM-8684][chore] Migrate BuildConfig to Pydantic, add a Python wrapper for KVCacheType enum (#8330 ) Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>	2025-10-28 09:17:26 -07:00
Anish Shanbhag	15de45d782	[TRTLLM-8682][chore] Remove auto_parallel module (#8329 ) Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>	2025-10-22 20:53:08 -04:00
mpikulski	93a4b7f1b6	[None][chore] update torch_dtype -> dtype in 'transformers' (#8263 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-10-15 17:09:30 +09:00
Yechan Kim	e9e4632e44	[None][doc] Add more description on EXAONE usage (#8089 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-09-30 21:32:43 -04:00
bhsueh_NV	38d6e4e60b	[None][feat] Support Qwen3 next (#7892 ) Signed-off-by: mengw <12670782+wm2012011492@users.noreply.github.com> Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Co-authored-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-09-29 21:16:07 +08:00
Guoming Zhang	202bed4574	[None][chroe] Rename TensorRT-LLM to TensorRT LLM for source code. (#7851 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-25 21:02:35 +08:00
Guoming Zhang	9f0f52249e	[None][doc] Rename TensorRT-LLM to TensorRT LLM for homepage and the … (#7850 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-25 21:02:35 +08:00
Yueh-Ting (eop) Chen	cf100933cc	[TRTLLM-6341][feature] Support SWA KV cache reuse (#6768 ) This merge request attempts to support more SWA KV cache functionality inside the KV cache manager. Before this merge request, the KV cache for sliding window attention (SWA) only holds "window size" number of blocks and reuse them in a cyclic manner. We will not be able to utilize more GPU memory with this design, leading to a limited max batch size throughput. Additionally, we will not be able to support KV cache reuse with this design. In this MR, we change such behavior to let the manager write blocks in a linear manner. With a linear block writing behavior, as the attention window moves on, the out-of-window (OOW) blocks will be detached. Right now for the sake of a correct feature first, we directly offload the OOW block from the primary block pool (GPU memory) to the secondary block pool (host memory). We will improve this in the future by delegating the block movement to the eviction policy. KV cache reuse for SWA is not developed in this merge request and will be amended in a follow-up merge request. Writing the blocks linearly, the maximum number of blocks allocated for a sequence(`GenerationRequest`) is the "max sequence length" specified. The `GenerationRequest` that stores the cache block bookkeeping structure will now keep "max sequence length" tokens of blocks. Given the above, main changes are (more context in the MR): - Remove "cyclic" concept under the kv cache manager, such concept originally guards the block reuse under kv cache manager. - Add detach mechanism and have it under `KVCacheManager::addToken`. Please note that detach is still guarded off for SWA when reuse is enabled. A follow-up merge request will proceed to improve this. - Enforce "max sequence length" to be a non-optional parameter to the `KVCacheManager`/`BlockManager` - Let all window size resource pool get identical proportion of memory - Fix free memory calculation under `resource_manager.py` Signed-off-by: eopXD <yuehtingc@nvidia.com> Co-authored-by: Tomer Asida <tasida@nvidia.com>	2025-09-24 14:28:24 +08:00
Yuxian Qiu	7d28acdbf0	[https://nvbugs/5522332 ][fix] Pin numpy version for Gemma. (cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/7783 ) (#7797 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-09-19 18:50:40 +08:00
Kyungmin Lee	6fcc0540f0	[None][fix] fix load_model_on_cpu on qwen/convert_checkpoint.py (#2382 ) Signed-off-by: lkm2835 <lkm2835@gmail.com> Co-authored-by: Kanghwan <861393+karljang@users.noreply.github.com>	2025-09-18 21:54:26 -07:00
Perkz Zheng	1b29c2e731	[None][feat] support gpt-oss with fp8 kv cache (#7612 ) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>	2025-09-15 02:17:37 +08:00
Wanli Jiang	1e0669d27a	[https://nvbugs/5453709 ][fix] Remove transformers version limit in Qwen2VL (#7152 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-09-09 10:38:20 +08:00
Kanghwan	f58a183c6e	[None][chore] Fix formatting error in Gemma3 readme (#7352 ) Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>	2025-09-03 01:15:37 +08:00
jiahanc	9f2dc3069d	[None] [doc] Update DeepSeek example doc (#7358 ) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>	2025-09-01 14:43:58 -04:00
brb-nv	0253036a4e	[None][chore] Add docs for Gemma3 VLMs (#6880 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Pengyun Lin	c1e7fb9042	[TRTLLM-7207][feat] Chat completions API for gpt-oss (#7261 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-08-28 10:22:06 +08:00
zhhuang-nv	7e135d2ea7	[None][feat] Use Separate QKV Input Layout for Context MLA (#6538 ) Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>	2025-08-19 22:04:48 +08:00
jmydurant	8e252256f5	[None][doc] Modify the description for mla chunked context (#6929 ) Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>	2025-08-15 12:52:26 +08:00
hlu1	5346eb7bc5	[None][doc] Update gpt-oss doc on MoE support matrix (#6908 ) Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>	2025-08-15 08:50:31 +08:00
Chang Liu	be9dd4713c	[https://nvbugs/5385987 ][fix] Fix Qwen2 quantization issue by pinning transformers version (#6673 ) Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com> Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-08-11 17:16:49 -07:00
Liao Lanyu	a2e9153cb0	[None][doc] Add K2 tool calling examples (#6667 ) Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com> Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>	2025-08-11 16:25:41 +08:00
Yibin Li	97787883c3	[TRTLLM-6420][feat] add support for Eclairv2 model - cherry-pick changes and minor fix (#6493 ) Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>	2025-08-08 21:40:48 -04:00
Guoming Zhang	0223de0727	[None][doc] Add deployment guide section for VDR task (#6669 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-08-07 10:30:47 -04:00
hlu1	8207d5fd39	[None] [feat] Add model gpt-oss (#6645 ) Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>	2025-08-07 03:04:18 -04:00
Guoming Zhang	f7f46a5017	doc: remove the outdated features which marked as Experimental (#5995 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-08-06 22:01:42 -04:00
Yibin Li	2a946859a7	[None][fix] Upgrade dependencies version to avoid security vulnerability (#6506 ) Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>	2025-08-06 14:21:03 -07:00

1 2 3

123 Commits