TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
Xiwen Yu	291290851a	Merge remote-tracking branch 'origin/main' into feat/b300_cu13 Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-09-07 10:28:24 +08:00
Chang Liu	23500b55c3	[TRTLLM-7398][feat] Support KV cache salting for secure KV cache reuse (#7106) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com> Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>	2025-09-06 17:58:32 -04:00
QI JUN	12ecb864c2	[None][chore] share input_ids buffers among different cuda graphs (#7236) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-06 17:49:42 -04:00
Xiwen Yu	5e7aa76bb4	Merge branch 'user/sm103_trtllmgen' into feat/b300_cu13 Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-09-06 00:49:23 +08:00
Leslie Fang	9eb3911470	[None][chore] Remove executor_config in create_py_executor_instance (#7463) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2025-09-05 20:56:03 +08:00
Xiwen Yu	2c3f4cbeee	Merge remote-tracking branch 'origin/main' into feat/b300_cu13 Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-09-05 15:53:43 +08:00
Shunkangz	bddf183e15	[None][feat] Add Request specific exception (#6931) Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>	2025-09-04 18:43:42 -04:00
Enwei Zhu	1745102e72	[TRTLLM-7027][feat] Fuse d2t to logitsBitmaskKernel and fix a race condition in one-model spec (#7481) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-09-04 23:30:14 +08:00
Izzy Putterman	26b133f3a7	[None][feat] MultiLayer Eagle (#7234) Signed-off-by: Izzy Putterman <iputterman@nvidia.com>	2025-09-04 10:49:13 -04:00
jianweiwu	7090b286b2	[None][fix] fix hunyuan_moe init bug (#7502) Signed-off-by: sorenwu <sorenwu@tencent.com>	2025-09-04 03:06:00 -04:00
Leslie Fang	bd9ba97d89	[None][chore] Remove two unused parameters in create_py_executor (#7458) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2025-09-04 07:31:31 +08:00
Enwei Zhu	5ff3a65b23	[TRTLLM-7028][feat] Enable guided decoding with speculative decoding (part 2: one-model engine) (#6948) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-09-03 15:16:11 -07:00
Anurag Mukkara	ae5136831f	[https://nvbugs/5472947][fix] wait on isend handles before reusing buffers (#7462) Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>	2025-09-03 13:20:02 +05:30
YueWeng	9a4f60687f	[https://nvbugs/5480289][fix] release slot manager in mtp MTPHiddenStatesManager (#7340) Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>	2025-09-02 19:37:51 -07:00
Jinyang Yuan	572551b586	[None][perf] Autotune TRT-LLM Gen MoE when using CUDA graphs (#7285) Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>	2025-09-03 10:08:59 +08:00
Leslie Fang	42697ea32a	[None][chore] rm executor config in kv cache connector (#7372) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2025-09-03 08:13:13 +08:00
Xiwen Yu	62a78973a8	Merge remote-tracking branch 'origin/main' into user/xiweny/merge_0901 Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-09-02 10:12:30 +08:00
Leslie Fang	e81c50dbd2	[None][chore] Use llm args in create_py_executor (#7239) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2025-09-01 16:27:55 -07:00
Mike Iovine	b3c57a7042	[TRTLLM-7353][feat] Implement capturable drafting loops for speculation (#7100) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-09-01 14:37:44 -04:00
Xiwen Yu	38ef850552	Merge remote-tracking branch 'gitlab/main' into user/xiweny/merge_0901 Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-09-01 11:46:44 +08:00
Tian Zheng	e257cb3533	[None][feat] Support NVFP4 KV Cache (#6244) Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>	2025-09-01 09:24:52 +08:00
Fanrong Li	37a1bd810f	[https://nvbugs/5481385][fix] Fix max_seq_len in cuda graph warmup and intermediate_size in fused_moe_deepgemm (#7345) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>	2025-08-29 17:00:43 +08:00
Richard Huo	ce580ce4f5	[None][feat] KV Cache Connector API (#7228) Signed-off-by: jthomson04 <jwillthomson19@gmail.com> Signed-off-by: richardhuo-nv <rihuo@nvidia.com> Co-authored-by: jthomson04 <jwillthomson19@gmail.com> Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com> Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>	2025-08-28 23:09:27 -04:00
Mike Iovine	8b216135f0	[None][refactor] Move draft token padding out of Drafter (#7134) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-08-27 11:07:50 +02:00
Shunkangz	ff4047414b	[None][opt] Balance the request based on number of tokens in AttentionDP (#7183) Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co> Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>	2025-08-27 11:16:12 +08:00
Jin Li	028235404b	[TRTLLM-6633][feat] Padding for piecewise cudagraph (#6750) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-08-26 18:31:33 -04:00
qixiang-99	b165f8bc97	fix/improve kvcache allocation in PyTorch runtime (#5933) Signed-off-by: qixiang-99 <203170375+qixiang-99@users.noreply.github.com>	2025-08-26 12:40:22 +08:00
Daniel Cámpora	e8e7e52892	[None][chore] Refactored the handle logits pp communication (#7154) Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>	2025-08-25 16:14:08 -04:00
QI JUN	bea5e07fb7	[None][refactor] refactor the CUDA graph runner to manage all CUDA graphs (#6846) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-08-25 20:52:05 +08:00
amitz-nv	a1e03af0f4	[TRTLLM-7346][fix] Improve performance of PyTorchModelEngine._get_lora_params_from_requests (#7033) Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>	2025-08-25 10:37:40 +03:00
Xiwen Yu	808059da34	Merge remote-tracking branch 'gitlab/main' into user/xiweny/merge_main_0819 Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-08-23 16:13:30 +08:00
Xiwen Yu	f4de8840ec	Merge remote-tracking branch 'gitlab/main' into user/xiweny/merge_main_0819 Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-08-23 15:17:48 +08:00
Izzy Putterman	b36460d7b5	[None][feat] Deepseek: Start Eagle work (#6210) Signed-off-by: Izzy Putterman <iputterman@nvidia.com> Co-authored-by: Mike Iovine <miovine@nvidia.com>	2025-08-22 12:57:17 -04:00
Daniel Cámpora	099f081e03	[TRTLLM-7155][feat] Unify sampler handle logits implementation. (#6867) Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>	2025-08-22 08:09:30 +02:00
Wanli Jiang	07c711eb1f	[TRTLLM-6825][fix] Update lora for phi4-mm (#6817) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-08-21 22:00:04 -04:00
Chang Liu	75b8a90816	[None][fix] Fix llama4 multimodal by skipping request validation (#6957) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-08-20 21:58:53 -04:00
Chang Liu	ce53832610	[TRTLLM-7326][feat] Add standalone multimodal encoder (#6743) Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com> Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-08-19 21:42:50 -07:00
zhhuang-nv	7e135d2ea7	[None][feat] Use Separate QKV Input Layout for Context MLA (#6538) Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>	2025-08-19 22:04:48 +08:00
Xiwen Yu	8b532363ce	Merge remote-tracking branch 'gitlab/main' into user/xiweny/merge_main_0819 Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-08-19 17:02:34 +08:00
Shunkangz	54ec2c1af1	[None][opt] Add batch wait timeout in fetching requests (#6923) Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>	2025-08-19 03:50:08 -04:00
Yi Zhang	a15af879ec	[None][refactor] Refactor Torch Compile Backend, MoeLoadBalancer and warmup Logic (#6615) Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com> Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-08-19 09:58:44 +08:00
Kaiyu Xie	e88cb92f24	[None] [feat] Support accurate device iter time (#6906) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-08-18 13:47:14 +08:00
Izzy Putterman	f6ff0e3311	[None][fix] Skip Topk if 0 (#6934) Signed-off-by: Izzy Putterman <iputterman@nvidia.com>	2025-08-16 02:17:36 -04:00
Daniel Cámpora	53312eeebd	[TRTLLM-7157][feat] BREAKING CHANGE Introduce sampler_type, detect sampler according to options (#6831) Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>	2025-08-16 00:27:24 -04:00
yifeizhang-c	4127d77678	[https://nvbugs/5394392][fix] Enlarge scheduler capacity under disagg bs == 1 (#6537) Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>	2025-08-15 09:52:06 -07:00
Xiwen Yu	0bf6a18627	Fix and waive to clean L0 Signed-off-by: Xiwen Yu <xiweny@nvidia.com>	2025-08-15 04:37:43 -07:00
tomeras91	f7dbc1435a	[None] [chore] Mamba cache in separate file (#6796) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>	2025-08-15 13:42:51 +03:00
qianbiao	5c2f0fd03d	[None] [feat] Add Tencent HunYuanMoEV1 model support (#5521) Signed-off-by: sorenwu <sorenwu@tencent.com> Co-authored-by: sorenwu <sorenwu@tencent.com> Co-authored-by: bhsueh_NV <11360707+byshiue@users.noreply.github.com>	2025-08-15 06:56:44 +08:00
Matthias Jouanneaux	69574ad730	[TRTLLM-5966][feat] Helix: extend mapping to support different CP types (#6816) Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com>	2025-08-14 09:00:02 -07:00
jmydurant	4200fa46d1	[None][feat] Add support for Hopper MLA chunked prefill (#6655) Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>	2025-08-14 10:39:26 +08:00

399 Commits