TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-06 03:01:50 +08:00

Author	SHA1	Message	Date
Mike Iovine	9085021aa4	[None][feat] Implement sampling for MTP 1-model (#10019 ) Signed-off-by: Mike Iovine <miovine@nvidia.com>	2025-12-31 13:48:34 -05:00
Aurelien Chartier	041bb32151	[None][fix] Fix TLLM_SPEC_DECODE_FORCE_NUM_ACCEPTED_TOKENS for MTP/EAGLE (#9608 ) Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>	2025-12-04 08:23:57 -08:00
Stefan Niebler	f155812eb0	[TRTLLM-6756][feat] Add Beam Search to TorchSampler (#8509 ) Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>	2025-12-01 18:48:04 +01:00
Gaoji Liu	9d2df04a72	[None][doc] fix mtp.py typo (#9307 ) Signed-off-by: liugaoji <757394026@qq.com>	2025-11-30 21:55:13 -08:00
Aurelien Chartier	ef7ee6a940	[None][feat] Add environment variable to force spec-dec number of accepted tokens (#9371 ) Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>	2025-11-26 07:22:16 -08:00
Tri Dao	fc088e642c	[None][feat] Support Glm4MoeForCausalLM (#8256 ) Signed-off-by: Tri Dao <daominhtri0503@gmail.com> Co-authored-by: Xuanyu Chen <xuanyuc@nvidia.com>	2025-11-18 09:43:21 +08:00
Stefan Niebler	326a201473	[https://nvbugs/5508536 ][fix] Take Over (#8627 ): Reintroduce: Move stop_criteria to sample_async (#7041 ) (#8794 ) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com> Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>	2025-11-07 09:01:15 +01:00
Chang Liu	e47c787dd7	[TRTLLM-8535][feat] Support DeepSeek V3.2 with FP8 + BF16 KV cache/NVFP4 + BF16 KV cache (#8405 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com> Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>	2025-10-24 13:40:41 -04:00
mpikulski	87eb5086fb	[None][fix] restore list[list[list[int]]] in add_token (#8502 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-10-20 22:34:57 -04:00
mpikulski	97ce0ecefe	[TRTLLM-8436][feat] batched sampling and top-k logprobs improvements (#8398 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-10-20 11:15:41 +02:00
Jin Li	d594c2d0ff	[https://nvbugs/5537348 ][fix] Use device tensor index for MTP (#8062 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-10-16 22:46:19 +08:00
YueWeng	a4243f0da5	[TRTLLM-6393][feat] add static tree sampling and verification (#7161 ) Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>	2025-09-26 13:16:16 -04:00
Daniel Cámpora	9f1d9b7b18	[None][feat] Use list instead of torch tensor for new tokens in update requests (#7730 ) Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>	2025-09-23 10:40:08 -04:00
Enwei Zhu	59f57598a7	[https://nvbugs/5504086 ][fix] Fix MTP vanilla (#7904 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-09-23 08:38:28 +08:00
sunnyqgg	80dd8fe197	[TRTLLM-6746][feat] Enable two-model spec dec for MTP Eagle (#7001 ) Signed-off-by: qgai <qgai@nvidia.com>	2025-09-18 12:05:36 -04:00
Netanel Haber	a5cfc8368f	[https://nvbugs/5508536 ][fix] Revert #7041 : Move stop_criteria to sample_async (#7041 ) (#7796 ) Signed-off-by: Netanel Haber <nhaber@nvidia.com> Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Co-authored-by: Mike Iovine <miovine@nvidia.com>	2025-09-17 21:27:01 -04:00
Kaiyu Xie	62042a9733	[TRTLLM-6741] [feat] enable LM tp for MTP, under attention dp case (cherry-pick #7128 ) (#7571 ) Signed-off-by: Cheng Hang <chang@nvidia.com> Co-authored-by: Cheng Hang <chang@nvidia.com>	2025-09-17 09:41:32 +08:00
Jin Li	d49374bc45	[TRTLLM-7408][feat] Wrap MOE with custom op. (#7277 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-09-09 12:18:56 -04:00
Netanel Haber	0fee8cd028	[TRTLLM-7153] [feat] Move stop_criteria to sample_async (#7041 ) Signed-off-by: Netanel Haber <nhaber@nvidia.com>	2025-09-07 17:36:49 +03:00
Enwei Zhu	1745102e72	[TRTLLM-7027][feat] Fuse d2t to logitsBitmaskKernel and fix a race condition in one-model spec (#7481 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-09-04 23:30:14 +08:00
Enwei Zhu	5ff3a65b23	[TRTLLM-7028][feat] Enable guided decoding with speculative decoding (part 2: one-model engine) (#6948 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-09-03 15:16:11 -07:00
YueWeng	9a4f60687f	[https://nvbugs/5480289 ][fix] release slot manager in mtp MTPHiddenStatesManager (#7340 ) Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>	2025-09-02 19:37:51 -07:00
Daniel Cámpora	099f081e03	[TRTLLM-7155][feat] Unify sampler handle logits implementation. (#6867 ) Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>	2025-08-22 08:09:30 +02:00
kris1025	4aed7a7d19	[TRTLLM-6853][feat] refactor deepseekv3 model (#6698 ) Signed-off-by: linquanh <linquanh@nvidia.com>	2025-08-14 11:03:17 -04:00
liji-nv	dcbfa7e509	[https://nvbugs/5252313 ][fix] Fix torch compile + MTP (#6554 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-08-05 10:31:29 -04:00
YueWeng	2dd3186727	fix: remove cudaStreamSynchronize when using relaxed acceptance (#5262 ) Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>	2025-07-28 09:18:41 +08:00
ameynaik-hub	1e5e71aa42	Mtp optimizations round1 (#5689 ) Signed-off-by: Amey Naik <212485788+ameynaik-hub@users.noreply.github.com> Co-authored-by: Kefeng-Duan <176893526+Kefeng-Duan@users.noreply.github.com>	2025-07-25 13:48:27 -04:00
Netanel Haber	d9a3530048	[nvbug/5393888][nvbug/5393042] Always use `py_seq_slot` (#6147 ) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>	2025-07-18 22:45:16 +03:00
wili	2e3cf42e03	[refactor] Simplification of Speculative decoding configs (#5639 ) Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>	2025-07-10 11:37:30 -04:00
Netanel Haber	aa72d39b72	MTP and derivatives: Align sample state with trtllm sampler sample state (#5675 ) This PR moves MTPSampler and derivatives to use the universal seq_slot indexing for sampling. This is the last piece of the puzzle: After this, all of the samplers will use this format. See: `6ee94c7` Signed-off-by: Netanel Haber <nhaber@nvidia.com>	2025-07-03 19:55:48 +02:00
Jhao-Ting Chen	77082cde38	[https://nvbugspro.nvidia.com/bug/5329655 ] [feat] Pytorch path add spec dec param to attention op (#5146 ) Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>	2025-07-02 04:54:43 -04:00
liji-nv	c345f5876c	[feat] Support torch compile for attention dp (#5086 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-07-01 13:48:52 -04:00
Netanel Haber	6ee94c7ac8	Reintroduce with perf fixes: feature: unify new_tokens format sample state to trtllm samper tokens format (#5513 ) `58a8a8f` - these changes were previously merged to main here. `6aef149` - the changes were temporarily reverted in main, due to a significant perf regression in models using the TorchSampler (observed by @byshiue). This PR is meant to re-merge these changes along with a fix to prevent the regression. The first commit of this PR is actually just the reverted revert - filter it out of the changes to see previously unmerged changes. Signed-off-by: Netanel Haber <nhaber@nvidia.com>	2025-06-30 11:58:59 -07:00
Fanrong Li	6cbc9a5297	[nvbug/5354946][fix] Fix mtp vanilla draft inputs (#5568 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-06-30 15:59:12 +08:00
Netanel Haber	6aef14943c	Revert "feature: unify new_tokens format sample state to trtllm samper new_tokens format (#4401 )" (#5474 ) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>	2025-06-25 20:56:04 -07:00
Netanel Haber	58a8a8fd37	feature: unify new_tokens format sample state to trtllm sampler new_tokens format (#4401 ) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>	2025-06-23 10:38:37 -07:00
Fanrong Li	5d4ab47d5b	fix: refactor and fix mtp vanilla (#4762 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-06-20 05:23:39 +08:00
Fanrong Li	c7af650d5a	Fix: fix the deterministic issue in the MTP Eagle path (#5285 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-06-19 18:08:40 +08:00
Daniel Cámpora	d68b8180d3	feat: port MakeDecodingBatchInputOutput to python in TRTLLMSampler (#4828 ) Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>	2025-06-10 07:28:34 +08:00
Bo Li	f414a079ad	chore: Change the type annotations of input_ids and position_ids to int32. (#4632 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-06-07 16:10:47 +08:00
Fanrong Li	380a5d1690	[https://nvbugs/5271281 ][fix] fix a pd+mtp accuracy issue (#4536 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-06-03 10:03:34 +08:00
Yilin Fan	31bb650298	Cherry pick feat/llama4 to main (#4739 ) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com> Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com> Co-authored-by: Chenfei Zhang <chenfeiz@nvidia.com>	2025-05-30 05:28:40 +08:00
Yuxian Qiu	8f055f5d14	feat: Skip sampler for intermediate pp stages. (#4514 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-05-26 10:08:51 +08:00
liji-nv	58e405624a	[https://nvbugs/5123103 ][fix] Fix torch compile for DeepSeekV3 (#3952 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-05-19 22:12:25 +08:00
Netanel Haber	9cd8148f28	API Breaking Change + Readability: "decoder"->"sampler" (#4121 ) * decoder->sampler; new_tensors_device: dict[str, torch.Tensor] -> device: SampleStateTensors * Breaking Change, as it changes public interfaces, main changes: * PyTorchConfig [consumed via LLM(pytorch_backend_config)]: Configuration parameters mixed_decoder and enable_trtllm_decoder -> sampler. * Command-line argument --enable_trtllm_decoder becomes --enable_trtllm_sampler in examples/pytorch/quickstart_advanced.py. --------- Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>	2025-05-16 23:52:25 +08:00
yuxianq	4f8afe4cc6	feat: [nvbugs/5261055][nvbugs/5170160] non-invasive pipeline parallelism (#4034 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-05-16 04:16:53 +08:00
Fanrong Li	77f8e43592	[fix] Fix relaxed acceptance to support enabling it in context phase (#4126 ) * fix relaxed acceptance to support enable this feature in context phase. Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> * fix sample_and_accept_draft_tokens unit test. Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> --------- Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-05-09 14:11:14 +08:00
YueWeng	b1621e8d4e	feat: add relaxed acceptance for DS (#3865 ) * add relaxed acceptance for DS R1 Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com> * clean and update docs Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com> * fix Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com> * Modified based on review Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com> * fix mtp manager issue Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com> --------- Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com> Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-05-01 21:50:36 +08:00
Yuan Tong	57944206ba	feat: return logits in PyTorch flow (#3221 ) Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-04-24 16:56:03 -07:00
Fanrong Li	bc1c4ddcb5	fix: remove the unnecessary metadata changes in mtp. (#3787 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-04-23 16:01:28 +08:00

60 Commits