TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-09 12:41:52 +08:00

Author	SHA1	Message	Date
Mike Iovine	77be1b7572	[https://nvbugs/5749988][fix] Remove redundant qwen3 spec dec test (#10387) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2026-01-06 11:46:34 -05:00
Izzy Putterman	bdf6953ddc	[None][feat] Eagle: MLA Based Eagle (#9677) Signed-off-by: Izzy Putterman <iputterman@nvidia.com>	2026-01-02 13:45:07 -05:00
Ziyi Xiong	d8b5aeb061	[https://nvbugs/5652062][fix] Rewind kv_cache and reset draft tokens (#10160) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-12-25 09:13:51 -05:00
Aurelien Chartier	7175d89b48	[None][fix] Fix iteration stats for spec-dec (#9855) Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>	2025-12-16 14:11:38 -08:00
Mike Iovine	07c76a5fac	[None][feat] Make 2-model spec dec use the 1-model kernels (Hopper) (#8810) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-12-09 11:06:31 -05:00
Stefan Niebler	f155812eb0	[TRTLLM-6756][feat] Add Beam Search to TorchSampler (#8509) Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>	2025-12-01 18:48:04 +01:00
Zheyu Fu	dbbed1f85a	[None][ci] Waive blackwell test on spec gate. (#9502) Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>	2025-11-27 07:19:58 +08:00
YueWeng	cc336c4abd	[TRTLLM-8160][feat] Add draft token tree runtime on CDL (#8586) Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>	2025-11-25 09:40:55 -05:00
Ziyi Xiong	7c4344b92e	[https://nvbugs/5590408][fix] Exclude num of draft tokens from mMaxSeqLenKv (#9210) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-11-18 15:41:56 -05:00
Zheyu Fu	c4e02d7f04	[TRTLLM-8136][feat] Dynamic draft length in spec decode (stage 1). (#8194) Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>	2025-11-18 11:13:39 -05:00
Ziyi Xiong	a7aaf50541	[TRTLLM-8084][feat] Enhance the overlap shceduler for two-model spec decoding (#8706) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-11-13 10:20:16 -05:00
Stefan Niebler	326a201473	[https://nvbugs/5508536][fix] Take Over (#8627): Reintroduce: Move stop_criteria to sample_async (#7041) (#8794) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com> Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>	2025-11-07 09:01:15 +01:00
DylanChen-NV	b275635a9a	[https://nvbugs/5498478][fix] Fix eagle3 fp8 kv target model + bf16 draft model + chunked prefill (#8910) Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>	2025-11-06 07:41:21 -08:00
kris1025	e2c5a38879	[https://nvbugs/5534574][fix] disable spec decoding forever once the request spec decoding is disabled (#8446) Signed-off-by: linquanh <linquanh@nvidia.com>	2025-10-29 19:28:43 +08:00
Mike Iovine	00161b315f	[https://nvbugs/5549111][fix] Fix 2-model overlap scheduler accuracy on very long prompts (#8076) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com> Signed-off-by: Michael Iovine <miovine@nvidia.com>	2025-10-28 14:55:34 -07:00
YueWeng	8dc4aac5b6	[TRTLLM-8160][feat] Add max_total_draft_tokens (#8366) Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>	2025-10-21 11:11:04 -04:00
mpikulski	87eb5086fb	[None][fix] restore list[list[list[int]]] in add_token (#8502) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-10-20 22:34:57 -04:00
mpikulski	97ce0ecefe	[TRTLLM-8436][feat] batched sampling and top-k logprobs improvements (#8398) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-10-20 11:15:41 +02:00
sunnyqgg	dd61454d5f	[https://nvbugs/5461761][fix] Unwaive eagle3 test (#8363) Signed-off-by: qgai <qgai@nvidia.com>	2025-10-16 09:51:48 -04:00
Zheyu Fu	bac665e650	[TRTLLM-7412][feat] Turn off spec decode when the rolling average acceptance length drops below threshold. (#7283) Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>	2025-10-13 15:51:14 -07:00
kris1025	a7ea544dbe	[TRTLLM-7384][feat] enable rejection sampling for CDL (#7731) Signed-off-by: linquanh <linquanh@nvidia.com>	2025-10-12 20:38:48 +08:00
Izzy Putterman	f2657c1ae9	[None][fix] Eagle: Attention DP (#7939) Signed-off-by: Izzy Putterman <iputterman@nvidia.com>	2025-10-06 16:52:35 -04:00
Ziyi Xiong	7bc2d9e993	[https://nvbugs/5537878][fix] Reserve an extra slot for padded batch (#7998) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-10-03 08:42:52 -07:00
Erin	ba3dbb6c94	[https://nvbugs/5548098][fix] Fix flakey unit test for dynamic spec d… (#8129)	2025-10-02 22:58:37 -07:00
Izzy Putterman	1ad7bc4c78	[None][feat] Draft: Save state first pass (#7012) Signed-off-by: Izzy Putterman <iputterman@nvidia.com>	2025-10-01 18:40:55 -04:00
YueWeng	a4243f0da5	[TRTLLM-6393][feat] add static tree sampling and verification (#7161) Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>	2025-09-26 13:16:16 -04:00
sunnyqgg	2e5850c28a	[TRTLLM-7330][feat] Eagle3 cuda graph support for the first draft model inference (#7363) Signed-off-by: qgai <qgai@nvidia.com>	2025-09-26 11:28:05 +08:00
Zheyu Fu	34963ec39c	[None][fix] Assign [] to req.py_draft_tokens instead of None when spec decode is off (#7511) Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>	2025-09-23 06:54:18 -07:00
Ziyi Xiong	897c4dd23b	[https://nvbugs/5517404][fix] Use the correct cuda graph for dynamic spec dec (#7728) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-09-21 08:20:48 +08:00
Ziyi Xiong	420f0fbcf5	[https://nvbugs/5522851][fix] Correct the logic to update kv_lens_cuda (#7790) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-09-19 08:11:29 +08:00
QI JUN	d3e680b3c3	[None][ci] waive test_llama_eagle3[True-FLASHINFER-False-False-False-False-True] (#7788) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-17 15:12:55 +08:00
Ziyi Xiong	536e8776cd	[TRTLLM-6668][feat] Enable overlap scheduler for two-model spec decoding (#7651) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-09-16 07:33:44 +08:00
Zheyu Fu	c353ff342e	[None][feat] Make the should_use_spec_decode logic a bit smarter (#7112) Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>	2025-09-10 12:53:59 +08:00
Mike Iovine	45390402fc	[https://nvbugs/5502352][fix] Fix 2-model CDL path (#7543) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-09-06 23:53:27 -04:00
QI JUN	b8183cac2b	[None][ci] Revert "[https://nvbugs/5461761][fix] Remove the waiver (#7476)" (#7584)	2025-09-05 22:02:09 -07:00
Ziyi Xiong	79e0296ca0	[https://nvbugs/5461761][fix] Remove the waiver (#7476) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-09-05 15:29:54 +08:00
Izzy Putterman	26b133f3a7	[None][feat] MultiLayer Eagle (#7234) Signed-off-by: Izzy Putterman <iputterman@nvidia.com>	2025-09-04 10:49:13 -04:00
Emma Qiao	09bca7ca82	[None][infra] Waive failed tests for release branch 0818 (#6993) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Yuan Tong	6c7813e821	[TRTLLM-7457][ci] Update & cleanup unittest parallel config (#7254) Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-08-27 00:45:58 -04:00
Izzy Putterman	b36460d7b5	[None][feat] Deepseek: Start Eagle work (#6210) Signed-off-by: Izzy Putterman <iputterman@nvidia.com> Co-authored-by: Mike Iovine <miovine@nvidia.com>	2025-08-22 12:57:17 -04:00
Daniel Cámpora	53312eeebd	[TRTLLM-7157][feat] BREAKING CHANGE Introduce sampler_type, detect sampler according to options (#6831) Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>	2025-08-16 00:27:24 -04:00
Izzy Putterman	ef53de8eef	[None][feat] Add test for speculative rejection sampler (2-model) (#6542) Signed-off-by: Izzy Putterman <iputterman@nvidia.com>	2025-08-13 22:09:35 -04:00
Mike Iovine	f68e03e646	[https://nvbugs/5452167][fix] Fix ngram padding issue (#6837) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-08-13 11:23:16 +08:00
Daniel Cámpora	efca359b66	[TRTLLM-6785][feat] BREAKING CHANGE Enable TRTLLM sampler by default (#6216) Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>	2025-08-07 22:19:37 -04:00
Ziyi Xiong	8062e0fe7c	[TRTLLM-6392][feat] Support turning on/off spec decoding dynamically (#6363) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-07-31 15:31:39 -04:00
Mike Iovine	0f2f11f90b	[TRTLLM-6453][feat] Support chunked prefill on spec decode 2 model (#6104) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-07-24 21:50:11 -04:00
wili	8ecdeee300	[refactor] Simplification of Speculative decoding configs - Part 2 (#5936) Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>	2025-07-23 09:20:27 +08:00
Ziyi Xiong	66030ef815	[TRTLLM-6452][feat]: Two-model engine KV cache reuse support (#6133) Signed-off-by: ziyixiong-nv <fxiong@nvidia.com> Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-07-19 13:17:15 +08:00
Zhenhuan Chen	30608a5e6d	[https://nvbugs/5355316] fix: update torch.compile option to fix triton store_cubin error (#5865) Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>	2025-07-14 17:17:30 +08:00
wili	3dfc819849	[BUG5374319][fix] WAR for draft-target-model unit tests error (#5958) Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com> Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-07-12 23:48:57 +09:00

79 Commits