Mike Iovine
77be1b7572
[ https://nvbugs/5749988 ][fix] Remove redundant qwen3 spec dec test ( #10387 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2026-01-06 11:46:34 -05:00
Izzy Putterman
bdf6953ddc
[None][feat] Eagle: MLA Based Eagle ( #9677 )
...
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
2026-01-02 13:45:07 -05:00
Ziyi Xiong
d8b5aeb061
[ https://nvbugs/5652062 ][fix] Rewind kv_cache and reset draft tokens ( #10160 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-12-25 09:13:51 -05:00
Aurelien Chartier
7175d89b48
[None][fix] Fix iteration stats for spec-dec ( #9855 )
...
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-12-16 14:11:38 -08:00
Mike Iovine
07c76a5fac
[None][feat] Make 2-model spec dec use the 1-model kernels (Hopper) ( #8810 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-12-09 11:06:31 -05:00
Stefan Niebler
f155812eb0
[TRTLLM-6756][feat] Add Beam Search to TorchSampler ( #8509 )
...
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
2025-12-01 18:48:04 +01:00
Zheyu Fu
dbbed1f85a
[None][ci] Waive blackwell test on spec gate. ( #9502 )
...
Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>
2025-11-27 07:19:58 +08:00
YueWeng
cc336c4abd
[TRTLLM-8160][feat] Add draft token tree runtime on CDL ( #8586 )
...
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
2025-11-25 09:40:55 -05:00
Ziyi Xiong
7c4344b92e
[ https://nvbugs/5590408 ][fix] Exclude num of draft tokens from mMaxSeqLenKv ( #9210 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-11-18 15:41:56 -05:00
Zheyu Fu
c4e02d7f04
[TRTLLM-8136][feat] Dynamic draft length in spec decode (stage 1). ( #8194 )
...
Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>
2025-11-18 11:13:39 -05:00
Ziyi Xiong
a7aaf50541
[TRTLLM-8084][feat] Enhance the overlap scheduler for two-model spec decoding ( #8706 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-11-13 10:20:16 -05:00
Stefan Niebler
326a201473
[ https://nvbugs/5508536 ][fix] Take Over ( #8627 ): Reintroduce: Move stop_criteria to sample_async ( #7041 ) ( #8794 )
...
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
2025-11-07 09:01:15 +01:00
DylanChen-NV
b275635a9a
[ https://nvbugs/5498478 ][fix] Fix eagle3 fp8 kv target model + bf16 draft model + chunked prefill ( #8910 )
...
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
2025-11-06 07:41:21 -08:00
kris1025
e2c5a38879
[ https://nvbugs/5534574 ][fix] disable spec decoding forever once the request spec decoding is disabled ( #8446 )
...
Signed-off-by: linquanh <linquanh@nvidia.com>
2025-10-29 19:28:43 +08:00
Mike Iovine
00161b315f
[ https://nvbugs/5549111 ][fix] Fix 2-model overlap scheduler accuracy on very long prompts ( #8076 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Michael Iovine <miovine@nvidia.com>
2025-10-28 14:55:34 -07:00
YueWeng
8dc4aac5b6
[TRTLLM-8160][feat] Add max_total_draft_tokens ( #8366 )
...
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
2025-10-21 11:11:04 -04:00
mpikulski
87eb5086fb
[None][fix] restore list[list[list[int]]] in add_token ( #8502 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-20 22:34:57 -04:00
mpikulski
97ce0ecefe
[TRTLLM-8436][feat] batched sampling and top-k logprobs improvements ( #8398 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-20 11:15:41 +02:00
sunnyqgg
dd61454d5f
[ https://nvbugs/5461761 ][fix] Unwaive eagle3 test ( #8363 )
...
Signed-off-by: qgai <qgai@nvidia.com>
2025-10-16 09:51:48 -04:00
Zheyu Fu
bac665e650
[TRTLLM-7412][feat] Turn off spec decode when the rolling average acceptance length drops below threshold. ( #7283 )
...
Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>
2025-10-13 15:51:14 -07:00
kris1025
a7ea544dbe
[TRTLLM-7384][feat] enable rejection sampling for CDL ( #7731 )
...
Signed-off-by: linquanh <linquanh@nvidia.com>
2025-10-12 20:38:48 +08:00
Izzy Putterman
f2657c1ae9
[None][fix] Eagle: Attention DP ( #7939 )
...
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
2025-10-06 16:52:35 -04:00
Ziyi Xiong
7bc2d9e993
[ https://nvbugs/5537878 ][fix] Reserve an extra slot for padded batch ( #7998 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-10-03 08:42:52 -07:00
Erin
ba3dbb6c94
[ https://nvbugs/5548098 ][fix] Fix flakey unit test for dynamic spec d… ( #8129 )
2025-10-02 22:58:37 -07:00
Izzy Putterman
1ad7bc4c78
[None][feat] Draft: Save state first pass ( #7012 )
...
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
2025-10-01 18:40:55 -04:00
YueWeng
a4243f0da5
[TRTLLM-6393][feat] add static tree sampling and verification ( #7161 )
...
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
2025-09-26 13:16:16 -04:00
sunnyqgg
2e5850c28a
[TRTLLM-7330][feat] Eagle3 cuda graph support for the first draft model inference ( #7363 )
...
Signed-off-by: qgai <qgai@nvidia.com>
2025-09-26 11:28:05 +08:00
Zheyu Fu
34963ec39c
[None][fix] Assign [] to req.py_draft_tokens instead of None when spec decode is off ( #7511 )
...
Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>
2025-09-23 06:54:18 -07:00
Ziyi Xiong
897c4dd23b
[ https://nvbugs/5517404 ][fix] Use the correct cuda graph for dynamic spec dec ( #7728 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-09-21 08:20:48 +08:00
Ziyi Xiong
420f0fbcf5
[ https://nvbugs/5522851 ][fix] Correct the logic to update kv_lens_cuda ( #7790 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-09-19 08:11:29 +08:00
QI JUN
d3e680b3c3
[None][ci] waive test_llama_eagle3[True-FLASHINFER-False-False-False-False-True] ( #7788 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-17 15:12:55 +08:00
Ziyi Xiong
536e8776cd
[TRTLLM-6668][feat] Enable overlap scheduler for two-model spec decoding ( #7651 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-09-16 07:33:44 +08:00
Zheyu Fu
c353ff342e
[None][feat] Make the should_use_spec_decode logic a bit smarter ( #7112 )
...
Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>
2025-09-10 12:53:59 +08:00
Mike Iovine
45390402fc
[ https://nvbugs/5502352 ][fix] Fix 2-model CDL path ( #7543 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-09-06 23:53:27 -04:00
QI JUN
b8183cac2b
[None][ci] Revert "[ https://nvbugs/5461761 ][fix] Remove the waiver ( #7476 )" ( #7584 )
2025-09-05 22:02:09 -07:00
Ziyi Xiong
79e0296ca0
[ https://nvbugs/5461761 ][fix] Remove the waiver ( #7476 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-09-05 15:29:54 +08:00
Izzy Putterman
26b133f3a7
[None][feat] MultiLayer Eagle ( #7234 )
...
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
2025-09-04 10:49:13 -04:00
Emma Qiao
09bca7ca82
[None][infra] Waive failed tests for release branch 0818 ( #6993 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Yuan Tong
6c7813e821
[TRTLLM-7457][ci] Update & cleanup unittest parallel config ( #7254 )
...
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-08-27 00:45:58 -04:00
Izzy Putterman
b36460d7b5
[None][feat] Deepseek: Start Eagle work ( #6210 )
...
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
Co-authored-by: Mike Iovine <miovine@nvidia.com>
2025-08-22 12:57:17 -04:00
Daniel Cámpora
53312eeebd
[TRTLLM-7157][feat] BREAKING CHANGE Introduce sampler_type, detect sampler according to options ( #6831 )
...
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-08-16 00:27:24 -04:00
Izzy Putterman
ef53de8eef
[None][feat] Add test for speculative rejection sampler (2-model) ( #6542 )
...
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
2025-08-13 22:09:35 -04:00
Mike Iovine
f68e03e646
[ https://nvbugs/5452167 ][fix] Fix ngram padding issue ( #6837 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-08-13 11:23:16 +08:00
Daniel Cámpora
efca359b66
[TRTLLM-6785][feat] BREAKING CHANGE Enable TRTLLM sampler by default ( #6216 )
...
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-08-07 22:19:37 -04:00
Ziyi Xiong
8062e0fe7c
[TRTLLM-6392][feat] Support turning on/off spec decoding dynamically ( #6363 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-07-31 15:31:39 -04:00
Mike Iovine
0f2f11f90b
[TRTLLM-6453][feat] Support chunked prefill on spec decode 2 model ( #6104 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-07-24 21:50:11 -04:00
wili
8ecdeee300
[refactor] Simplification of Speculative decoding configs - Part 2 ( #5936 )
...
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
2025-07-23 09:20:27 +08:00
Ziyi Xiong
66030ef815
[TRTLLM-6452][feat]: Two-model engine KV cache reuse support ( #6133 )
...
Signed-off-by: ziyixiong-nv <fxiong@nvidia.com>
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-07-19 13:17:15 +08:00
Zhenhuan Chen
30608a5e6d
[ https://nvbugs/5355316 ] fix: update torch.compile option to fix triton store_cubin error ( #5865 )
...
Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>
2025-07-14 17:17:30 +08:00
wili
3dfc819849
[BUG5374319][fix] WAR for draft-target-model unit tests error ( #5958 )
...
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-07-12 23:48:57 +09:00