12 Commits

Author SHA1 Message Date
Zheyu Fu
c4e02d7f04
[TRTLLM-8136][feat] Dynamic draft length in spec decode (stage 1). (#8194)
Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>
2025-11-18 11:13:39 -05:00
YueWeng
8dc4aac5b6
[TRTLLM-8160][feat] Add max_total_draft_tokens (#8366)
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
2025-10-21 11:11:04 -04:00
Izzy Putterman
1ad7bc4c78
[None][feat] Draft: Save state first pass (#7012)
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
2025-10-01 18:40:55 -04:00
Ziyi Xiong
536e8776cd
[TRTLLM-6668][feat] Enable overlap scheduler for two-model spec decoding (#7651)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-09-16 07:33:44 +08:00
Zheyu Fu
c353ff342e
[None][feat] Make the should_use_spec_decode logic a bit smarter (#7112)
Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>
2025-09-10 12:53:59 +08:00
Mike Iovine
90145cf557
[None][feat] Optimize CUDA graph memory usage for spec decode cases (#6718)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-08-08 13:56:53 -04:00
Mike Iovine
e968f98b43
[None][feat] Clean up ngram auto mode, add max_concurrency to configs (#6676)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-08-07 12:51:47 -04:00
Ziyi Xiong
8062e0fe7c
[TRTLLM-6392][feat] Support turning on/off spec decoding dynamically (#6363)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-07-31 15:31:39 -04:00
Ziyi Xiong
58d22a72f1
[TRTLLM-6352][feat] Migrate EAGLE3 and draft/target speculation to Drafter (#6007)
Signed-off-by: ziyixiong-nv <fxiong@nvidia.com>
2025-07-17 21:15:01 +08:00
Mike Iovine
fa34cb7234
[refactor] Clean up drafter/resource manager creation logic (#5805)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-07-16 12:45:46 -07:00
Robin Kobus
30a19fcf7c
[TRTLLM-6291] feat: Add user-provided speculative decoding support (#5204)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-07-07 16:30:43 +02:00
wili
56cdfe5c6c
[TRTLLM-5000][feat] NGrams V2 (#4569)
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
2025-06-27 23:00:17 +08:00