Commit Graph

8 Commits

Author SHA1 Message Date
Netanel Haber
9cd8148f28
API Breaking Change + Readability: "decoder"->"sampler" (#4121)
* *decoder*->*sampler*; new_tensors_device: dict[str, torch.Tensor] -> device: SampleStateTensors

* **Breaking Change**, as it changes public interfaces, main changes:
* PyTorchConfig [consumed via LLM(pytorch_backend_config)]: Configuration parameters mixed_decoder and enable_trtllm_decoder -> sampler.
* Command-line argument --enable_trtllm_decoder becomes --enable_trtllm_sampler in examples/pytorch/quickstart_advanced.py.

---------

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
2025-05-16 23:52:25 +08:00
Mike Iovine
8c2c969fcb
[fix] Pad requests to maximum draft length in spec decode (#3957)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-04-30 11:02:18 -04:00
Fanrong Li
e6b482ef47
fix: change the seq_lens sync copy to an async one (#3786)
---------

Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-04-29 23:56:49 +08:00
Perkz Zheng
35c5e4f1c5
feat: add CGA reduction fmha kernels on Blackwell. (#3763)
* update cubins

Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>

* add trtllm-gen kernels for eagle3 and also kernels with cga-reduction

Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>

* address the comments

Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>

---------

Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-04-29 10:43:54 +08:00
Yuan Tong
d4c0423cdb
refactor: collect executor and decoder states into dataclass (#3234)
* fix: Proper error bubbling for PyExecutor

Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-04-15 16:31:45 +08:00
Mike Iovine
5416966ddb
Add initial EAGLE-3 implementation (#3035)
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-03-29 22:31:24 +08:00
Kaiyu Xie
3aa6b11d13
Update TensorRT-LLM (#2936)
* Update TensorRT-LLM

---------

Co-authored-by: changcui <cuichang147@gmail.com>
2025-03-18 21:25:19 +08:00
Kaiyu Xie
77d7fe1eb2
Update TensorRT-LLM (#2849)
* Update TensorRT-LLM

---------

Co-authored-by: aotman <chenhangatm@gmail.com>
2025-03-04 18:44:00 +08:00