TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-22 03:35:00 +08:00

Author	SHA1	Message	Date
Netanel Haber	9cd8148f28	API Breaking Change + Readability: "decoder"->"sampler" (#4121) * decoder->sampler; new_tensors_device: dict[str, torch.Tensor] -> device: SampleStateTensors * Breaking Change, as it changes public interfaces, main changes: * PyTorchConfig [consumed via LLM(pytorch_backend_config)]: Configuration parameters mixed_decoder and enable_trtllm_decoder -> sampler. * Command-line argument --enable_trtllm_decoder becomes --enable_trtllm_sampler in examples/pytorch/quickstart_advanced.py. --------- Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>	2025-05-16 23:52:25 +08:00
Mike Iovine	8c2c969fcb	[fix] Pad requests to maximum draft length in spec decode (#3957) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-04-30 11:02:18 -04:00
Fanrong Li	e6b482ef47	fix: change the seq_lens sync copy to an async one (#3786) --------- Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-04-29 23:56:49 +08:00
Perkz Zheng	35c5e4f1c5	feat: add CGA reduction fmha kernels on Blackwell. (#3763) * update cubins Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com> * add trtllm-gen kernels for eagle3 and also kernels with cga-reduction Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com> * address the comments Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com> --------- Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>	2025-04-29 10:43:54 +08:00
Yuan Tong	d4c0423cdb	refactor: collect executor and decoder states into dataclass (#3234) * fix: Proper error bubbling for PyExecutor Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-04-15 16:31:45 +08:00
Mike Iovine	5416966ddb	Add initial EAGLE-3 implementation (#3035) Signed-off-by: Mike Iovine <miovine@nvidia.com>	2025-03-29 22:31:24 +08:00
Kaiyu Xie	3aa6b11d13	Update TensorRT-LLM (#2936) * Update TensorRT-LLM --------- Co-authored-by: changcui <cuichang147@gmail.com>	2025-03-18 21:25:19 +08:00
Kaiyu Xie	77d7fe1eb2	Update TensorRT-LLM (#2849) * Update TensorRT-LLM --------- Co-authored-by: aotman <chenhangatm@gmail.com>	2025-03-04 18:44:00 +08:00

8 Commits