TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-01 16:51:11 +08:00

Author	SHA1	Message	Date
Yilin Fan	31bb650298	Cherry pick feat/llama4 to main (#4739) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com> Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com> Co-authored-by: Chenfei Zhang <chenfeiz@nvidia.com>	2025-05-30 05:28:40 +08:00
Netanel Haber	9cd8148f28	API Breaking Change + Readability: "decoder"->"sampler" (#4121) * decoder->sampler; new_tensors_device: dict[str, torch.Tensor] -> device: SampleStateTensors * Breaking Change, as it changes public interfaces, main changes: * PyTorchConfig [consumed via LLM(pytorch_backend_config)]: Configuration parameters mixed_decoder and enable_trtllm_decoder -> sampler. * Command-line argument --enable_trtllm_decoder becomes --enable_trtllm_sampler in examples/pytorch/quickstart_advanced.py. --------- Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>	2025-05-16 23:52:25 +08:00
Mike Iovine	e534bf09cc	[fix] Fix flashinfer + speculation issues (#3686) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-04-28 14:34:22 -04:00
Mike Iovine	41a6c98544	Support CUDA graphs for EAGLE3 (#3176) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-04-17 04:53:50 +08:00
Yuan Tong	d4c0423cdb	refactor: collect executor and decoder states into dataclass (#3234) * fix: Proper error bubbling for PyExecutor Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-04-15 16:31:45 +08:00
tburt-nv	7a659885e3	chore: remove usernames from comments (#3291) Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>	2025-04-05 13:44:28 +08:00
Mike Iovine	5416966ddb	Add initial EAGLE-3 implementation (#3035) Signed-off-by: Mike Iovine <miovine@nvidia.com>	2025-03-29 22:31:24 +08:00

7 Commits