TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Enwei Zhu 5ff3a65b23 [TRTLLM-7028][feat] Enable guided decoding with speculative decoding (part 2: one-model engine) (#6948) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>		2025-09-03 15:16:11 -07:00
auto_deploy	[#7136][feat] trtllm-serve + autodeploy integration (#7141)	2025-08-22 08:30:53 -07:00
features	[TRTLLM-7028][feat] Enable guided decoding with speculative decoding (part 2: one-model engine) (#6948)	2025-09-03 15:16:11 -07:00
adding_new_model.md	chores: merge examples for v1.0 doc (#5736)	2025-07-08 21:00:42 -07:00
arch_overview.md	update broken link of PyTorchModelEngine in arch_overview (#6171)	2025-07-18 19:53:38 +08:00
attention.md	chore [BREAKING CHANGE]: Flatten PyTorchConfig knobs into TorchLlmArgs (#4603)	2025-05-28 18:43:04 +08:00
kv_cache_manager.md	Release 0.20 to main (#4577)	2025-05-28 16:25:33 +08:00
scheduler.md	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00