TensorRT-LLMs

Mirror of https://github.com/NVIDIA/TensorRT-LLM.git (last synced 2026-01-14 06:27:45 +08:00).


Latest commit 1e317c98c6 by Frank, 2025-05-01 09:42:46 +08:00:
[feat]: Allow for a settable end-of-sequence/padding token in max throughput benchmark. (#3776)
* Move world options to a different group for clarity.
* Add eos_id option.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
utils	Add smart router for moe (#3641)	2025-04-23 12:21:59 +08:00
__init__.py	Update TensorRT-LLM (#2389)	2024-10-29 22:24:38 +08:00
low_latency.py	chore: refactor the LlmArgs with Pydantic and migrate remaining pybinding configs to python (#3025)	2025-04-05 13:31:48 +08:00
throughput.py	[feat]: Allow for a settable end-of-sequence/padding token in max throughput benchmark. (#3776)	2025-05-01 09:42:46 +08:00