TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Thor Johnsen 5d438be59a [TRTLLM-5000][feat] Pytorch implementation of ngram drafter (#3936)	2025-05-21 10:40:00 +08:00

* v1.5
  Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
  v1.5.4 Add back draft_overhead to spec dec stats
  Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
* v1.5.5: fix CI error
  Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
* v1.6: fix CI error 8196 > 8192
  Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
* Address reviewer concerns
  Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
* Address reviewer concerns
  Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
* precommit run
  Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
* v2.0: Address reviewer concerns
  Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
* v2.1: add fix from wili
  Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
* Revert changes that require use of TypeAlias because that requires python version >= 3.10
  Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>

---------

Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>

__init__.py	[TRTLLM-5000][feat] Pytorch implementation of ngram drafter (#3936)	2025-05-21 10:40:00 +08:00
build_cache.py	Update TensorRT-LLM (#2413)	2024-11-05 16:27:06 +08:00
disagg_utils.py	feat: conditional disaggregation in disagg server (#3974)	2025-05-21 09:57:46 +08:00
llm_args.py	[TRTLLM-5000][feat] Pytorch implementation of ngram drafter (#3936)	2025-05-21 10:40:00 +08:00
llm_utils.py	[TRTLLM-5000][feat] Pytorch implementation of ngram drafter (#3936)	2025-05-21 10:40:00 +08:00
llm.py	[https://nvbugspro.nvidia.com/bug/5243740][fix] deduce default max_tokens for trtllm-serve (#4265)	2025-05-19 00:34:40 +08:00
mgmn_leader_node.py	fix: llmapi-launch add add trtllm-bench test with engine building (#4091)	2025-05-21 10:18:01 +08:00
mgmn_worker_node.py	Update TensorRT-LLM (#2333)	2024-10-15 15:28:40 +08:00
mpi_session.py	fix[nvbug/5286515]: trtllm-llmapi-launch on single node single gpu (#4428)	2025-05-20 20:16:14 +08:00
reasoning_parser.py	feat: add deepseek-r1 reasoning parser to trtllm-serve (#3354)	2025-05-06 08:13:04 +08:00
tokenizer.py	test [TRTLLM-4477,TRTLLM-4481]: Accuracy test improvement (Part 3.5): Support GSM8K and GPQA (#3483)	2025-04-22 07:38:16 +08:00
tracer.py	Update TensorRT-LLM (#2413)	2024-11-05 16:27:06 +08:00
trtllm-llmapi-launch	fix hmac in remote mpi session (#3649)	2025-04-18 17:47:51 +08:00
utils.py	fix: trtllm-bench build trt engine on slurm (#3825)	2025-04-27 22:26:23 +08:00