TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Thor Johnsen 5d438be59a [TRTLLM-5000][feat] Pytorch implementation of ngram drafter (#3936)	2025-05-21 10:40:00 +08:00

    * v1.5
      Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
      v1.5.4 Add back draft_overhead to spec dec stats
      Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
    * v1.5.5: fix CI error
      Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
    * v1.6: fix CI error 8196 > 8192
      Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
    * Address reviewer concerns
      Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
    * Address reviewer concerns
      Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
    * precommit run
      Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
    * v2.0: Address reviewer concerns
      Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
    * v2.1: add fix from wili
      Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
    * Revert changes that require use of TypeAlias because that requires python version >= 3.10
      Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>

    ---------
    Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
    Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
    Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>

cache_transmission	feat: NIXL interface integration (#3934)	2025-05-19 18:18:22 +08:00
cacheTransceiverConfig.cpp	[TRTLLM-3429] feat: Overlap scheduling in C++ runtime (#3625)	2025-05-06 15:06:46 +02:00
CMakeLists.txt	feat: NIXL interface integration (#3934)	2025-05-19 18:18:22 +08:00
contextPhaseParams.cpp	Update TensorRT-LLM (#2936)	2025-03-18 21:25:19 +08:00
debugConfig.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
decodingConfig.cpp	Feat: Variable-Beam-Width-Search (VBWS) part3 (#3338)	2025-04-08 23:51:27 +08:00
disaggServerUtil.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
dynamicBatchConfig.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
dynamicBatchTuner.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
dynamicBatchTuner.h	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
executor.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
executorConfig.cpp	Feat: Variable-Beam-Width-Search (VBWS) part4 (#3979)	2025-05-12 22:32:29 +02:00
executorImpl.cpp	[TRTLLM-5171] chore: Remove GptSession/V1 from TRT workflow (#4092)	2025-05-14 23:10:04 +02:00
executorImpl.h	refactor: Introduce MpiTag enumeration and update MPI function signatures (#3893)	2025-05-04 13:24:29 +02:00
executorKVCacheEventManager.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
extendedRuntimePerfKnobConfig.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
guidedDecodingConfig.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
guidedDecodingParams.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
intervalSet.h	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
jsonSerialization.cpp	[TRTLLM-5000][feat] Pytorch implementation of ngram drafter (#3936)	2025-05-21 10:40:00 +08:00
kvCacheConfig.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
kvCacheRetentionConfig.cpp	chore: Clean up cpp runtime (#3537)	2025-04-15 16:06:14 +08:00
logitsPostProcessorConfig.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
loraConfig.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
model.h	fix: request termination in pipeline parallelism (#3892)	2025-05-05 21:51:41 +08:00
mropeConfig.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
orchestratorConfig.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
orchestratorUtils.h	refactor: Introduce MpiTag enumeration and update MPI function signatures (#3893)	2025-05-04 13:24:29 +02:00
outputConfig.cpp	[TRTLLM-3429] feat: Overlap scheduling in C++ runtime (#3625)	2025-05-06 15:06:46 +02:00
parallelConfig.cpp	feat: Add numNodes to ParallelConfig (#3346)	2025-04-13 13:55:04 +02:00
peftCacheConfig.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
promptTuningConfig.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
request.cpp	chore: Cleanup deprecated APIs from LLM-API (part 1/2) (#3732)	2025-05-07 13:20:25 +08:00
requestImpl.h	feat: Add multimodal embedding field in LlmRequest (#3855)	2025-05-01 12:23:30 +08:00
requestUtils.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
requestUtils.h	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
requestWithId.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
requestWithId.h	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
response.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
responseImpl.h	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
samplingConfig.cpp	Feat: Variable-Beam-Width-Search (VBWS) part4 (#3979)	2025-05-12 22:32:29 +02:00
schedulerConfig.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
serialization.cpp	[TRTLLM-5000][feat] Pytorch implementation of ngram drafter (#3936)	2025-05-21 10:40:00 +08:00
serializeUtils.h	[TRTLLM-5000][feat] Pytorch implementation of ngram drafter (#3936)	2025-05-21 10:40:00 +08:00
speculativeDecodingConfig.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
tensor.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
types.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00