TensorRT-LLMs

Chuang Zhu 1c88af1378 feat: use cudaMalloc to allocate kvCache (#3303)	2025-04-08 10:59:14 +08:00
utils	chore: Add output of first token to additional generation outputs (#3205)	2025-04-02 20:14:16 +08:00
allocateKvCache.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
assignReqSeqSlots.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
cacheFormatter.cpp	feat: Add BW measurement (#3070)	2025-03-28 10:53:00 +08:00
cacheFormatter.h	feat: Add BW measurement (#3070)	2025-03-28 10:53:00 +08:00
cacheTransceiver.cpp	chore: Ucx ip port remove mpi depend (#3101)	2025-04-02 09:42:29 +08:00
capacityScheduler.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
CMakeLists.txt	chore: Ucx ip port remove mpi depend (#3101)	2025-04-02 09:42:29 +08:00
contextProgress.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
createNewDecoderRequests.cpp	chore: remove usernames from comments (#3291)	2025-04-05 13:44:28 +08:00
dataTransceiver.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
dataTransceiver.h	feat: Add BW measurement (#3070)	2025-03-28 10:53:00 +08:00
dataTransceiverImpl.cpp	chore: Ucx ip port remove mpi depend (#3101)	2025-04-02 09:42:29 +08:00
dataTransceiverImpl.h	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
decoderBuffers.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
encoderBuffers.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
encoderBuffers.h	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
evictionPolicy.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
generateRequestOptions.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
guidedDecoder.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
handleContextLogits.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
handleGenerationLogits.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
kvCacheEventManager.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
kvCacheManager.cpp	feat: use cudaMalloc to allocate kvCache (#3303)	2025-04-08 10:59:14 +08:00
kvCacheTransferManager.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
llmRequest.cpp	Update TensorRT-LLM (#2936)	2025-03-18 21:25:19 +08:00
logitsPostProcessor.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
loraBuffers.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
loraBuffers.h	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
makeDecodingBatchInputOutput.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
medusaBuffers.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
microBatchScheduler.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
mlaCacheFormatter.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
mlaCacheFormatter.h	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
pauseRequests.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
peftCacheManager.cpp	feat: Support PeftCacheManager in Torch (#3186)	2025-04-04 12:38:08 +08:00
promptTuningBuffers.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
promptTuningBuffers.h	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
rnnStateBuffers.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
rnnStateBuffers.h	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
rnnStateManager.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
runtimeBuffers.cpp	chore: remove usernames from comments (#3291)	2025-04-05 13:44:28 +08:00
sequenceSlotManager.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
transformerBuffers.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
trtEncoderModel.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
trtEncoderModel.h	Reapply "refactor: Replace DecoderFinishedEvent with CudaEvent in decoder clas…" (#3183) (#3195)	2025-04-04 15:56:28 +02:00
trtGptModel.h	fix: disable kv cache reuse when minimum window size is reached, instead of maximum window size (#2983)	2025-03-24 22:49:52 +08:00
trtGptModelFactory.h	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
trtGptModelInflightBatching.cpp	chore: remove usernames from comments (#3291)	2025-04-05 13:44:28 +08:00
trtGptModelInflightBatching.h	Reapply "refactor: Replace DecoderFinishedEvent with CudaEvent in decoder clas…" (#3183) (#3195)	2025-04-04 15:56:28 +02:00
trtGptModelV1.cpp	v1.2 (#3082)	2025-03-26 23:31:29 +08:00
trtGptModelV1.h	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
