| .. |
|
| Name | Last commit | Date |
| --- | --- | --- |
| utils | feat: Allow individual gatherContext for each additional output (#3374) | 2025-04-12 17:00:36 +08:00 |
| allocateKvCache.cpp | Update TensorRT-LLM (#2873) | 2025-03-11 21:13:42 +08:00 |
| assignReqSeqSlots.cpp | Update TensorRT-LLM (#2873) | 2025-03-11 21:13:42 +08:00 |
| cacheFormatter.cpp | chore: disable some env for disagg defaultly (#3415) | 2025-04-14 10:08:10 +08:00 |
| cacheFormatter.h | feat: Add BW measurement (#3070) | 2025-03-28 10:53:00 +08:00 |
| cacheTransceiver.cpp | chore: Ucx ip port remove mpi depend (#3101) | 2025-04-02 09:42:29 +08:00 |
| capacityScheduler.cpp | Update TensorRT-LLM (#2873) | 2025-03-11 21:13:42 +08:00 |
| CMakeLists.txt | chore: Ucx ip port remove mpi depend (#3101) | 2025-04-02 09:42:29 +08:00 |
| contextProgress.cpp | Update TensorRT-LLM (#2873) | 2025-03-11 21:13:42 +08:00 |
| createNewDecoderRequests.cpp | Feat: Variable-Beam-Width-Search (VBWS) part3 (#3338) | 2025-04-08 23:51:27 +08:00 |
| dataTransceiver.cpp | chore: disable some env for disagg defaultly (#3415) | 2025-04-14 10:08:10 +08:00 |
| dataTransceiver.h | feat: Add BW measurement (#3070) | 2025-03-28 10:53:00 +08:00 |
| dataTransceiverImpl.cpp | chore: disable some env for disagg defaultly (#3415) | 2025-04-14 10:08:10 +08:00 |
| dataTransceiverImpl.h | Update TensorRT-LLM (#2873) | 2025-03-11 21:13:42 +08:00 |
| decoderBuffers.cpp | chore: Clean up cpp runtime (#3505) | 2025-04-14 18:00:03 +08:00 |
| encoderBuffers.cpp | Update TensorRT-LLM (#2873) | 2025-03-11 21:13:42 +08:00 |
| encoderBuffers.h | Update TensorRT-LLM (#2873) | 2025-03-11 21:13:42 +08:00 |
| evictionPolicy.cpp | Update TensorRT-LLM (#2873) | 2025-03-11 21:13:42 +08:00 |
| generateRequestOptions.cpp | Update TensorRT-LLM (#2873) | 2025-03-11 21:13:42 +08:00 |
| guidedDecoder.cpp | Update TensorRT-LLM (#2873) | 2025-03-11 21:13:42 +08:00 |
| handleContextLogits.cpp | Update TensorRT-LLM (#2873) | 2025-03-11 21:13:42 +08:00 |
| handleGenerationLogits.cpp | Update TensorRT-LLM (#2873) | 2025-03-11 21:13:42 +08:00 |
| kvCacheEventManager.cpp | Update TensorRT-LLM (#2873) | 2025-03-11 21:13:42 +08:00 |
| kvCacheManager.cpp | fix: disable KV cache reuse if using attention sink (#3021) | 2025-04-16 03:07:32 +08:00 |
| kvCacheTransferManager.cpp | Update TensorRT-LLM (#2873) | 2025-03-11 21:13:42 +08:00 |
| llmRequest.cpp | feat: Allow individual gatherContext for each additional output (#3374) | 2025-04-12 17:00:36 +08:00 |
| logitsPostProcessor.cpp | Update TensorRT-LLM (#2873) | 2025-03-11 21:13:42 +08:00 |
| loraBuffers.cpp | Update TensorRT-LLM (#2873) | 2025-03-11 21:13:42 +08:00 |
| loraBuffers.h | Update TensorRT-LLM (#2873) | 2025-03-11 21:13:42 +08:00 |
| makeDecodingBatchInputOutput.cpp | refactor: batch slot management in decoder classes (#3300) | 2025-04-13 05:05:13 +08:00 |
| medusaBuffers.cpp | Update TensorRT-LLM (#2873) | 2025-03-11 21:13:42 +08:00 |
| microBatchScheduler.cpp | Update TensorRT-LLM (#2873) | 2025-03-11 21:13:42 +08:00 |
| mlaCacheFormatter.cpp | chore: disable some env for disagg defaultly (#3415) | 2025-04-14 10:08:10 +08:00 |
| mlaCacheFormatter.h | Update TensorRT-LLM (#2873) | 2025-03-11 21:13:42 +08:00 |
| pauseRequests.cpp | Update TensorRT-LLM (#2873) | 2025-03-11 21:13:42 +08:00 |
| peftCacheManager.cpp | feat: Support PeftCacheManager in Torch (#3186) | 2025-04-04 12:38:08 +08:00 |
| promptTuningBuffers.cpp | Update TensorRT-LLM (#2873) | 2025-03-11 21:13:42 +08:00 |
| promptTuningBuffers.h | Update TensorRT-LLM (#2873) | 2025-03-11 21:13:42 +08:00 |
| rnnStateBuffers.cpp | Update TensorRT-LLM (#2873) | 2025-03-11 21:13:42 +08:00 |
| rnnStateBuffers.h | Update TensorRT-LLM (#2873) | 2025-03-11 21:13:42 +08:00 |
| rnnStateManager.cpp | Update TensorRT-LLM (#2873) | 2025-03-11 21:13:42 +08:00 |
| runtimeBuffers.cpp | chore: Clean up cpp runtime (#3505) | 2025-04-14 18:00:03 +08:00 |
| sequenceSlotManager.cpp | Update TensorRT-LLM (#2873) | 2025-03-11 21:13:42 +08:00 |
| transformerBuffers.cpp | Feat: Variable-Beam-Width-Search (VBWS) part3 (#3338) | 2025-04-08 23:51:27 +08:00 |
| trtEncoderModel.cpp | chore: Clean up cpp runtime (#3537) | 2025-04-15 16:06:14 +08:00 |
| trtEncoderModel.h | chore: Clean up cpp runtime (#3537) | 2025-04-15 16:06:14 +08:00 |
| trtGptModel.h | fix: disable kv cache reuse when minimum window size is reached, instead of maximum window size (#2983) | 2025-03-24 22:49:52 +08:00 |
| trtGptModelFactory.h | Update TensorRT-LLM (#2873) | 2025-03-11 21:13:42 +08:00 |
| trtGptModelInflightBatching.cpp | chore: Clean up cpp runtime (#3537) | 2025-04-15 16:06:14 +08:00 |
| trtGptModelInflightBatching.h | feat: Allow individual gatherContext for each additional output (#3374) | 2025-04-12 17:00:36 +08:00 |
| trtGptModelV1.cpp | Feat: Variable-Beam-Width-Search (VBWS) part3 (#3338) | 2025-04-08 23:51:27 +08:00 |
| trtGptModelV1.h | Update TensorRT-LLM (#2873) | 2025-03-11 21:13:42 +08:00 |