TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Robin Kobus 7b2818a47b refactor: CreateNewDecoderRequests (#4452)	2025-05-23 22:54:37 +08:00

* refactor: CreateNewDecoderRequests

* refactor: Consolidate request generation in CreateNewDecoderRequests
  - Removed the GenerateRequestOptions class and integrated its functionality into CreateNewDecoderRequests.
  - Updated the constructor of CreateNewDecoderRequests to accept parameters for speculative decoding and normalization options.
  - Modified the operator() method to handle request generation directly, improving code organization and reducing redundancy.
  - Cleaned up associated includes and references throughout the codebase.

* refactor: Simplify request handling in CreateNewDecoderRequests
  - Removed the generateRequestOptions method and integrated its logic directly into the operator() method.
  - Updated the request generation process to improve clarity and reduce redundancy.
  - Adjusted the return type to streamline the handling of batch slots, decoder requests, and sampling configurations.

* refactor: Enhance createDecoderRequests method in CreateNewDecoderRequests
  - Updated the createDecoderRequests method to include additional parameters for decoder state and CUDA streams, improving flexibility in request handling.
  - Removed redundant request generation logic from the operator() method, streamlining the process.
  - Adjusted the newRequest method to utilize the updated decoder request structure, enhancing clarity and maintainability.

* refactor: Use MedusaBuffers instead of RuntimeBuffers in CreateNewDecoderRequests
  - Updated references from RuntimeBuffers to MedusaBuffers across the CreateNewDecoderRequests class and its methods, enhancing clarity in buffer management.
  - Adjusted method signatures and internal logic to accommodate the new MedusaBuffers type, ensuring compatibility with existing functionality.
  - Cleaned up unnecessary includes and improved code organization for better maintainability.

* refactor: Update CreateNewDecoderRequests to use DecoderState and CudaStream parameters
  - Modified method signatures in CreateNewDecoderRequests to replace GptDecoderBatched with runtime::decoder::DecoderState and added a separate CudaStream for the decoder.
  - Adjusted the implementation of the operator() method to accommodate the new parameters, enhancing flexibility in request handling.
  - Updated associated bindings in the pybind11 interface to reflect the changes in method signatures, ensuring consistency across the codebase.

* refactor: Update TRTLLMSampler to use refactored create_new_decoder_requests
  - Updated the sampler.py to reflect changes in the request handling logic, replacing generate_request_options with create_new_decoder_requests for improved clarity and consistency.
  - Updated bindings and method signatures for decoder stream handling.

* refactor: Update gptDecoderBatchedTest to use CreateNewDecoderRequests::newRequest

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

algorithms.cpp	refactor: CreateNewDecoderRequests (#4452)	2025-05-23 22:54:37 +08:00
algorithms.h	Update TensorRT-LLM (#2413)	2024-11-05 16:27:06 +08:00
bindings.cpp	[feat][TRTLLM-5018] Dis serving python runtime trt backend (#4243)	2025-05-22 22:01:06 -04:00
bindings.h	Update TensorRT-LLM (#2413)	2024-11-05 16:27:06 +08:00
buffers.cpp	Update TensorRT-LLM (#2849)	2025-03-04 18:44:00 +08:00
buffers.h	Update TensorRT-LLM (#2755)	2025-02-11 03:01:00 +00:00
cacheTransceiver.cpp	Agent interface impl for NIXL (#4125)	2025-05-22 09:09:41 +08:00
cacheTransceiver.h	Update TensorRT-LLM (#2820)	2025-02-25 21:21:49 +08:00
kvCacheManager.cpp	cacheTransceiver buffer manager (#3798)	2025-04-27 11:48:15 +08:00
kvCacheManager.h	Update TensorRT-LLM (#2413)	2024-11-05 16:27:06 +08:00
llmRequest.cpp	feat: Add multimodal embedding field in LlmRequest (#3855)	2025-05-01 12:23:30 +08:00
llmRequest.h	feat: Add multimodal embedding field in LlmRequest (#3855)	2025-05-01 12:23:30 +08:00