TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Shunkangz ff4047414b [None][opt] Balance the request based on number of tokens in AttentionDP (#7183) Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co> Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>		2025-08-27 11:16:12 +08:00
test_executor_request_queue.py	[None][opt] Balance the request based on number of tokens in AttentionDP (#7183)	2025-08-27 11:16:12 +08:00
test_overlap_scheduler_input.json	[None][ci] move unittests to sub-directories (#6635)	2025-08-20 05:42:22 -04:00
test_overlap_scheduler.py	[None][ci] move unittests to sub-directories (#6635)	2025-08-20 05:42:22 -04:00
test_pytorch_model_engine.py	[None][refactor] refactor the CUDA graph runner to manage all CUDA graphs (#6846)	2025-08-25 20:52:05 +08:00
test_resource_manager.py	fix/improve kvcache allocation in PyTorch runtime (#5933)	2025-08-26 12:40:22 +08:00