mirror of
https://github.com/NVIDIA/TensorRT-LLM.git
synced 2026-01-14 06:27:45 +08:00
# Scaffolding

## Introduction
Scaffolding is a framework for running inference-time compute methods such as CoT, majority vote, best of N, MCTS, etc.
Scaffolding is built around three key principles:
- Ease of Use: Users can easily run inference-time compute methods, and engineers can easily customize the framework to add new methods and execution backends.
- Modularity: Scaffolding is designed to be modular. Engineers can fully reuse existing modules when defining new methods.
- Performance: Scaffolding is designed to be performant. It supports concurrent scheduling and provides more information to the backend to help optimize performance.
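As a concrete illustration of one such inference-time compute method, majority vote can be sketched in a few lines. This is a standalone sketch, not the Scaffolding API; `sample_fn` is a hypothetical stand-in for a model generation call:

```python
from collections import Counter

def majority_vote(sample_fn, prompt, n=5):
    """Draw n candidate answers from the model and return the most
    frequent one. sample_fn(prompt) -> str is a hypothetical stand-in
    for a model generation call."""
    answers = [sample_fn(prompt) for _ in range(n)]
    # most_common(1) yields the (answer, count) pair with the highest count
    return Counter(answers).most_common(1)[0][0]
```

The other methods listed above follow the same pattern: issue multiple generation (or scoring) requests, then aggregate the results.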
## Architecture

Scaffolding has the following key components:
- `Controller`: The class that defines the workflow of an inference-time compute method.
- `Worker`: The class that handles operations such as generation and reward scoring.
- `ScaffoldingLlm`: The interface class used to run inference-time compute methods.
Workflow of Scaffolding:
- Users instantiate a `ScaffoldingLlm` instance by assembling a `Controller` and some `Worker`s.
- Users call `ScaffoldingLlm`'s API to run inference-time compute methods.
- `ScaffoldingLlm` instantiates a `Controller` instance and gets `Task`s from the `Controller`.
- `ScaffoldingLlm` dispatches each `Task` to a `Worker` and returns the completed `Task` back to the `Controller`.
- The `Controller` creates new `Task`s until the inference-time compute method is finished.
- `ScaffoldingLlm` returns the result to users.
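The interplay of the three components can be sketched with simplified stand-ins. All class bodies and method names below are illustrative assumptions, not the actual implementations in `controller.py`, `worker.py`, and `scaffolding_llm.py`:

```python
class Task:
    """Carries a request to a Worker and its result back (illustrative)."""
    def __init__(self, prompt):
        self.prompt = prompt
        self.result = None

class Controller:
    """Defines the workflow: yields Tasks and consumes their results.
    Here, a majority vote over 3 samples as one example workflow."""
    def process(self, prompt):
        tasks = [Task(prompt) for _ in range(3)]
        for t in tasks:
            yield t  # hand each task to the scheduler for execution
        results = [t.result for t in tasks]
        # pick the most frequent answer
        self.output = max(set(results), key=results.count)

class EchoWorker:
    """Handles operations such as generation; here it just echoes
    deterministically instead of calling a real model."""
    def run(self, task):
        task.result = f"answer to: {task.prompt}"

class ScaffoldingLlm:
    """User-facing interface: drives the Controller/Worker loop."""
    def __init__(self, controller, worker):
        self.controller, self.worker = controller, worker
    def generate(self, prompt):
        for task in self.controller.process(prompt):
            self.worker.run(task)  # complete the task; result flows back
        return self.controller.output
```

A user would then write something like `ScaffoldingLlm(Controller(), EchoWorker()).generate("2+2?")`; the real framework additionally schedules outstanding `Task`s concurrently rather than one at a time.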
## Usage

See `example/scaffolding`.
## Future Work

- Support an OpenAI API worker (in progress)
- Support reward models (in progress)
- Performance benchmarks (in progress)
- Support best of N
- Support MCTS
- Support a sandbox