mirror of
https://github.com/NVIDIA/TensorRT-LLM.git
synced 2026-01-14 06:27:45 +08:00
# Scaffolding

## Introduction
Scaffolding is a framework for running inference-time compute methods such as CoT, majority vote, best of N, MCTS, etc.
Scaffolding is built around three key principles:
- Ease of Use: Users can easily run inference-time compute methods, and engineers can easily customize the framework to add new methods and execution backends.
- Modularity: Scaffolding is designed to be modular. Engineers can fully reuse existing modules when defining new methods.
- Performance: Scaffolding is designed to be performant. It supports concurrent scheduling and provides more information to the backend to help optimize performance.
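As a concrete illustration of one such inference-time compute method, majority vote can be sketched in a few lines. This is a standalone sketch, not the Scaffolding API; `sample_fn` is a hypothetical stand-in for a model generation call:

```python
from collections import Counter

def majority_vote(sample_fn, prompt, n=5):
    """Draw n candidate answers from the model and return the most
    frequent one. sample_fn(prompt) -> str is a hypothetical stand-in
    for a model generation call."""
    answers = [sample_fn(prompt) for _ in range(n)]
    # most_common(1) yields the (answer, count) pair with the highest count
    return Counter(answers).most_common(1)[0][0]
```

The other methods listed above follow the same pattern: issue multiple generation (or scoring) requests, then aggregate the results.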
## Architecture

Scaffolding has the following key components:
- `Controller`: The class that defines the workflow of an inference-time compute method.
- `Worker`: The class that handles operations such as generation and reward scoring.
- `ScaffoldingLlm`: The interface class used to run inference-time compute methods.
Workflow of Scaffolding:
- Users instantiate a `ScaffoldingLlm` instance by assembling a `Controller` and some `Worker`s.
- Users call `ScaffoldingLlm`'s API to run inference-time compute methods.
- `ScaffoldingLlm` instantiates a `Controller` instance and gets `Task`s from the `Controller`.
- `ScaffoldingLlm` dispatches each `Task` to a `Worker` and returns the completed `Task` back to the `Controller`.
- The `Controller` creates new `Task`s until the inference-time compute method is finished.
- `ScaffoldingLlm` returns the result to users.
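The interplay of the three components can be sketched with simplified stand-ins. All class bodies and method names below are illustrative assumptions, not the actual implementations in `controller.py`, `worker.py`, and `scaffolding_llm.py`:

```python
class Task:
    """Carries a request to a Worker and its result back (illustrative)."""
    def __init__(self, prompt):
        self.prompt = prompt
        self.result = None

class Controller:
    """Defines the workflow: yields Tasks and consumes their results.
    Here, a majority vote over 3 samples as one example workflow."""
    def process(self, prompt):
        tasks = [Task(prompt) for _ in range(3)]
        for t in tasks:
            yield t  # hand each task to the scheduler for execution
        results = [t.result for t in tasks]
        # pick the most frequent answer
        self.output = max(set(results), key=results.count)

class EchoWorker:
    """Handles operations such as generation; here it just echoes
    deterministically instead of calling a real model."""
    def run(self, task):
        task.result = f"answer to: {task.prompt}"

class ScaffoldingLlm:
    """User-facing interface: drives the Controller/Worker loop."""
    def __init__(self, controller, worker):
        self.controller, self.worker = controller, worker
    def generate(self, prompt):
        for task in self.controller.process(prompt):
            self.worker.run(task)  # complete the task; result flows back
        return self.controller.output
```

A user would then write something like `ScaffoldingLlm(Controller(), EchoWorker()).generate("2+2?")`; the real framework additionally schedules outstanding `Task`s concurrently rather than one at a time.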
## Usage

See `example/scaffolding`.
## Future Work

- Support an OpenAI API worker (in progress)
- Support reward models (in progress)
- Performance benchmarks (in progress)
- Support best of N
- Support MCTS
- Support a sandbox