# Scaffolding
## Introduction
Scaffolding is a framework for running various inference-time compute methods such as CoT, majority voting, best of N, and MCTS.
Scaffolding is built around three key principles:
- Ease of use: Users can easily run inference-time compute methods, and engineers can easily extend the framework with new methods and execution backends.
- Modularity: Scaffolding is designed to be modular, so engineers can fully reuse existing modules when defining new methods.
- Performance: Scaffolding is designed to be performant. It supports concurrent scheduling and provides additional information to the backend to help optimize performance.
## Architecture
Scaffolding has the following key components:
- `Controller`: The class that defines the workflow of an inference-time compute method.
- `Worker`: The class that handles operations such as generation and reward scoring.
- `ScaffoldingLlm`: The interface class for running inference-time compute methods.
The workflow of Scaffolding:
1. Users instantiate a `ScaffoldingLlm` instance by assembling a `Controller` and some `Worker`s.
2. Users call `ScaffoldingLlm`'s API to run an inference-time compute method.
3. `ScaffoldingLlm` instantiates a `Controller` instance and gets `Task`s from the `Controller`.
4. `ScaffoldingLlm` dispatches each `Task` to a `Worker` and returns the completed `Task` back to the `Controller`.
5. The `Controller` creates new `Task`s until the inference-time compute method is finished.
6. `ScaffoldingLlm` returns the result to users.
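The steps above can be sketched as a small Python program. All names here (`Task`, `Worker`, `Controller`, `ScaffoldingLlm`, `run_task`, `process`) are illustrative stand-ins for the pattern, not the exact `tensorrt_llm.scaffolding` API:

```python
# Minimal sketch of the Controller/Worker/ScaffoldingLlm loop.
# Hypothetical names and signatures; see the real classes in the package.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    input_str: str
    output_str: Optional[str] = None


class Worker:
    """Handles one operation, e.g. generation or reward scoring."""

    def run_task(self, task: Task) -> None:
        # A real worker would call an inference backend here.
        task.output_str = f"completion of: {task.input_str}"


class Controller:
    """Defines the method's workflow as a generator: it yields batches of
    Tasks and resumes once the scheduler has completed them."""

    def process(self, prompt: str):
        task = Task(input_str=prompt)
        yield [task]                    # hand Tasks out (step 3)
        self.result = task.output_str   # Tasks come back completed (step 5)


class ScaffoldingLlm:
    """Drives the Controller/Worker loop (steps 3-6)."""

    def __init__(self, controller_cls, worker: Worker):
        self.controller_cls = controller_cls
        self.worker = worker

    def generate(self, prompt: str) -> str:
        controller = self.controller_cls()
        for tasks in controller.process(prompt):  # get Tasks (step 3)
            for task in tasks:
                self.worker.run_task(task)        # dispatch (step 4)
        return controller.result                  # return result (step 6)


llm = ScaffoldingLlm(Controller, Worker())
print(llm.generate("What is 1 + 1?"))
# prints "completion of: What is 1 + 1?"
```

A generator-based `Controller` keeps the method's control flow in one place while letting the scheduler decide how and where each `Task` batch actually runs.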
## Usage
See [example/scaffolding](example/scaffolding).
## Future Work
- Support OpenAI API worker (in progress)
- Support reward model (in progress)
- Performance benchmark (in progress)
- Support best of N
- Support MCTS
- Support sandbox