Scaffolding

Introduction

Scaffolding is a framework for running inference-time compute methods such as chain-of-thought (CoT), majority vote, best of N, MCTS, etc.
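To make "inference-time compute method" concrete, here is a minimal majority-vote sketch. It is not the Scaffolding API; the sampler is a hard-coded stand-in for drawing multiple answers from an LLM.

```python
from collections import Counter

def sample_answers(question: str, n: int) -> list[str]:
    # Stand-in for n LLM samples; a real sampler would call a model.
    return ["42"] * (n - n // 3) + ["41"] * (n // 3)

def majority_vote(question: str, n: int = 5) -> str:
    """Sample n candidate answers and return the most frequent one."""
    return Counter(sample_answers(question, n)).most_common(1)[0][0]
```

Spending more samples per question trades extra compute for higher answer quality, which is the common thread across all the methods listed above.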

Scaffolding is built around three key principles:

  • Ease of Use: Users can easily run inference-time compute methods, and engineers can easily extend the framework with new methods and execution backends.
  • Modularity: Scaffolding is designed to be modular. Engineers can fully reuse existing modules when defining new methods.
  • Performance: Scaffolding is designed to be performant. Its design accounts for concurrent scheduling, and it exposes additional information to the backend to help optimize performance.

Architecture

Scaffolding consists of the following key components:

  • Controller: The class that defines the workflow of inference time compute methods.
  • Worker: The class that executes operations such as generation and reward scoring.
  • ScaffoldingLlm: The interface class to run inference time compute methods.
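A minimal sketch of how the three components fit together. The class names follow this README, but the method names, the Task shape, and the generator-based Controller here are illustrative assumptions, not the real API.

```python
# Illustrative sketch only: Task fields and method names are assumptions.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Task:
    prompt: str
    output: Optional[str] = None  # filled in by a Worker

class Worker:
    """Handles operations such as generation; here it just echoes."""
    def run(self, task: Task) -> Task:
        task.output = f"generated({task.prompt})"
        return task

class Controller:
    """Defines the method's workflow by yielding Tasks and consuming results."""
    def process(self, prompt: str):
        task = Task(prompt)
        yield task                 # hand the Task to the runtime
        self.result = task.output  # the completed Task comes back filled in

class ScaffoldingLlm:
    """User-facing interface that wires a Controller to Workers."""
    def __init__(self, controller_cls, worker: Worker):
        self.controller_cls, self.worker = controller_cls, worker

    def generate(self, prompt: str) -> str:
        controller = self.controller_cls()
        for task in self.controller_cls.process(controller, prompt):
            self.worker.run(task)  # dispatch the Task and complete it in place
        return controller.result
```

The separation lets a Controller describe *what* to compute while remaining agnostic to *how* Tasks are executed, so the same method can run on different backends.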

Workflow of Scaffolding:

  1. Users instantiate a ScaffoldingLlm instance by assembling a Controller and some Workers.
  2. Users call ScaffoldingLlm's API to run inference-time compute methods.
  3. ScaffoldingLlm instantiates a Controller and obtains Tasks from it.
  4. ScaffoldingLlm dispatches each Task to a Worker and returns the completed Task to the Controller.
  5. The Controller creates new Tasks until the inference-time compute method finishes.
  6. ScaffoldingLlm returns the result to the user.
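The steps above can be sketched as a simple driver loop. This is a hypothetical best-of-N example, not the real implementation: the Controller is modeled as a generator that yields batches of Tasks (steps 3–4) and returns its result when done (steps 5–6), and the worker and selection rule are toy stand-ins.

```python
# Hypothetical sketch of the dispatch loop; all names are illustrative.
def best_of_n_controller(prompt: str, n: int):
    tasks = [{"prompt": prompt, "id": i, "output": None} for i in range(n)]
    yield tasks  # step 4: the runtime dispatches these Tasks to Workers
    # step 5: completed Tasks are back; pick the "best" (here: longest output)
    return max((t["output"] for t in tasks), key=len)

def toy_worker(task: dict) -> None:
    # Stand-in for generation; output length varies per candidate.
    task["output"] = task["prompt"] + "!" * task["id"]

def run(controller_gen, worker):
    """Drive the Controller generator until the method finishes (step 5)."""
    try:
        while True:
            for task in next(controller_gen):  # step 3: get Tasks
                worker(task)                   # step 4: dispatch to a Worker
    except StopIteration as done:
        return done.value                      # step 6: result back to the user
```

Because the Controller only yields Tasks and waits for their completion, the runtime is free to execute a whole batch concurrently, which is where the concurrent-scheduling design mentioned above pays off.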

Usage

See example/scaffolding.

Future Work

  • support OpenAI API worker (on the way)
  • support reward model (on the way)
  • performance benchmark (on the way)
  • support best of N
  • support MCTS
  • support sandbox