Scaffolding

Introduction

Scaffolding is a framework for running inference-time compute methods such as chain-of-thought (CoT), majority vote, best of N, MCTS, etc.
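To make "inference-time compute method" concrete, here is a minimal majority-vote sketch. It is not the Scaffolding API; the sampler is a hard-coded stand-in for drawing multiple answers from an LLM.

```python
from collections import Counter

def sample_answers(question: str, n: int) -> list[str]:
    # Stand-in for n LLM samples; a real sampler would call a model.
    return ["42"] * (n - n // 3) + ["41"] * (n // 3)

def majority_vote(question: str, n: int = 5) -> str:
    """Sample n candidate answers and return the most frequent one."""
    return Counter(sample_answers(question, n)).most_common(1)[0][0]
```

Spending more samples per question trades extra compute for higher answer quality, which is the common thread across all the methods listed above.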

Scaffolding is built around three key principles:

  • Ease of Use: Users can easily run inference-time compute methods, and engineers can easily extend the framework with new methods and execution backends.
  • Modularity: Scaffolding is designed to be modular. Engineers can fully reuse existing modules when defining new methods.
  • Performance: Scaffolding is designed to be performant. Its design accounts for concurrent scheduling, and it exposes additional information to the backend to help optimize performance.

Architecture

Scaffolding consists of the following key components:

  • Controller: The class that defines the workflow of inference time compute methods.
  • Worker: The class that executes operations such as generation and reward scoring.
  • ScaffoldingLlm: The interface class to run inference time compute methods.
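A minimal sketch of how the three components fit together. The class names follow this README, but the method names, the Task shape, and the generator-based Controller here are illustrative assumptions, not the real API.

```python
# Illustrative sketch only: Task fields and method names are assumptions.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Task:
    prompt: str
    output: Optional[str] = None  # filled in by a Worker

class Worker:
    """Handles operations such as generation; here it just echoes."""
    def run(self, task: Task) -> Task:
        task.output = f"generated({task.prompt})"
        return task

class Controller:
    """Defines the method's workflow by yielding Tasks and consuming results."""
    def process(self, prompt: str):
        task = Task(prompt)
        yield task                 # hand the Task to the runtime
        self.result = task.output  # the completed Task comes back filled in

class ScaffoldingLlm:
    """User-facing interface that wires a Controller to Workers."""
    def __init__(self, controller_cls, worker: Worker):
        self.controller_cls, self.worker = controller_cls, worker

    def generate(self, prompt: str) -> str:
        controller = self.controller_cls()
        for task in self.controller_cls.process(controller, prompt):
            self.worker.run(task)  # dispatch the Task and complete it in place
        return controller.result
```

The separation lets a Controller describe *what* to compute while remaining agnostic to *how* Tasks are executed, so the same method can run on different backends.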

Workflow of Scaffolding:

  1. Users instantiate a ScaffoldingLlm instance by assembling a Controller and some Workers.
  2. Users call ScaffoldingLlm's API to run inference-time compute methods.
  3. ScaffoldingLlm instantiates a Controller and obtains Tasks from it.
  4. ScaffoldingLlm dispatches each Task to a Worker and returns the completed Task to the Controller.
  5. The Controller creates new Tasks until the inference-time compute method finishes.
  6. ScaffoldingLlm returns the result to the user.
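The steps above can be sketched as a simple driver loop. This is a hypothetical best-of-N example, not the real implementation: the Controller is modeled as a generator that yields batches of Tasks (steps 3–4) and returns its result when done (steps 5–6), and the worker and selection rule are toy stand-ins.

```python
# Hypothetical sketch of the dispatch loop; all names are illustrative.
def best_of_n_controller(prompt: str, n: int):
    tasks = [{"prompt": prompt, "id": i, "output": None} for i in range(n)]
    yield tasks  # step 4: the runtime dispatches these Tasks to Workers
    # step 5: completed Tasks are back; pick the "best" (here: longest output)
    return max((t["output"] for t in tasks), key=len)

def toy_worker(task: dict) -> None:
    # Stand-in for generation; output length varies per candidate.
    task["output"] = task["prompt"] + "!" * task["id"]

def run(controller_gen, worker):
    """Drive the Controller generator until the method finishes (step 5)."""
    try:
        while True:
            for task in next(controller_gen):  # step 3: get Tasks
                worker(task)                   # step 4: dispatch to a Worker
    except StopIteration as done:
        return done.value                      # step 6: result back to the user
```

Because the Controller only yields Tasks and waits for their completion, the runtime is free to execute a whole batch concurrently, which is where the concurrent-scheduling design mentioned above pays off.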

Usage

See example/scaffolding.

Future Work

  • support OpenAI API worker (on the way)
  • support reward model (on the way)
  • performance benchmark (on the way)
  • support best of N
  • support MCTS
  • support sandbox