
Dynasor

This document shows how to speed up reasoning models without training or fine-tuning by using Dynasor (Efficiently Scaling LLM Reasoning with Certaindex) in TensorRT-LLM.

Overview

Reasoning models often exhibit poor token efficiency, wasting tokens by second-guessing themselves. Dynasor is a certainty-based approach that dynamically allocates inference compute for reasoning models and stops inference as soon as the LLM has enough information to make a decision.

Currently, this folder provides only Dynasor-CoT, which applies Chain-of-Thought (CoT) reasoning. It optimizes models such as DeepSeek-R1 and its distilled variants. Support for additional reasoning algorithms (Self-Consistency, Monte Carlo Tree Search, and Rebase) will be added later.

Usage

The core logic for Dynasor-CoT lives in the DynasorGenerationController class in dynasor_controller.py. It extends the base Controller and implements certainty-based stopping.

You can adjust the compute-saving level by initializing DynasorGenerationController with different values for:

  • certainty_threshold: Number of consecutive identical and confident probe answers required before the generation is considered certain.
  • chunk_size: Number of tokens to generate per proposal round.

Lowering either value saves more tokens but may reduce accuracy.
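
As a rough illustration of how the two parameters interact, here is a minimal, self-contained Python sketch of certainty-based stopping. It is not the DynasorGenerationController implementation. The callbacks generate_chunk and probe_answer, the max_tokens cap, and the convention that a probe returns None when it is not confident are all hypothetical stand-ins introduced for this example.

```python
from typing import Callable, List, Optional, Tuple


def is_certain(probe_answers: List[Optional[str]], certainty_threshold: int) -> bool:
    """True when the last `certainty_threshold` probe answers are confident and identical.

    Assumption: a probe answer of None means "not confident".
    """
    if certainty_threshold <= 0 or len(probe_answers) < certainty_threshold:
        return False
    recent = probe_answers[-certainty_threshold:]
    return recent[0] is not None and all(a == recent[0] for a in recent)


def generate_with_early_stop(
    generate_chunk: Callable[[List[str], int], List[str]],  # hypothetical callback
    probe_answer: Callable[[List[str]], Optional[str]],     # hypothetical callback
    certainty_threshold: int = 3,
    chunk_size: int = 32,
    max_tokens: int = 256,
) -> Tuple[List[str], List[Optional[str]]]:
    """Generate in rounds of `chunk_size` tokens, probing for an answer after each round."""
    tokens: List[str] = []
    probe_answers: List[Optional[str]] = []
    while len(tokens) < max_tokens:
        budget = min(chunk_size, max_tokens - len(tokens))
        new_tokens = generate_chunk(tokens, budget)
        if not new_tokens:
            break  # the model finished on its own
        tokens.extend(new_tokens)
        probe_answers.append(probe_answer(tokens))
        if is_certain(probe_answers, certainty_threshold):
            break  # enough consecutive agreeing probes: stop early
    return tokens, probe_answers


# Toy stand-ins: the "model" becomes confident once it has produced 64 tokens.
def toy_generate_chunk(tokens: List[str], budget: int) -> List[str]:
    return ["tok"] * budget


def toy_probe_answer(tokens: List[str]) -> Optional[str]:
    return "42" if len(tokens) >= 64 else None


tokens, probes = generate_with_early_stop(
    toy_generate_chunk, toy_probe_answer, certainty_threshold=3, chunk_size=32
)
print(len(tokens), probes)
```

With these toy callbacks the loop stops after four rounds (128 tokens) instead of running to the 256-token cap. A smaller certainty_threshold stops after fewer agreeing probes, and a smaller chunk_size probes more often, so the stopping point is found sooner.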

Quick Start

  1. Basic usage. DynasorGenerationController is a compute-saving alternative to NativeGenerationController. To try it, run:

    python examples/scaffolding/contrib/Dynasor/scaffolding_dynasor_run.py
    
  2. Add an aggregation method. You can wrap DynasorGenerationController with other controllers, for example MajorityVoteController, to perform majority voting:

    python examples/scaffolding/contrib/Dynasor/scaffolding_dynasor_run.py --majority_vote
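
To make the idea of majority voting concrete, the sketch below aggregates the final answers from several independent generations in plain Python. It only illustrates the technique and is not how MajorityVoteController is implemented. The function name majority_vote and the treatment of None or empty strings as "no answer" are assumptions made for this example.

```python
from collections import Counter
from typing import List, Optional


def majority_vote(answers: List[Optional[str]]) -> Optional[str]:
    """Return the most frequent non-empty answer, or None if there is none.

    Ties go to the answer that appeared first in `answers`.
    """
    valid = [a for a in answers if a]  # drop None and empty strings
    if not valid:
        return None
    return Counter(valid).most_common(1)[0][0]


# Example: three sampled generations, two of which agree.
print(majority_vote(["42", "41", "42"]))
```

Because each sampled generation can stop early on its own, combining early stopping with voting keeps the token savings while the vote smooths over an occasional wrong answer.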
    

References

If you use Dynasor for your research, please cite our paper:

@article{fu2024efficiently,
  title={Efficiently Scaling LLM Reasoning with Certaindex},
  author={Fu, Yichao and Chen, Junda and Zhu, Siqi and Fu, Zheyu and Dai, Zhongdongming and Zhuang, Yonghao and Ma, Yian and Qiao, Aurick and Rosing, Tajana and Stoica, Ion and Zhang, Hao},
  journal={arXiv preprint arXiv:2412.20993},
  year={2024}
}

Acknowledgments

Dynasor in TensorRT-LLM is built upon the tensorrt_llm/scaffolding framework, which supports a variety of inference-time compute methods, such as chain-of-thought, majority voting, best-of-N sampling, MCTS, and more. We're grateful to the original scaffolding contributors for their excellent work.

If you're researching in this area and interested in extending it, you're warmly invited to contribute your own inference-time compute methods to scaffolding.