Sampling
The PyTorch backend supports most of the sampling features available in the C++ backend, including temperature, top-k and top-p sampling, stop words, bad words, penalties, context and generation logits, and log probabilities.
The following example submits two identical prompts that can produce different outputs, because the chosen sampling parameters introduce randomness:
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model='nvidia/Llama-3.1-8B-Instruct-FP8')

# Sample from the top-8 candidate tokens, further truncated to a
# cumulative probability of 0.5, at temperature 1.0.
sampling_params = SamplingParams(
    temperature=1.0,
    top_k=8,
    top_p=0.5,
)

for output in llm.generate(["Hello, my name is",
                            "Hello, my name is"], sampling_params):
    print(output.outputs[0].text)
When using speculative decoders such as MTP or Eagle-3, only a restricted subset of these sampling options is available.
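In that case, a conservative choice is greedy decoding. The sketch below assumes that top_k=1 (always select the most likely token) stays within the supported subset; the speculative-decoding configuration itself is omitted:

from tensorrt_llm import SamplingParams

# Assumption: greedy decoding (top_k=1) remains valid when a
# speculative decoder such as MTP or Eagle-3 is enabled.
greedy_params = SamplingParams(max_tokens=32, top_k=1)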