mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

[None][doc] Clean the doc folder and move the outdated docs into lega… (#7729 )

Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

2025-09-16 11:43:19 +08:00

(add-model)=

Adding a Model

This document describes how to add a typical decoder-only model to TensorRT LLM.

Step 1. Write Modeling Part

TensorRT LLM provides different levels of APIs:

Low-level functions, such as concat, add, and sum.
Basic layers, such as Linear and LayerNorm.
High-level layers, such as MLP and Attention.
A base class for typical decoder-only models, such as DecoderModelForCausalLM.

Create a model directory in tensorrt_llm/models, for example my_model.
Write a model.py in that directory using TensorRT LLM's APIs:

from tensorrt_llm.layers import MLP, Attention, ColumnLinear, Embedding, LayerNorm
from tensorrt_llm.models.modeling_utils import (DecoderLayerList,
                                                DecoderModelForCausalLM,
                                                PretrainedConfig)
from tensorrt_llm.module import Module


class MyDecoderLayer(Module):
    def __init__(self, config: PretrainedConfig, layer_idx: int):
        # Initialize the Module base class before assigning sub-modules.
        super().__init__()
        self.layer_idx = layer_idx
        self.config = config
        self.input_layernorm = LayerNorm(...)
        self.attention = Attention(...)
        self.post_layernorm = LayerNorm(...)
        self.mlp = MLP(...)

    def forward(self, hidden_states, ...):
        # decoder layer forward
        return hidden_states


class MyModel(Module):
    def __init__(self, config: PretrainedConfig):
        super().__init__()
        self.config = config
        self.vocab_embedding = Embedding(...)
        # One MyDecoderLayer per hidden layer described by config.
        self.layers = DecoderLayerList(MyDecoderLayer, config)
        self.ln_f = LayerNorm(...)

    def forward(self, input_ids, ...):
        # model forward
        return hidden_states


class MyModelForCausalLM(DecoderModelForCausalLM):
    def __init__(self, config: PretrainedConfig):
        transformer = MyModel(config)
        lm_head = ColumnLinear(...)
        super().__init__(config, transformer, lm_head)
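
The constructor signature of MyDecoderLayer, (config, layer_idx), matters because the layer list is expected to instantiate the layer class once per layer index. The following is a minimal pure-Python sketch of that contract, not TensorRT LLM code: SketchConfig, SketchDecoderLayer, and SketchLayerList are hypothetical stand-ins, and the assumption that the real DecoderLayerList calls cls(config, layer_idx) for each layer should be checked against the library source.

```python
from dataclasses import dataclass


@dataclass
class SketchConfig:
    # Hypothetical stand-in for PretrainedConfig; only the field used here.
    num_hidden_layers: int


class SketchDecoderLayer:
    # Same constructor shape as MyDecoderLayer: (config, layer_idx).
    def __init__(self, config: SketchConfig, layer_idx: int):
        self.config = config
        self.layer_idx = layer_idx

    def forward(self, hidden_states):
        # A real layer would apply layernorm, attention, and MLP here.
        return hidden_states


class SketchLayerList:
    # Assumed behaviour: build one layer per index by calling cls(config, idx).
    def __init__(self, cls, config: SketchConfig):
        self.layers = [cls(config, idx) for idx in range(config.num_hidden_layers)]

    def forward(self, hidden_states):
        # Run the layers in order, feeding each output into the next layer.
        for layer in self.layers:
            hidden_states = layer.forward(hidden_states)
        return hidden_states


layer_list = SketchLayerList(SketchDecoderLayer, SketchConfig(num_hidden_layers=4))
layer_indices = [layer.layer_idx for layer in layer_list.layers]
```

Each layer receives its own index, which is what allows per-layer behaviour (for example, layer-specific cache handling) to be keyed on layer_idx.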

Step 2. Implement Weight Conversion

The weights from the source framework must be converted and bound to the newly added TensorRT LLM model. Here is an example of converting HuggingFace weights:

from typing import Optional

from tensorrt_llm.mapping import Mapping


class MyModelForCausalLM(DecoderModelForCausalLM):
    @classmethod
    def from_hugging_face(
            cls,
            hf_model_dir,
            dtype='float16',
            mapping: Optional[Mapping] = None) -> 'MyModelForCausalLM':
        # 1. Create a TensorRT LLM MyModelForCausalLM model object.
        # 2. Convert the HuggingFace checkpoint to the weights dict that TensorRT LLM expects.
        # 3. Load the weights into the MyModelForCausalLM object.
        ...

Optionally, develop a convert_checkpoint.py script in the examples/my_model/ directory for convenient offline weight conversion.
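
A large part of the conversion step is usually renaming checkpoint keys so that they match the attribute paths of the new model. The sketch below shows one way to do this with rename rules in plain Python. The target names follow the attributes used in the Step 1 example (transformer, vocab_embedding, layers, attention, mlp, ln_f); the HuggingFace-side names and the convert_weight_names helper are hypothetical, and real conversions typically also need dtype casts, layout changes, or tensor-parallel splitting that are not shown here.

```python
import re

# Hypothetical name patterns: the real source and target names depend on the
# HuggingFace model and on the attribute names chosen in model.py.
RENAME_RULES = [
    (r"^model\.embed_tokens\.", "transformer.vocab_embedding."),
    (r"^model\.layers\.(\d+)\.self_attn\.", r"transformer.layers.\1.attention."),
    (r"^model\.layers\.(\d+)\.mlp\.", r"transformer.layers.\1.mlp."),
    (r"^model\.norm\.", "transformer.ln_f."),
]


def convert_weight_names(hf_weights: dict) -> dict:
    """Return a new dict with each key renamed by the first matching rule."""
    converted = {}
    for name, tensor in hf_weights.items():
        for pattern, replacement in RENAME_RULES:
            new_name, count = re.subn(pattern, replacement, name)
            if count:
                converted[new_name] = tensor
                break
        else:
            # Fail loudly so an unmapped weight is never silently dropped.
            raise KeyError(f"no rename rule for weight: {name}")
    return converted


# Placeholder strings stand in for tensors.
hf_weights = {
    "model.embed_tokens.weight": "E",
    "model.layers.0.self_attn.dense.weight": "A0",
    "model.layers.1.mlp.fc.weight": "M1",
    "model.norm.weight": "N",
}
trtllm_weights = convert_weight_names(hf_weights)
```

Raising on an unmatched key is a deliberate choice in this sketch: a weight that is silently skipped tends to surface much later as a confusing accuracy problem rather than as a conversion error.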

Step 3. Register New Model

Register the new model class MyModelForCausalLM in tensorrt_llm/models/__init__.py.
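
Registration generally means importing the class in that file and exposing it so that other code can find it by name. The sketch below illustrates the lookup pattern in plain Python. The MODEL_REGISTRY dictionary and the resolve_model_class helper are hypothetical names chosen for illustration, and the class here is only a stand-in; the actual structure to extend should be taken from tensorrt_llm/models/__init__.py itself.

```python
class MyModelForCausalLM:
    # Stand-in for the class defined in tensorrt_llm/models/my_model/model.py.
    pass


# Hypothetical registry: architecture name -> model class.
MODEL_REGISTRY = {
    "MyModelForCausalLM": MyModelForCausalLM,
}


def resolve_model_class(architecture: str):
    """Look up a registered model class by its architecture name."""
    try:
        return MODEL_REGISTRY[architecture]
    except KeyError:
        known = ", ".join(sorted(MODEL_REGISTRY))
        raise ValueError(
            f"unknown architecture {architecture!r}; registered: {known}"
        ) from None


model_cls = resolve_model_class("MyModelForCausalLM")
```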

Step 4. Verify New Model

Finally, verify the new model. The typical commands are as follows:

cd examples/my_model/

python convert_checkpoint.py --model_dir hf_model_dir --output_dir tllm_ckpt_dir

trtllm-build --checkpoint_dir tllm_ckpt_dir --output_dir tllm_engine_dir

# try the model with a single prompt
python ../run.py --engine_dir tllm_engine_dir --tokenizer_dir hf_model_dir --input_text "Born in north-east France, Soyer trained as a"
# run summarization task
python ../summarize.py --engine_dir tllm_engine_dir --hf_model_dir hf_model_dir --test_trt_llm
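
For repeated testing, the same steps can be assembled in a small Python helper. This is a sketch only: verification_commands is a hypothetical helper, it uses just the scripts and flags shown in the commands above, and it prints the commands rather than running them.

```python
import shlex


def verification_commands(hf_model_dir, ckpt_dir, engine_dir, prompt):
    """Return the verification steps as argv lists, in execution order."""
    return [
        ["python", "convert_checkpoint.py",
         "--model_dir", hf_model_dir, "--output_dir", ckpt_dir],
        ["trtllm-build",
         "--checkpoint_dir", ckpt_dir, "--output_dir", engine_dir],
        ["python", "../run.py",
         "--engine_dir", engine_dir, "--tokenizer_dir", hf_model_dir,
         "--input_text", prompt],
        ["python", "../summarize.py",
         "--engine_dir", engine_dir, "--hf_model_dir", hf_model_dir,
         "--test_trt_llm"],
    ]


commands = verification_commands(
    "hf_model_dir", "tllm_ckpt_dir", "tllm_engine_dir",
    "Born in north-east France, Soyer trained as a",
)
for argv in commands:
    # shlex.join quotes arguments that contain spaces, such as the prompt.
    print(shlex.join(argv))
    # To execute from examples/my_model/, one could call
    # subprocess.run(argv, check=True) here instead of printing.
```

Keeping each command as an argv list avoids shell-quoting problems with the prompt text, and the conversion, build, and run steps stay in the order they depend on each other.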

Reference

For more details, read the [workflow](./workflow.md) and [checkpoint](./checkpoint.md) documents.
