(add-model)=

# Adding a Model

This document describes how to add a typical decoder-only model to TensorRT-LLM.

## Step 1. Write Modeling Part

TensorRT-LLM provides different levels of APIs:

- Low-level functions, such as `concat`, `add`, and `sum`.
- Basic layers, such as `Linear` and `LayerNorm`.
- High-level layers, such as `MLP` and `Attention`.
- Base classes for typical decoder-only models, such as `DecoderModelForCausalLM`.

1. Create a model directory in `tensorrt_llm/models`, for example, `my_model`.
2. Write a `model.py` in that directory using TensorRT-LLM's APIs:

```python
# Import locations may differ between TensorRT-LLM versions.
from tensorrt_llm.layers import (MLP, Attention, ColumnLinear, Embedding,
                                 LayerNorm)
from tensorrt_llm.models.modeling_utils import (DecoderLayerList,
                                                DecoderModelForCausalLM,
                                                PretrainedConfig)
from tensorrt_llm.module import Module


class MyDecoderLayer(Module):
    def __init__(self, config: PretrainedConfig, layer_idx: int):
        super().__init__()  # initialize the Module base class first
        self.layer_idx = layer_idx
        self.config = config
        self.input_layernorm = LayerNorm(...)
        self.attention = Attention(...)
        self.post_layernorm = LayerNorm(...)
        self.mlp = MLP(...)

    def forward(self, hidden_states, ...):
        # decoder layer forward
        return hidden_states


class MyModel(Module):
    def __init__(self, config: PretrainedConfig):
        super().__init__()  # initialize the Module base class first
        self.config = config
        self.vocab_embedding = Embedding(...)
        self.layers = DecoderLayerList(MyDecoderLayer, config)
        self.ln_f = LayerNorm(...)

    def forward(self, input_ids, ...):
        # model forward
        return hidden_states


class MyModelForCausalLM(DecoderModelForCausalLM):
    def __init__(self, config: PretrainedConfig):
        transformer = MyModel(config)
        lm_head = ColumnLinear(...)
        super().__init__(config, transformer, lm_head)
```
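The `forward` body of the decoder layer is elided above. As a rough illustration only, the sketch below traces the data flow of a pre-norm residual layer, which is a common arrangement for decoder-only models but is an assumption here, not something the source specifies. It uses plain Python lists and stand-in callables instead of TensorRT-LLM tensors and layers, and it ignores the extra arguments (attention mask, KV cache, and so on) that a real `forward` would take. The helper names `add` and `decoder_layer_forward` are hypothetical.

```python
from typing import Callable, List

Vector = List[float]
Sublayer = Callable[[Vector], Vector]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise addition, used for the residual connections."""
    return [x + y for x, y in zip(a, b)]


def decoder_layer_forward(hidden_states: Vector,
                          input_layernorm: Sublayer,
                          attention: Sublayer,
                          post_layernorm: Sublayer,
                          mlp: Sublayer) -> Vector:
    # Attention block: normalize, attend, then add the residual.
    residual = hidden_states
    hidden_states = attention(input_layernorm(hidden_states))
    hidden_states = add(residual, hidden_states)

    # MLP block: normalize, transform, then add the residual.
    residual = hidden_states
    hidden_states = mlp(post_layernorm(hidden_states))
    return add(residual, hidden_states)


# Toy stand-ins so the data flow can be traced by hand.
def identity(v: Vector) -> Vector:
    return v


def double(v: Vector) -> Vector:
    return [2.0 * x for x in v]


out = decoder_layer_forward([1.0, 2.0], identity, double, identity, double)
print(out)  # [9.0, 18.0]
```

With these stand-ins, the attention block turns `[1.0, 2.0]` into `[3.0, 6.0]` and the MLP block turns that into `[9.0, 18.0]`. In a real model, the four callables correspond to `self.input_layernorm`, `self.attention`, `self.post_layernorm`, and `self.mlp`.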


## Step 2. Implement Weight Conversion

The weights from the source framework must be converted and bound to the newly added TensorRT-LLM model. Here is an example of converting Hugging Face weights:

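```python
from typing import Optional

from tensorrt_llm.mapping import Mapping
from tensorrt_llm.models.modeling_utils import DecoderModelForCausalLM


class MyModelForCausalLM(DecoderModelForCausalLM):
    @classmethod
    def from_hugging_face(
            cls,
            hf_model_dir,
            dtype='float16',
            mapping: Optional[Mapping] = None) -> "MyModelForCausalLM":
        # create a TensorRT-LLM MyModelForCausalLM model object
        # convert HuggingFace checkpoint to TensorRT-LLM expected weights dict
        # load the weights to MyModelForCausalLM object
        ...  # placeholder: the three steps above are left to the implementer
```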

Optionally, add a `convert_checkpoint.py` script in the `examples/my_model/` directory to make offline weight conversion more convenient.
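The "convert to the expected weights dict" step is mostly a matter of renaming parameters. The sketch below shows one way to do that with a small table of regular-expression rules. The rule table, the parameter names on both sides, and the helpers `rename_key` and `convert_weights` are all hypothetical and chosen for illustration. The names a real model needs depend on its Hugging Face implementation and on the attribute names used in `model.py`.

```python
import re
from typing import Dict, Optional

# Hypothetical rules: (source-name regex, target-name template).
RENAME_RULES = [
    (r"^model\.embed_tokens\.weight$", "transformer.vocab_embedding.weight"),
    (r"^model\.layers\.(\d+)\.input_layernorm\.weight$",
     r"transformer.layers.\1.input_layernorm.weight"),
    (r"^model\.layers\.(\d+)\.mlp\.up_proj\.weight$",
     r"transformer.layers.\1.mlp.fc.weight"),
    (r"^model\.norm\.weight$", "transformer.ln_f.weight"),
    (r"^lm_head\.weight$", "lm_head.weight"),
]


def rename_key(hf_name: str) -> Optional[str]:
    """Return the converted parameter name, or None if no rule matches."""
    for pattern, template in RENAME_RULES:
        if re.match(pattern, hf_name):
            return re.sub(pattern, template, hf_name)
    return None


def convert_weights(hf_weights: Dict[str, object]) -> Dict[str, object]:
    """Rename every entry; fail loudly if a weight has no rule."""
    converted = {}
    unmapped = []
    for name, tensor in hf_weights.items():
        new_name = rename_key(name)
        if new_name is None:
            unmapped.append(name)
        else:
            converted[new_name] = tensor
    if unmapped:
        raise KeyError(f"No conversion rule for: {unmapped}")
    return converted


# Strings stand in for tensors so the example runs without any framework.
hf_weights = {
    "model.embed_tokens.weight": "E",
    "model.layers.3.mlp.up_proj.weight": "W",
    "lm_head.weight": "H",
}
tllm_weights = convert_weights(hf_weights)
print(sorted(tllm_weights))
```

Raising on unmapped names, rather than skipping them silently, makes a missing rule show up at conversion time instead of as wrong output later. A real converter usually has more to do than renaming, for example casting to the requested `dtype` or splitting weights according to `mapping`. Those steps are omitted here.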

## Step 3. Register New Model

Please register the new model class `MyModelForCausalLM` in `tensorrt_llm/models/__init__.py`.

## Step 4. Verify New Model

Finally, verify the new model. Typical commands are as follows:

```bash
cd examples/my_model/

python convert_checkpoint.py --model_dir hf_model_dir --output_dir tllm_ckpt_dir

trtllm-build --checkpoint_dir tllm_ckpt_dir --output_dir tllm_engine_dir

# try the model with a single prompt
python ../run.py --engine_dir tllm_engine_dir --tokenizer_dir hf_model_dir --input_text "Born in north-east France, Soyer trained as a"
# run summarization task
python ../summarize.py --engine_dir tllm_engine_dir --hf_model_dir hf_model_dir --test_trt_llm
```
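Before running `trtllm-build`, it can help to sanity-check the output of the conversion step. The sketch below assumes that the checkpoint directory contains a `config.json` with an `architecture` field and one `rank<N>.safetensors` file per rank. Verify that layout against the checkpoint document for your version. The helper name `check_checkpoint_dir` is hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def check_checkpoint_dir(ckpt_dir: str, expected_architecture: str) -> List[str]:
    """Return a list of problems; an empty list means the layout looks sane."""
    root = Path(ckpt_dir)
    config_path = root / "config.json"
    if not config_path.is_file():
        return ["missing config.json"]

    problems = []
    config = json.loads(config_path.read_text())
    found = config.get("architecture")
    if found != expected_architecture:
        problems.append(
            f"architecture is {found!r}, expected {expected_architecture!r}")
    # Assumed naming scheme: rank0.safetensors, rank1.safetensors, ...
    if not list(root.glob("rank*.safetensors")):
        problems.append("no rank*.safetensors weight files found")
    return problems


# Demonstration on a throwaway directory instead of a real checkpoint.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "config.json").write_text(
        json.dumps({"architecture": "MyModelForCausalLM"}))
    before = check_checkpoint_dir(tmp, "MyModelForCausalLM")
    Path(tmp, "rank0.safetensors").write_bytes(b"")
    after = check_checkpoint_dir(tmp, "MyModelForCausalLM")

print(before)  # ['no rank*.safetensors weight files found']
print(after)   # []
```

For the commands above, the call would be `check_checkpoint_dir("tllm_ckpt_dir", "MyModelForCausalLM")`.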

## Reference

For more details, read the [workflow](./workflow.md) and [checkpoint](./checkpoint.md) documents.