TensorRT-LLMs/docs/source/architecture/add-model.md
Guoming Zhang 7f3f658d5f [None][doc] Rename TensorRT-LLM to TensorRT LLM. (#7554)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-09 12:16:03 +08:00

96 lines
3.1 KiB
Markdown

(add-model)=
# Adding a Model
This document describes how to add a typical decoder-only model in TensorRT LLM.
## Step 1. Write Modeling Part
TensorRT LLM provides different levels of APIs:
- Low-level functions, for example, `concat`, `add`, and `sum`.
- Basic layers, such as, `Linear` and `LayerNorm`.
- High-level layers, such as, `MLP` and `Attention`.
- Base class for typical decoder-only models, such as, `DecoderModelForCausalLM`.
1. Create a model directory in `tensorrt_llm/models`, for example `my_model`.
2. Write a `model.py` with TensorRT LLM's APIs
```python
class MyDecoderLayer(Module):
def __init__(self, config: PretrainedConfig, layer_idx: int):
self.layer_idx = layer_idx
self.config = config
self.input_layernorm = LayerNorm(...)
self.attention = Attention(...)
self.post_layernorm = LayerNorm(...)
self.mlp = MLP(...)
def forward(self, hidden_states, ...):
# decoder layer forward
return hidden_states
class MyModel(Module):
def __init__(self, config: PretrainedConfig):
self.config = config
self.vocab_embedding = Embedding(...)
self.layers = DecoderLayerList(MyDecoderLayer, config)
self.ln_f = LayerNorm(...)
def forward(self, input_ids, ...):
# model forward
return hidden_states
class MyModelForCausalLM(DecoderModelForCausalLM):
def __init__(self, config: PretrainedConfig):
transformer = MyModel(config)
lm_head = ColumnLinear(...)
super().__init__(config, transformer, lm_head)
```
## Step 2. Implement Weight Conversion
The weights from source framework need to be converted and bound to the new added TensorRT LLM model. Here is an example of converting HuggingFace weights:
```python
class MyModelForCausalLM(DecoderModelForCausalLM):
@classmethod
def from_hugging_face(
cls,
hf_model_dir,
dtype='float16',
mapping: Optional[Mapping] = None) -> MyModelForCausalLM
# create a TensorRT LLM MyModelForCausalLM model object
# convert HuggingFace checkpoint to TensorRT LLM expected weights dict
# load the weights to MyModelForCausalLM object
```
It's optional to develop a `convert_checkpoint.py` script in the `examples/my_model/` directory for the convenience of offline weights conversion.
## Step 3. Register New Model
Please register the new model class `MyModelForCausalLM` in `tensorrt_llm/models/__init__.py`.
## Step 4. Verify New Model
At last, let's verify the new model. The typical commands are as following:
```bash
cd examples/my_model/
python convert_checkpoint.py --model_dir hf_model_dir --output_dir tllm_ckpt_dir
trtllm-build --checkpoint_dir tllm_ckpt_dir --output_dir tllm_engine_dir
# try the model with a single prompt
python ../run.py --engine_dir tllm_engine_dir --tokenizer_dir hf_model_dir --input_text "Born in north-east France, Soyer trained as a"
# run summarization task
python ../summarize.py --engine_dir tllm_engine_dir --hf_model_dir hf_model_dir --test_trt_llm
```
## Reference
It's recommended to read the workflow[./workflow.md] and checkpoint[./checkpoint.md] documents for more details.