TensorRT-LLMs/docs/source/architecture/add-model.md
Kaiyu Xie bca9a33b02
Update TensorRT-LLM (#2008)
* Update TensorRT-LLM

---------

Co-authored-by: Timur Abishev <abishev.timur@gmail.com>
Co-authored-by: MahmoudAshraf97 <hassouna97.ma@gmail.com>
Co-authored-by: Saeyoon Oh <saeyoon.oh@furiosa.ai>
Co-authored-by: hattizai <hattizai@gmail.com>
2024-07-23 23:05:09 +08:00


(add-model)=
# Adding a Model
This document describes how to add a typical decoder-only model to TensorRT-LLM.
## Step 1. Write Modeling Part
TensorRT-LLM provides several levels of APIs:
- Low-level functions, for example, `concat`, `add`, and `sum`.
- Basic layers, such as `Linear` and `LayerNorm`.
- High-level layers, such as `MLP` and `Attention`.
- A base class for typical decoder-only models, `DecoderModelForCausalLM`.

Use these APIs to write the modeling part:
1. Create a model directory in `tensorrt_llm/models`, for example `my_model`.
2. Write a `model.py` using TensorRT-LLM's APIs:
```python
# Import paths are assumed from the package layout; adjust them if they differ.
from tensorrt_llm.layers import MLP, Attention, ColumnLinear, Embedding, LayerNorm
from tensorrt_llm.models.modeling_utils import (DecoderLayerList,
                                                DecoderModelForCausalLM,
                                                PretrainedConfig)
from tensorrt_llm.module import Module

class MyDecoderLayer(Module):
    def __init__(self, config: PretrainedConfig, layer_idx: int):
        super().__init__()  # required so that sub-modules are registered
        self.layer_idx = layer_idx
        self.config = config
        self.input_layernorm = LayerNorm(...)
        self.attention = Attention(...)
        self.post_layernorm = LayerNorm(...)
        self.mlp = MLP(...)

    def forward(self, hidden_states, ...):
        # decoder layer forward
        return hidden_states

class MyModel(Module):
    def __init__(self, config: PretrainedConfig):
        super().__init__()
        self.config = config
        self.vocab_embedding = Embedding(...)
        self.layers = DecoderLayerList(MyDecoderLayer, config)
        self.ln_f = LayerNorm(...)

    def forward(self, input_ids, ...):
        # model forward
        return hidden_states

class MyModelForCausalLM(DecoderModelForCausalLM):
    def __init__(self, config: PretrainedConfig):
        transformer = MyModel(config)
        lm_head = ColumnLinear(...)
        super().__init__(config, transformer, lm_head)
```
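The `# decoder layer forward` placeholder is left open in the skeleton. As a rough illustration only, the sketch below shows the pre-norm residual ordering that many decoder-only models use. It is plain Python on lists of floats, not TensorRT-LLM code; the function `decoder_layer_forward` and its callable arguments are hypothetical stand-ins for the layers named in `MyDecoderLayer`, and your model's ordering may differ.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum, standing in for the residual connection."""
    return [x + y for x, y in zip(a, b)]


def decoder_layer_forward(hidden_states: Vector,
                          input_layernorm: Layer,
                          attention: Layer,
                          post_layernorm: Layer,
                          mlp: Layer) -> Vector:
    """Hypothetical pre-norm decoder layer: norm -> attention -> residual,
    then norm -> MLP -> residual."""
    residual = hidden_states
    hidden_states = input_layernorm(hidden_states)
    hidden_states = attention(hidden_states)
    hidden_states = add(residual, hidden_states)

    residual = hidden_states
    hidden_states = post_layernorm(hidden_states)
    hidden_states = mlp(hidden_states)
    return add(residual, hidden_states)


# Toy stand-ins so the data flow can be followed by hand.
identity = lambda x: list(x)
double = lambda x: [2.0 * v for v in x]
plus_one = lambda x: [v + 1.0 for v in x]

result = decoder_layer_forward([1.0, 2.0], identity, double, identity, plus_one)
```

In a real `forward`, each stand-in would be the corresponding TensorRT-LLM layer (`self.input_layernorm`, `self.attention`, and so on) operating on tensors.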
## Step 2. Implement Weight Conversion
The weights from the source framework must be converted and bound to the newly added TensorRT-LLM model. Here is an example of converting Hugging Face weights:
```python
from typing import Optional

from tensorrt_llm.mapping import Mapping

class MyModelForCausalLM(DecoderModelForCausalLM):
    @classmethod
    def from_hugging_face(
            cls,
            hf_model_dir,
            dtype='float16',
            mapping: Optional[Mapping] = None) -> "MyModelForCausalLM":
        # create a TensorRT-LLM MyModelForCausalLM model object
        # convert HuggingFace checkpoint to TensorRT-LLM expected weights dict
        # load the weights to MyModelForCausalLM object
        ...
```
Optionally, develop a `convert_checkpoint.py` script in the `examples/my_model/` directory for convenient offline weight conversion.
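Much of the "convert checkpoint to expected weights dict" step is usually renaming parameters from the source framework's naming scheme to the names of the modules defined in `model.py`. The sketch below shows one way to do that with a table of regular expressions. It is standalone Python; the function `convert_weight_names`, the table `KEY_PATTERNS`, and the specific source and target names are illustrative assumptions, and a real model needs its own table (plus any reshaping, splitting, or dtype casting, which is not shown).

```python
import re
from typing import Any, Dict

# Illustrative (source pattern, target name) pairs. The target names follow
# the attribute names used in the Step 1 skeleton; the source names are
# assumed and must be adapted to the actual checkpoint.
KEY_PATTERNS = [
    (r"^model\.embed_tokens\.weight$", "transformer.vocab_embedding.weight"),
    (r"^model\.layers\.(\d+)\.input_layernorm\.weight$",
     r"transformer.layers.\1.input_layernorm.weight"),
    (r"^model\.norm\.weight$", "transformer.ln_f.weight"),
    (r"^lm_head\.weight$", "lm_head.weight"),
]


def convert_weight_names(source_weights: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict with every key renamed by the first matching rule."""
    converted = {}
    for name, tensor in source_weights.items():
        for pattern, replacement in KEY_PATTERNS:
            new_name, count = re.subn(pattern, replacement, name)
            if count:
                converted[new_name] = tensor
                break
        else:
            # Fail loudly so that no weight is silently dropped.
            raise KeyError(f"no conversion rule for weight {name!r}")
    return converted


example = convert_weight_names({
    "model.embed_tokens.weight": "E",
    "model.layers.3.input_layernorm.weight": "G",
    "model.norm.weight": "N",
})
```

Raising on an unmatched name, rather than skipping it, makes a missing rule visible during conversion instead of at engine build time.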
## Step 3. Register New Model
Please register the new model class `MyModelForCausalLM` in `tensorrt_llm/models/__init__.py`.
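Conceptually, registration makes the class discoverable by name so that tools can instantiate it from a checkpoint's architecture string. The standalone sketch below illustrates that idea with a dictionary registry; `MODEL_MAP` and `register_model` as written here are hypothetical stand-ins, so check `tensorrt_llm/models/__init__.py` for the actual structure before editing it.

```python
from typing import Dict


class MyModelForCausalLM:
    """Stand-in for the class written in Step 1."""


# Hypothetical registry: architecture name -> model class.
MODEL_MAP: Dict[str, type] = {}


def register_model(architecture: str, model_cls: type) -> None:
    """Map an architecture name to a class, rejecting conflicting entries."""
    existing = MODEL_MAP.get(architecture)
    if existing is not None and existing is not model_cls:
        raise ValueError(f"{architecture!r} is already registered")
    MODEL_MAP[architecture] = model_cls


register_model("MyModelForCausalLM", MyModelForCausalLM)

# A tool can now look the class up from the name stored in a checkpoint config.
model_cls = MODEL_MAP["MyModelForCausalLM"]
```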
## Step 4. Verify New Model
Finally, verify the new model. The typical commands are as follows:
```bash
cd examples/my_model/
python convert_checkpoint.py --model_dir hf_model_dir --output_dir tllm_ckpt_dir
trtllm-build --checkpoint_dir tllm_ckpt_dir --output_dir tllm_engine_dir
# try the model with a single prompt
python ../run.py --engine_dir tllm_engine_dir --tokenizer_dir hf_model_dir --input_text "Born in north-east France, Soyer trained as a"
# run summarization task
python ../summarize.py --engine_dir tllm_engine_dir --hf_model_dir hf_model_dir --test_trt_llm
```
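If you run these steps repeatedly while developing, it may help to chain them in a small script that stops at the first failure. The sketch below only reuses the commands and flags shown above; the helper names `build_verification_commands` and `run_verification` are hypothetical, and the directory names are the same placeholders as in the shell example.

```python
import subprocess
import sys
from typing import List


def build_verification_commands(hf_model_dir: str, ckpt_dir: str,
                                engine_dir: str) -> List[List[str]]:
    """Return the convert, build, and summarize commands as argument lists."""
    return [
        [sys.executable, "convert_checkpoint.py",
         "--model_dir", hf_model_dir, "--output_dir", ckpt_dir],
        ["trtllm-build",
         "--checkpoint_dir", ckpt_dir, "--output_dir", engine_dir],
        [sys.executable, "../summarize.py",
         "--engine_dir", engine_dir, "--hf_model_dir", hf_model_dir,
         "--test_trt_llm"],
    ]


def run_verification(commands: List[List[str]],
                     cwd: str = "examples/my_model") -> None:
    """Run each command in order; check=True aborts on the first failure."""
    for command in commands:
        subprocess.run(command, cwd=cwd, check=True)


commands = build_verification_commands("hf_model_dir", "tllm_ckpt_dir",
                                       "tllm_engine_dir")
# To execute the pipeline: run_verification(commands)
```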
## Reference
For more details, read the [workflow](./workflow.md) and [checkpoint](./checkpoint.md) documents.