mirror of
https://github.com/NVIDIA/TensorRT-LLM.git
synced 2026-01-14 06:27:45 +08:00
* Update TensorRT-LLM --------- Co-authored-by: Timur Abishev <abishev.timur@gmail.com> Co-authored-by: MahmoudAshraf97 <hassouna97.ma@gmail.com> Co-authored-by: Saeyoon Oh <saeyoon.oh@furiosa.ai> Co-authored-by: hattizai <hattizai@gmail.com>
96 lines
3.1 KiB
Markdown
(add-model)=

# Adding a Model

This document describes how to add a typical decoder-only model in TensorRT-LLM.

## Step 1. Write Modeling Part

TensorRT-LLM provides different levels of APIs:

- Low-level functions, for example, `concat`, `add`, and `sum`.
- Basic layers, such as `Linear` and `LayerNorm`.
- High-level layers, such as `MLP` and `Attention`.
- Base classes for typical decoder-only models, such as `DecoderModelForCausalLM`.

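
As a rough illustration of how these levels stack, the following plain-Python sketch builds a toy `MLP` from a toy `Linear`, which in turn uses a toy `add`. The classes only borrow their names from the list above. They are stand-ins that operate on Python lists, not the TensorRT-LLM implementations, and the ReLU activation is an assumption made for the example.

```python
def add(a, b):
    """Low-level function: element-wise addition of two vectors."""
    return [x + y for x, y in zip(a, b)]


class Linear:
    """Basic layer: y = W x + b, built on top of low-level arithmetic."""

    def __init__(self, weight, bias):
        self.weight = weight  # list of rows, one row per output feature
        self.bias = bias

    def __call__(self, x):
        projected = [sum(w * v for w, v in zip(row, x)) for row in self.weight]
        return add(projected, self.bias)


class MLP:
    """High-level layer: two Linear layers with a ReLU in between."""

    def __init__(self, fc, proj):
        self.fc = fc
        self.proj = proj

    def __call__(self, x):
        hidden = [max(0.0, v) for v in self.fc(x)]  # ReLU (assumed activation)
        return self.proj(hidden)


mlp = MLP(
    fc=Linear([[1.0, 0.0], [0.0, -1.0]], [0.0, 0.0]),
    proj=Linear([[1.0, 1.0]], [0.5]),
)
print(mlp([2.0, 3.0]))  # [2.5]
```

Working at the highest level that fits the architecture usually keeps `model.py` short; the lower levels are there for operations the ready-made layers do not cover.
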
Using these APIs, implement the model in two steps:

1. Create a model directory in `tensorrt_llm/models`, for example `my_model`.
2. Write a `model.py` in that directory using TensorRT-LLM's APIs:

```python
# Import paths follow the layout used by existing models in the repository;
# verify them against your TensorRT-LLM version.
from tensorrt_llm.layers import (MLP, Attention, ColumnLinear, Embedding,
                                 LayerNorm)
from tensorrt_llm.models.modeling_utils import (DecoderLayerList,
                                                DecoderModelForCausalLM,
                                                PretrainedConfig)
from tensorrt_llm.module import Module


class MyDecoderLayer(Module):
    def __init__(self, config: PretrainedConfig, layer_idx: int):
        super().__init__()
        self.layer_idx = layer_idx
        self.config = config
        self.input_layernorm = LayerNorm(...)
        self.attention = Attention(...)
        self.post_layernorm = LayerNorm(...)
        self.mlp = MLP(...)

    def forward(self, hidden_states, ...):
        # decoder layer forward
        return hidden_states


class MyModel(Module):
    def __init__(self, config: PretrainedConfig):
        super().__init__()
        self.config = config
        self.vocab_embedding = Embedding(...)
        self.layers = DecoderLayerList(MyDecoderLayer, config)
        self.ln_f = LayerNorm(...)

    def forward(self, input_ids, ...):
        # model forward
        return hidden_states


class MyModelForCausalLM(DecoderModelForCausalLM):
    def __init__(self, config: PretrainedConfig):
        transformer = MyModel(config)
        lm_head = ColumnLinear(...)
        super().__init__(config, transformer, lm_head)
```

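
The `# decoder layer forward` and `# model forward` comments leave the data flow open. One common arrangement for decoder-only models, assumed here rather than taken from this document, is a pre-norm block with residual connections around attention and the MLP. The sketch below shows that flow in plain Python with stand-in callables; `decoder_layer_forward` and `model_forward` are hypothetical helper names, and none of this uses the TensorRT-LLM tensor API.

```python
from typing import Callable, List

Vector = List[float]
Op = Callable[[Vector], Vector]


def decoder_layer_forward(hidden_states: Vector, input_layernorm: Op,
                          attention: Op, post_layernorm: Op, mlp: Op) -> Vector:
    """Pre-norm residual block (an assumed, common layout)."""
    # Attention sub-block: normalize, attend, add the residual back.
    residual = hidden_states
    attn_out = attention(input_layernorm(hidden_states))
    hidden_states = [r + a for r, a in zip(residual, attn_out)]

    # MLP sub-block: normalize, transform, add the residual back.
    residual = hidden_states
    mlp_out = mlp(post_layernorm(hidden_states))
    return [r + m for r, m in zip(residual, mlp_out)]


def model_forward(hidden_states: Vector, layers: List[Op], ln_f: Op) -> Vector:
    """Run every decoder layer in order, then the final normalization."""
    for layer in layers:
        hidden_states = layer(hidden_states)
    return ln_f(hidden_states)


# Toy stand-ins so the data flow can be checked by hand.
def identity(v: Vector) -> Vector:
    return list(v)


def double(v: Vector) -> Vector:
    return [2.0 * x for x in v]


def toy_layer(v: Vector) -> Vector:
    return decoder_layer_forward(v, identity, double, identity, double)


result = model_forward([1.0, -1.0], [toy_layer, toy_layer], identity)
print(result)  # [81.0, -81.0]
```

With these stand-ins each layer multiplies its input by 9 (1 + 2 = 3 after attention, 3 + 6 = 9 after the MLP), so two layers give 81. In a real `model.py` the same structure applies to TensorRT-LLM tensors, and `forward` takes additional arguments that depend on the architecture.
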
## Step 2. Implement Weight Conversion

The weights from the source framework need to be converted and bound to the newly added TensorRT-LLM model. Here is an example of converting HuggingFace weights:

```python
from typing import Optional

from tensorrt_llm.mapping import Mapping


class MyModelForCausalLM(DecoderModelForCausalLM):
    @classmethod
    def from_hugging_face(
            cls,
            hf_model_dir,
            dtype='float16',
            mapping: Optional[Mapping] = None) -> 'MyModelForCausalLM':
        # 1. Create a TensorRT-LLM MyModelForCausalLM model object.
        # 2. Convert the HuggingFace checkpoint to the weights dict that
        #    TensorRT-LLM expects.
        # 3. Load the weights into the MyModelForCausalLM object.
        raise NotImplementedError("model-specific conversion goes here")
```

Optionally, develop a `convert_checkpoint.py` script in the `examples/my_model/` directory to make offline weight conversion convenient.

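
A large part of the conversion step is usually renaming source keys so that they match the attribute paths of the new model. The sketch below shows one way to do that with regular expressions. The HuggingFace-style key names and the `attention.dense` target are assumptions chosen for illustration; the remaining target names follow the attributes in the Step 1 example (`transformer`, `vocab_embedding`, `layers`, `ln_f`, `lm_head`). `RENAME_RULES` and `convert_weights` are hypothetical names, and real conversions often also need to transpose, fuse, or split tensors, which is not shown.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical (pattern, replacement) pairs. Real names depend on both the
# source checkpoint and the attributes defined in your model.py.
RENAME_RULES: List[Tuple[str, str]] = [
    (r"^model\.embed_tokens\.weight$", "transformer.vocab_embedding.weight"),
    (r"^model\.layers\.(\d+)\.self_attn\.o_proj\.weight$",
     r"transformer.layers.\1.attention.dense.weight"),
    (r"^model\.norm\.weight$", "transformer.ln_f.weight"),
    (r"^lm_head\.weight$", "lm_head.weight"),
]


def convert_weights(hf_weights: Dict[str, object]) -> Dict[str, object]:
    """Rename source keys to target keys; fail loudly on unmapped keys."""
    converted: Dict[str, object] = {}
    for name, tensor in hf_weights.items():
        for pattern, replacement in RENAME_RULES:
            new_name, count = re.subn(pattern, replacement, name)
            if count:
                converted[new_name] = tensor
                break
        else:
            # No rule matched: better to stop than to drop a weight silently.
            raise KeyError(f"no conversion rule for weight {name!r}")
    return converted


# Strings stand in for tensors so the example runs without any framework.
hf_weights = {
    "model.embed_tokens.weight": "E",
    "model.layers.3.self_attn.o_proj.weight": "O",
    "lm_head.weight": "H",
}
tllm_weights = convert_weights(hf_weights)
print(sorted(tllm_weights))
```

Raising on an unmapped key is a deliberate choice in this sketch: a silently skipped weight tends to surface much later as wrong model output.
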
## Step 3. Register New Model

Please register the new model class `MyModelForCausalLM` in `tensorrt_llm/models/__init__.py`.

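
Registration generally amounts to importing the class in that file and adding it to the package's exports and to any name-to-class lookup the package maintains. The exact variable names are not given in this document, so check the file itself. The standalone sketch below only illustrates the lookup pattern; `MODEL_REGISTRY` and `resolve_model_class` are hypothetical names, and the class is a stand-in for the one defined in Step 1.

```python
from typing import Dict, Type


class MyModelForCausalLM:
    """Stand-in for the class defined in tensorrt_llm/models/my_model/model.py."""


# Hypothetical lookup table: architecture name -> model class.
MODEL_REGISTRY: Dict[str, Type] = {
    "MyModelForCausalLM": MyModelForCausalLM,
}


def resolve_model_class(architecture: str) -> Type:
    """Return the class registered for an architecture name."""
    try:
        return MODEL_REGISTRY[architecture]
    except KeyError:
        known = ", ".join(sorted(MODEL_REGISTRY))
        raise ValueError(
            f"unknown architecture {architecture!r}; registered: {known}"
        ) from None


print(resolve_model_class("MyModelForCausalLM").__name__)  # MyModelForCausalLM
```
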
## Step 4. Verify New Model

Finally, verify the new model. The typical commands are as follows:

```bash
cd examples/my_model/

python convert_checkpoint.py --model_dir hf_model_dir --output_dir tllm_ckpt_dir

trtllm-build --checkpoint_dir tllm_ckpt_dir --output_dir tllm_engine_dir

# try the model with a single prompt
python ../run.py --engine_dir tllm_engine_dir --tokenizer_dir hf_model_dir --input_text "Born in north-east France, Soyer trained as a"
# run summarization task
python ../summarize.py --engine_dir tllm_engine_dir --hf_model_dir hf_model_dir --test_trt_llm
```

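
When the same check is repeated for several checkpoints, it can help to generate the commands from one place. The sketch below builds the convert, build, and single-prompt commands shown above as argument lists and prints them. `verification_commands` is a hypothetical helper, and the flags are copied from the commands above rather than taken from each tool's full option list.

```python
import shlex
from typing import List


def verification_commands(hf_model_dir: str, ckpt_dir: str,
                          engine_dir: str, prompt: str) -> List[List[str]]:
    """Build the argv lists for convert, build, and a single-prompt run."""
    return [
        ["python", "convert_checkpoint.py",
         "--model_dir", hf_model_dir, "--output_dir", ckpt_dir],
        ["trtllm-build",
         "--checkpoint_dir", ckpt_dir, "--output_dir", engine_dir],
        ["python", "../run.py",
         "--engine_dir", engine_dir, "--tokenizer_dir", hf_model_dir,
         "--input_text", prompt],
    ]


commands = verification_commands(
    "hf_model_dir", "tllm_ckpt_dir", "tllm_engine_dir",
    "Born in north-east France, Soyer trained as a")
for argv in commands:
    # shlex.join quotes arguments that contain spaces, such as the prompt.
    print(shlex.join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True, cwd="examples/my_model")` so that a failing step stops the sequence before the next one starts.
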
## Reference

For more details, read the [workflow](./workflow.md) and [checkpoint](./checkpoint.md) documents.