(add-model)=
# Adding a Model
This document describes how to add a typical decoder-only model in TensorRT LLM.
## Step 1. Write Modeling Part
TensorRT LLM provides different levels of APIs:
- Low-level functions, for example, `concat`, `add`, and `sum`.
- Basic layers, such as `Linear` and `LayerNorm`.
- High-level layers, such as `MLP` and `Attention`.
- Base class for typical decoder-only models, such as `DecoderModelForCausalLM`.
To add a new model:

- Create a model directory in `tensorrt_llm/models`, for example `my_model`.
- Write a `model.py` with TensorRT LLM's APIs:
```python
class MyDecoderLayer(Module):

    def __init__(self, config: PretrainedConfig, layer_idx: int):
        super().__init__()
        self.layer_idx = layer_idx
        self.config = config
        self.input_layernorm = LayerNorm(...)
        self.attention = Attention(...)
        self.post_layernorm = LayerNorm(...)
        self.mlp = MLP(...)

    def forward(self, hidden_states, ...):
        # decoder layer forward
        return hidden_states


class MyModel(Module):

    def __init__(self, config: PretrainedConfig):
        super().__init__()
        self.config = config
        self.vocab_embedding = Embedding(...)
        self.layers = DecoderLayerList(MyDecoderLayer, config)
        self.ln_f = LayerNorm(...)

    def forward(self, input_ids, ...):
        # model forward
        return hidden_states


class MyModelForCausalLM(DecoderModelForCausalLM):

    def __init__(self, config: PretrainedConfig):
        transformer = MyModel(config)
        lm_head = ColumnLinear(...)
        super().__init__(config, transformer, lm_head)
```
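The `# decoder layer forward` body is left to the model author. As a rough guide, the built-in decoder-only models use a pre-norm residual structure. The sketch below is illustrative only; it assumes the `Attention` keyword arguments (`attention_mask`, `use_cache`, `kv_cache_params`, `attention_params`) used by recent releases, so check the built-in models (for example `tensorrt_llm/models/llama/model.py`) for the exact signatures.

```python
# Illustrative forward for MyDecoderLayer (pre-norm residual blocks); the
# Attention keyword arguments follow the built-in layers and may need to be
# adjusted for the TensorRT LLM version you build against.
def forward(self,
            hidden_states,
            attention_mask=None,
            use_cache=False,
            kv_cache_params=None,
            attention_params=None):
    # Self-attention block with residual connection.
    residual = hidden_states
    hidden_states = self.input_layernorm(hidden_states)
    attention_output = self.attention(hidden_states,
                                      attention_mask=attention_mask,
                                      use_cache=use_cache,
                                      kv_cache_params=kv_cache_params,
                                      attention_params=attention_params)
    if use_cache:
        attention_output, presents = attention_output
    hidden_states = residual + attention_output

    # MLP block with residual connection.
    residual = hidden_states
    hidden_states = self.post_layernorm(hidden_states)
    hidden_states = residual + self.mlp(hidden_states)

    return (hidden_states, presents) if use_cache else hidden_states
```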
## Step 2. Implement Weight Conversion
The weights from the source framework need to be converted and bound to the newly added TensorRT LLM model. Here is an example of converting HuggingFace weights:
```python
class MyModelForCausalLM(DecoderModelForCausalLM):

    @classmethod
    def from_hugging_face(
            cls,
            hf_model_dir,
            dtype='float16',
            mapping: Optional[Mapping] = None) -> 'MyModelForCausalLM':
        # create a TensorRT LLM MyModelForCausalLM model object
        # convert HuggingFace checkpoint to TensorRT LLM expected weights dict
        # load the weights to MyModelForCausalLM object
        ...
```
Optionally, develop a `convert_checkpoint.py` script in the `examples/my_model/` directory for convenient offline weight conversion.
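For illustration, the "convert HuggingFace checkpoint to TensorRT LLM expected weights dict" step can be sketched as a name remapping over the HuggingFace state dict. The snippet below is a minimal sketch that assumes a LLaMA-like HuggingFace layout and the parameter names implied by the `MyModel` definition above (fused `attention.qkv`, gated `mlp.fc`/`mlp.gate`/`mlp.proj`); the exact names depend on your model definition, and details such as tied embeddings or tensor-parallel sharding are omitted for brevity.

```python
from typing import Dict

import torch
from transformers import AutoModelForCausalLM


def convert_hf_weights(hf_model_dir: str,
                       dtype: str = 'float16') -> Dict[str, torch.Tensor]:
    """Sketch: build the flat {name: tensor} dict expected by MyModelForCausalLM."""
    hf_model = AutoModelForCausalLM.from_pretrained(hf_model_dir)
    torch_dtype = getattr(torch, dtype)
    hf = {k: v.to(torch_dtype) for k, v in hf_model.state_dict().items()}

    weights = {
        'transformer.vocab_embedding.weight': hf['model.embed_tokens.weight'],
        'transformer.ln_f.weight': hf['model.norm.weight'],
        'lm_head.weight': hf['lm_head.weight'],
    }
    for i in range(hf_model.config.num_hidden_layers):
        src, dst = f'model.layers.{i}', f'transformer.layers.{i}'
        # Fuse Q/K/V projections into the single qkv weight used by Attention.
        weights[f'{dst}.attention.qkv.weight'] = torch.cat(
            [hf[f'{src}.self_attn.{p}_proj.weight'] for p in ('q', 'k', 'v')],
            dim=0)
        weights[f'{dst}.attention.dense.weight'] = hf[f'{src}.self_attn.o_proj.weight']
        # Gated-MLP mapping; swap fc/gate if your MLP defines them differently.
        weights[f'{dst}.mlp.fc.weight'] = hf[f'{src}.mlp.gate_proj.weight']
        weights[f'{dst}.mlp.gate.weight'] = hf[f'{src}.mlp.up_proj.weight']
        weights[f'{dst}.mlp.proj.weight'] = hf[f'{src}.mlp.down_proj.weight']
        weights[f'{dst}.input_layernorm.weight'] = hf[f'{src}.input_layernorm.weight']
        weights[f'{dst}.post_layernorm.weight'] = hf[f'{src}.post_attention_layernorm.weight']
    return weights
```

Inside `from_hugging_face`, the resulting dict is loaded into a freshly constructed `MyModelForCausalLM` object; a `convert_checkpoint.py` script can call the same conversion and save the dict together with the model config to disk.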
## Step 3. Register New Model
Register the new model class `MyModelForCausalLM` in `tensorrt_llm/models/__init__.py`, for example as sketched below.
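The registration typically amounts to an import plus entries in the module's export list and model map. The exact structures (such as `MODEL_MAP`) vary between TensorRT LLM versions, and the import path below assumes the `my_model/model.py` layout from Step 1, so treat this as a sketch:

```python
# tensorrt_llm/models/__init__.py (sketch)
from .my_model.model import MyModelForCausalLM

__all__ = [
    # ... existing model classes ...
    'MyModelForCausalLM',
]

# If your version keeps an architecture-name -> class map for the build
# tooling, register the new class there as well.
MODEL_MAP = {
    # ... existing entries ...
    'MyModelForCausalLM': MyModelForCausalLM,
}
```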
## Step 4. Verify New Model
Finally, verify the new model. The typical commands are as follows:
```bash
cd examples/my_model/
python convert_checkpoint.py --model_dir hf_model_dir --output_dir tllm_ckpt_dir
trtllm-build --checkpoint_dir tllm_ckpt_dir --output_dir tllm_engine_dir

# try the model with a single prompt
python ../run.py --engine_dir tllm_engine_dir --tokenizer_dir hf_model_dir --input_text "Born in north-east France, Soyer trained as a"

# run summarization task
python ../summarize.py --engine_dir tllm_engine_dir --hf_model_dir hf_model_dir --test_trt_llm
```
## Reference
It's recommended to read the [workflow](./workflow.md) and [checkpoint](./checkpoint.md) documents for more details.