Smaug

This document explains how to build the Smaug-72B-v0.1 model into runnable TensorRT engines on a multi-GPU node and how to perform a summarization task using those engines.

Overview

The TensorRT-LLM support for Smaug-72B-v0.1 is based on the LLaMA model; the implementation can be found in tensorrt_llm/models/llama/model.py. The Smaug model closely resembles LLaMA, except that it uses a bias term in its attention module, so we reuse the LLaMA example code for Smaug.
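The attention-bias difference can be illustrated with a minimal sketch. This is plain Python for illustration only, not the actual TensorRT-LLM implementation (which lives in tensorrt_llm/models/llama/model.py); the weight and bias values are made up:

```python
# Minimal sketch of the structural difference between Smaug and LLaMA
# attention: Smaug's QKV projection carries a bias term, LLaMA's does not.
# Values here are illustrative, not real model weights.

def linear(x, weight, bias=None):
    """y = x @ W^T + b for a single input vector x."""
    y = [sum(xi * wij for xi, wij in zip(x, w_row)) for w_row in weight]
    if bias is not None:
        y = [yi + bi for yi, bi in zip(y, bias)]
    return y

# LLaMA-style attention projection: no bias.
llama_qkv = {"weight": [[1.0, 0.0], [0.0, 1.0]], "bias": None}
# Smaug-style attention projection: bias enabled.
smaug_qkv = {"weight": [[1.0, 0.0], [0.0, 1.0]], "bias": [0.5, -0.5]}

x = [2.0, 3.0]
print(linear(x, llama_qkv["weight"], llama_qkv["bias"]))  # [2.0, 3.0]
print(linear(x, smaug_qkv["weight"], smaug_qkv["bias"]))  # [2.5, 2.5]
```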

In addition, two shared scripts in the parent examples folder are used for inference and evaluation: run.py and summarize.py.

Support Matrix

  • FP16

Usage

This section walks through the whole process: converting the HF model, building TensorRT-LLM engines, and finally performing summarization.

Build TensorRT engine(s)

Run the following commands. TensorRT-LLM first converts the HF model into its own checkpoint format, then builds a TRT engine from that checkpoint:

python ../../../llama/convert_checkpoint.py \
    --model_dir ./Smaug-72B-v0.1 \
    --output_dir ./tllm_checkpoint_8gpu_tp8 \
    --dtype float16 \
    --tp_size 8
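With --tp_size 8, each weight matrix is sharded across 8 GPUs. The following toy sketch shows the idea behind a column-parallel split; it is illustrative only (the real sharding logic lives in convert_checkpoint.py), and the matrix sizes are made up:

```python
# Illustrative sketch of tensor-parallel sharding with tp_size=8:
# each rank keeps a contiguous slice of the output (column) dimension.
def shard_columns(weight_rows, tp_size, rank):
    """Return the slice of each row owned by `rank` (column-parallel split)."""
    cols = len(weight_rows[0])
    assert cols % tp_size == 0, "hidden size must divide evenly across ranks"
    per_rank = cols // tp_size
    start = rank * per_rank
    return [row[start:start + per_rank] for row in weight_rows]

# A toy 2x16 weight split across 8 ranks -> each rank holds a 2x2 slice.
weight = [list(range(16)), list(range(16, 32))]
rank0 = shard_columns(weight, tp_size=8, rank=0)
rank7 = shard_columns(weight, tp_size=8, rank=7)
print(rank0)  # [[0, 1], [16, 17]]
print(rank7)  # [[14, 15], [30, 31]]
```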

trtllm-build --checkpoint_dir ./tllm_checkpoint_8gpu_tp8 \
    --output_dir ./Smaug_72B_tp8 \
    --gemm_plugin float16 \
    --gpt_attention_plugin float16 \
    --context_fmha=enable \
    --max_batch_size 64 \
    --remove_input_padding=enable
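The --remove_input_padding=enable option packs variable-length requests into one contiguous token stream instead of padding every sequence to the batch maximum. A toy sketch of the concept (pure Python, not the engine's actual tensor layout):

```python
# Toy illustration of what remove_input_padding does conceptually:
# instead of a padded [batch, max_len] layout, sequences are concatenated
# into one flat token buffer plus per-sequence lengths.
def pad_batch(seqs, pad_id=0):
    """Classic padded layout: every row padded to the longest sequence."""
    max_len = max(len(s) for s in seqs)
    return [s + [pad_id] * (max_len - len(s)) for s in seqs]

def pack_batch(seqs):
    """Packed layout: flat token buffer plus sequence lengths."""
    flat = [tok for s in seqs for tok in s]
    lengths = [len(s) for s in seqs]
    return flat, lengths

seqs = [[11, 12, 13], [21], [31, 32]]
print(pad_batch(seqs))   # [[11, 12, 13], [21, 0, 0], [31, 32, 0]]
print(pack_batch(seqs))  # ([11, 12, 13, 21, 31, 32], [3, 1, 2])
```

Packing avoids wasting compute and memory on pad tokens, which matters at --max_batch_size 64 when request lengths vary widely.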

Run Summarization

After building the TRT engine, we can use it to perform various tasks. TensorRT-LLM provides handy code to run summarization on the cnn_dailymail dataset and compute ROUGE scores. The ROUGE-1 score can be used to validate the model implementation.

mpirun -n 8 -allow-run-as-root python ../../../summarize.py \
    --hf_model_dir ./Smaug-72B-v0.1 \
    --engine_dir ./Smaug_72B_tp8 \
    --data_type fp16 \
    --test_hf \
    --hf_device_map_auto \
    --test_trt_llm
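ROUGE-1 measures unigram overlap between a generated summary and a reference summary. The simplified sketch below illustrates the metric; summarize.py relies on a full ROUGE library, which additionally handles stemming and tokenization details:

```python
# Simplified ROUGE-1 sketch: unigram precision/recall/F1 between a
# candidate summary and a reference. For illustration only; the real
# evaluation uses a complete ROUGE implementation.
from collections import Counter

def rouge1_f1(candidate, reference):
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

cand = "the cat sat on the mat"
ref = "the cat lay on the mat"
print(round(rouge1_f1(cand, ref), 4))  # 0.8333
```

A healthy engine build should produce a TRT-LLM ROUGE-1 score close to the HF baseline reported by --test_hf.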