TensorRT-LLMs/examples/high-level-api/run_quant_examples.py
Kaiyu Xie 250d9c293d
Update TensorRT-LLM Release branch (#1445)
* Update TensorRT-LLM

---------

Co-authored-by: Bhuvanesh Sridharan <bhuvan.sridharan@gmail.com>
Co-authored-by: Morgan Funtowicz <funtowiczmo@gmail.com>
Co-authored-by: Eddie-Wang1120 <wangjinheng1120@163.com>
Co-authored-by: meghagarwal <16129366+megha95@users.noreply.github.com>
2024-04-12 17:59:19 +08:00


#!/usr/bin/env python
import os
import subprocess
import sys
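PROMPT = "Tell a story"

# Usage: run_quant_examples.py <llama_model_dir> [examples_root]
# Exit with a usage message instead of an IndexError when the model
# directory is missing.
if len(sys.argv) < 2:
    sys.exit(f"usage: {sys.argv[0]} <llama_model_dir> [examples_root]")

LLAMA_MODEL_DIR = sys.argv[1]
# Directory containing llm_examples.py; an empty string resolves the script
# relative to the current working directory.
EXAMPLES_ROOT = sys.argv[2] if len(sys.argv) > 2 else ""
LLM_EXAMPLES = os.path.join(EXAMPLES_ROOT, 'llm_examples.py')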
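
# Run the quantization example with INT4 AWQ.
run_cmd = [
    sys.executable, LLM_EXAMPLES, "--task=run_llm_with_quantization",
    f"--prompt={PROMPT}", f"--hf_model_dir={LLAMA_MODEL_DIR}",
    "--quant_type=int4_awq"
]
# check=True raises CalledProcessError if the example exits non-zero.
subprocess.run(run_cmd, check=True)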

# Run the same example with FP8 quantization.
run_cmd = [
    sys.executable, LLM_EXAMPLES, "--task=run_llm_with_quantization",
    f"--prompt={PROMPT}", f"--hf_model_dir={LLAMA_MODEL_DIR}",
    "--quant_type=fp8"
]
subprocess.run(run_cmd, check=True)
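
# A possible refactor, offered as a sketch rather than the script's actual
# structure: the two invocations above differ only in --quant_type, so the
# command could be built by a helper and run in a loop. The names
# build_quant_cmd, run_quant_examples, QUANT_TYPES and the runner parameter
# are hypothetical and introduced only for illustration; the flags are the
# ones already used above.

```python
import os
import subprocess
import sys

# Hypothetical names for illustration; only the flag strings come from the
# script above.
QUANT_TYPES = ("int4_awq", "fp8")


def build_quant_cmd(examples_script, model_dir, quant_type,
                    prompt="Tell a story"):
    """Return the argv list for one run_llm_with_quantization invocation."""
    return [
        sys.executable, examples_script,
        "--task=run_llm_with_quantization",
        f"--prompt={prompt}",
        f"--hf_model_dir={model_dir}",
        f"--quant_type={quant_type}",
    ]


def run_quant_examples(model_dir, examples_root="", quant_types=QUANT_TYPES,
                       runner=subprocess.run):
    """Run the example once per quantization type, stopping on first failure.

    runner defaults to subprocess.run; passing a stub makes the loop
    testable without launching a model.
    """
    script = os.path.join(examples_root, "llm_examples.py")
    for quant_type in quant_types:
        runner(build_quant_cmd(script, model_dir, quant_type), check=True)
```

# Under these assumptions, run_quant_examples(LLAMA_MODEL_DIR, EXAMPLES_ROOT)
# would issue the same two commands as the script above, and adding another
# quantization type would only mean extending QUANT_TYPES.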