Minor fixes for documents (#3577)

Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Kaiyu Xie 2025-04-16 07:47:18 +08:00 committed by GitHub
parent fffb403125
commit f5f68ded26
2 changed files with 1 addition and 2 deletions


@@ -1443,7 +1443,6 @@ trtllm-build --checkpoint_dir llama_3.1_405B_HF_FP8_model/trt_ckpts/tp8-pp1/ \
To run inference on the 405B model, we often need multiple nodes to accommodate the entire model. Here, we use Slurm to launch the job on multiple nodes.
Notes:
* For the FP8 model, we can fit it on a single 8xH100 node, but we cannot support 128k context due to memory limitations, so we test with 64k context in this demonstration.
* For convenience, we use the Hugging Face tokenizer for tokenization (a sketch follows this list).
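
As a hedged illustration of the tokenizer note above, a minimal sketch of tokenizing a long-context prompt with the Hugging Face tokenizer; the model ID and prompt contents are assumptions for illustration, not taken from the evaluation script:

```python
from transformers import AutoTokenizer

# Model ID is an assumption; substitute your local checkpoint or tokenizer path.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-405B")

# In the long-context evaluation, the prompt would be up to ~64k tokens.
prompt = "The quick brown fox jumps over the lazy dog. " * 1000
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
print(f"{input_ids.shape[-1]} input tokens")
```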
The following script shows how to run evaluation on long context:


@@ -154,7 +154,7 @@ def parse_arguments():
default=False,
action="store_true",
help=
-        'By default, we use dtype for KV cache. fp8_kv_cache chooses int8 quantization for KV'
+        'By default, we use dtype for KV cache. fp8_kv_cache chooses fp8 quantization for KV'
)
parser.add_argument(
'--quant_ckpt_path',
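
For context, a minimal, runnable sketch of the corrected flag; the flag name `--fp8_kv_cache` is inferred from the help text, and the surrounding parser boilerplate is paraphrased from the hunk rather than copied from the full script:

```python
import argparse

def parse_arguments():
    parser = argparse.ArgumentParser()
    # Flag name inferred from the help string; the real script defines many more arguments.
    parser.add_argument(
        '--fp8_kv_cache',
        default=False,
        action="store_true",
        help=
        'By default, we use dtype for KV cache. fp8_kv_cache chooses fp8 quantization for KV'
    )
    return parser.parse_args()

if __name__ == '__main__':
    args = parse_arguments()
    print('fp8 KV cache enabled:', args.fp8_kv_cache)
```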