chore: fix some invalid paths of contrib models (#3818)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
parent dfbcb543ce
commit d0d19e81ca
@@ -60,9 +60,7 @@ To build the engines from the top-level directory:
 
 ```bash
 PYTHONPATH=examples/models/core/gpt:$PYTHONPATH python3 cpp/tests/resources/scripts/build_gpt_engines.py
 PYTHONPATH=examples/models/contrib/gpt:$PYTHONPATH python3 cpp/tests/resources/scripts/build_gptj_engines.py
 PYTHONPATH=examples/models/core/llama:$PYTHONPATH python3 cpp/tests/resources/scripts/build_llama_engines.py
 PYTHONPATH=examples/chatglm:$PYTHONPATH python3 cpp/tests/resources/scripts/build_chatglm_engines.py
 PYTHONPATH=examples/medusa:$PYTHONPATH python3 cpp/tests/resources/scripts/build_medusa_engines.py
 PYTHONPATH=examples/eagle:$PYTHONPATH python3 cpp/tests/resources/scripts/build_eagle_engines.py
 PYTHONPATH=examples/redrafter:$PYTHONPATH python3 cpp/tests/resources/scripts/build_redrafter_engines.py
 ```
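Each of these scripts depends on its example directory actually existing on disk, which is exactly what this commit repairs. As a convenience for anyone auditing similar breakage locally, the loop below is a sketch (not part of the commit) that flags stale entries; it assumes it is run from the TensorRT-LLM repository root.

```bash
# Sketch: flag any example path from the block above that no longer
# exists on disk. Assumes the working directory is the repository root.
for p in examples/models/core/gpt examples/models/contrib/gpt \
         examples/models/core/llama examples/chatglm \
         examples/medusa examples/eagle examples/redrafter; do
  [ -d "$p" ] || echo "stale PYTHONPATH entry: $p" >&2
done
```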
@@ -86,9 +84,7 @@ End-to-end tests read inputs and expected outputs from Numpy files located at [c
 
 ```bash
 PYTHONPATH=examples:$PYTHONPATH python3 cpp/tests/resources/scripts/generate_expected_gpt_output.py
 PYTHONPATH=examples:$PYTHONPATH python3 cpp/tests/resources/scripts/generate_expected_gptj_output.py
 PYTHONPATH=examples:$PYTHONPATH python3 cpp/tests/resources/scripts/generate_expected_llama_output.py
 PYTHONPATH=examples:$PYTHONPATH python3 cpp/tests/resources/scripts/generate_expected_chatglm_output.py
 PYTHONPATH=examples:$PYTHONPATH python3 cpp/tests/resources/scripts/generate_expected_medusa_output.py
 PYTHONPATH=examples:$PYTHONPATH python3 cpp/tests/resources/scripts/generate_expected_eagle_output.py
 PYTHONPATH=examples:$PYTHONPATH python3 cpp/tests/resources/scripts/generate_expected_redrafter_output.py
 ```
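Since every generator shares the same `PYTHONPATH=examples` prefix, the seven invocations collapse naturally into a loop. The sketch below is a convenience only, not part of the commit; the script names are taken verbatim from the block above, and it assumes the engines have already been built.

```bash
# Sketch: run all expected-output generators in one pass.
# Requires previously built engines; run from the repository root.
for model in gpt gptj llama chatglm medusa eagle redrafter; do
  PYTHONPATH=examples:$PYTHONPATH \
    python3 "cpp/tests/resources/scripts/generate_expected_${model}_output.py"
done
```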
@@ -57,7 +57,7 @@ step further by performing FP8 computation on Hopper GPUs instead of the
 standard FP16.
 
 Similar examples running Falcon-180B with quantization in TensorRT-LLM are
-available in [examples/falcon](/examples/falcon).
+available in [examples/models/contrib/falcon](/examples/models/contrib/falcon).
 
 ## Llama-70B on H200 up to 6.7x A100
 
@@ -8,27 +8,30 @@ TensorRT-LLM optimizes the performance of a range of well-known models on NVIDIA
 
 ### LLM Models
 
-- [Arctic](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/arctic)
-- [Baichuan/Baichuan2](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/baichuan)
+- [Arctic](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/contrib/arctic)
+- [Baichuan/Baichuan2](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/contrib/baichuan)
 - [BART](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core/enc_dec)
 - [BERT](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core/bert)
-- [BLOOM](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/bloom)
+- [BLOOM](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/contrib/bloom)
 - [ByT5](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core/enc_dec)
-- [GLM/ChatGLM/ChatGLM2/ChatGLM3/GLM-4](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/chatglm)
+- [ChatGLM](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/contrib/chatglm-6b)
+- [ChatGLM2](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/contrib/chatglm2-6b)
+- [ChatGLM3](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/contrib/chatglm3-6b-32k)
 - [Code LLaMA](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core/llama)
-- [DBRX](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/dbrx)
+- [DBRX](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/contrib/dbrx)
 - [Exaone](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core/exaone)
 - [FairSeq NMT](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core/enc_dec)
-- [Falcon](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/falcon)
+- [Falcon](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/contrib/falcon)
 - [Flan-T5](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core/enc_dec) [^encdec]
 - [Gemma/Gemma2](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core/gemma)
+- [GLM-4](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core/glm-4-9b)
 - [GPT](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core/gpt)
-- [GPT-J](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/contrib/gpt)
+- [GPT-J](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/contrib/gptj)
 - [GPT-Nemo](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core/gpt)
-- [GPT-NeoX](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core/gptneox)
+- [GPT-NeoX](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/contrib/gptneox)
 - [Granite-3.0](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core/granite)
-- [Grok-1](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/grok)
-- [InternLM](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/internlm)
+- [Grok-1](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/contrib/grok)
+- [InternLM](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/contrib/internlm)
 - [InternLM2](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core/internlm2)
 - [LLaMA/LLaMA 2/LLaMA 3/LLaMA 3.1](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core/llama)
 - [Mamba](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core/mamba)
@@ -37,19 +40,19 @@ TensorRT-LLM optimizes the performance of a range of well-known models on NVIDIA
 - [Mistral](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core/llama)
 - [Mistral NeMo](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core/llama)
 - [Mixtral](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core/mixtral)
-- [MPT](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/mpt)
+- [MPT](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/contrib/mpt)
 - [Nemotron](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core/nemotron)
 - [mT5](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core/enc_dec)
-- [OPT](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/opt)
+- [OPT](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/contrib/opt)
 - [Phi-1.5/Phi-2/Phi-3](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core/phi)
 - [Qwen/Qwen1.5/Qwen2](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core/qwen)
 - [Qwen-VL](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core/qwenvl)
 - [RecurrentGemma](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core/recurrentgemma)
-- [Replit Code](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/mpt) [^replitcode]
+- [Replit Code](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/contrib/mpt) [^replitcode]
 - [RoBERTa](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core/bert)
 - [SantaCoder](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core/gpt)
-- [Skywork](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/skywork)
-- [Smaug](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/smaug)
+- [Skywork](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/contrib/skywork)
+- [Smaug](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/contrib/smaug)
 - [StarCoder](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core/gpt)
 - [T5](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core/enc_dec)
 - [Whisper](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core/whisper)
@@ -22,7 +22,7 @@ The grok-1 model requires a node with 8x80GB GPU memory(at least).
 
 ## Overview
 
-The TensorRT-LLM Grok-1 implementation can be found in [tensorrt_llm/models/grok/model.py](../../../../tensorrt_llm/models/grok/model.py). The TensorRT-LLM Grok-1 example code is located in [`examples/grok`](./). There is one main file:
+The TensorRT-LLM Grok-1 implementation can be found in [tensorrt_llm/models/grok/model.py](../../../../tensorrt_llm/models/grok/model.py). The TensorRT-LLM Grok-1 example code is located in [`examples/models/contrib/grok`](./). There is one main file:
 
 * [`convert_checkpoint.py`](./convert_checkpoint.py) to convert the Grok-1 model into tensorrt-llm checkpoint format.
 
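To make that one main file concrete, a conversion run might look like the sketch below. The flag names are assumptions patterned on TensorRT-LLM's other `convert_checkpoint.py` scripts, not taken from this diff; verify them with `--help` before use.

```bash
# Hypothetical invocation of the Grok-1 converter. --model_dir,
# --output_dir and --dtype mirror the other example converters;
# the directory names are placeholders.
python3 examples/models/contrib/grok/convert_checkpoint.py \
    --model_dir ./grok-1 \
    --output_dir ./grok-1-trt-ckpt \
    --dtype bfloat16
```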
@@ -38,7 +38,7 @@ In addition, there are two shared files in the parent folder [`examples`](../../
 
 ## Usage
 
-The TensorRT-LLM Grok-1 example code locates at [examples/grok](./). It takes xai weights as input, and builds the corresponding TensorRT engines. The number of TensorRT engines depends on the number of GPUs used to run inference.
+The TensorRT-LLM Grok-1 example code locates at [examples/models/contrib/grok](./). It takes xai weights as input, and builds the corresponding TensorRT engines. The number of TensorRT engines depends on the number of GPUs used to run inference.
 
 ### Build TensorRT engine(s)
 
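Because one engine is produced per GPU rank, a multi-GPU run fixes the parallelism at conversion time and then builds once. The two commands below are a sketch only; `--tp_size` is borrowed from other TensorRT-LLM example converters rather than confirmed for Grok-1, and the directory names are placeholders.

```bash
# Sketch: 8-way tensor parallelism, yielding one engine per rank.
# --tp_size at conversion time is an assumption from other examples.
python3 convert_checkpoint.py --model_dir ./grok-1 \
    --output_dir ./grok-1-ckpt-8gpu --tp_size 8
trtllm-build --checkpoint_dir ./grok-1-ckpt-8gpu \
    --output_dir ./grok-1-engine-8gpu
```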
@@ -18,7 +18,7 @@ multiple GPUs or multiple nodes with multiple GPUs.
 
 ## Overview
 
-The TensorRT-LLM OPT implementation can be found in [`tensorrt_llm/models/opt/model.py`](../../tensorrt_llm/models/opt/model.py). The TensorRT-LLM OPT example code is located in [`examples/opt`](./). There is one file:
+The TensorRT-LLM OPT implementation can be found in [`tensorrt_llm/models/opt/model.py`](../../tensorrt_llm/models/opt/model.py). The TensorRT-LLM OPT example code is located in [`examples/models/contrib/opt`](./). There is one file:
 
 * [`convert_checkpoint.py`](./convert_checkpoint.py) to convert a checkpoint from the [HuggingFace (HF) Transformers](https://github.com/huggingface/transformers) format to the TensorRT-LLM format
 
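For orientation, a typical HF-to-TensorRT-LLM conversion with that file might look as follows. The model name and flags are illustrative assumptions following the common converter pattern, not taken from this diff.

```bash
# Hypothetical OPT conversion; facebook/opt-125m is just a small
# placeholder model, and the flags follow the usual converter pattern.
python3 examples/models/contrib/opt/convert_checkpoint.py \
    --model_dir facebook/opt-125m \
    --output_dir ./opt-trt-ckpt \
    --dtype float16
```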
@@ -359,7 +359,7 @@ Firstly, please install transformers with 4.45.2
 pip install -r requirements-internlm-xcomposer2.txt
 ```
 
-1. Convert Huggingface weights to TRT-LLM checkpoint format using `examples/internlm/README.md`.
+1. Convert Huggingface weights to TRT-LLM checkpoint format using `examples/models/contrib/internlm/README.md`.
 
 2. Use `trtllm-build` command to build TRT-LLM engine for OPT.
 
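Step 2 above is the standard engine build. A minimal sketch is shown below; the checkpoint and output directory names are made up for illustration, and `--gemm_plugin` is an optional assumption rather than a requirement stated in this diff.

```bash
# Sketch of step 2: build a TRT-LLM engine from a converted checkpoint.
# Directory names are placeholders; --gemm_plugin is optional.
trtllm-build --checkpoint_dir ./internlm-xcomposer2-trt-ckpt \
    --output_dir ./internlm-xcomposer2-trt-engine \
    --gemm_plugin float16
```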