doc: fix invalid link in llama 4 example documentation (#6340)

Signed-off-by: Liana Koleva <43767763+lianakoleva@users.noreply.github.com>
Liana Koleva 2025-07-26 08:27:10 -07:00 committed by GitHub
parent 54f68287fc
commit 96d004d800


@@ -134,7 +134,7 @@ python -m tensorrt_llm.serve.scripts.benchmark_serving \
- `max_batch_size` and `max_num_tokens` can significantly affect performance. Their default values are carefully chosen and should deliver good performance in most cases, but you may still need to tune them for peak performance.
- `max_batch_size` should not be set so low that it bottlenecks throughput. Note that with Attention DP, the whole system's max batch size is `max_batch_size*dp_size`.
- The CUDA graph `max_batch_size` should be the same value as the TensorRT-LLM server's `max_batch_size` (see the sketch after this list).
- For more details on `max_batch_size` and `max_num_tokens`, refer to [Tuning Max Batch Size and Max Num Tokens](../performance/performance-tuning-guide/tuning-max-batch-size-and-max-num-tokens.md).
- For more details on `max_batch_size` and `max_num_tokens`, refer to [Tuning Max Batch Size and Max Num Tokens](../../../../docs/source/performance/performance-tuning-guide/tuning-max-batch-size-and-max-num-tokens.md).
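As a rough illustration of the points above, the following sketch launches the server with matched batch-size settings. It is not taken from this commit: the `trtllm-serve` flags (`--max_batch_size`, `--max_num_tokens`, `--extra_llm_api_options`) and the `cuda_graph_config` key reflect recent TensorRT-LLM releases and should be verified against the version you are running.

```shell
# Hypothetical configuration sketch: keep the CUDA graph max_batch_size equal
# to the server's max_batch_size, as recommended above. Key and flag names are
# assumptions; check them against your TensorRT-LLM version.
cat > extra-llm-api-config.yml <<'EOF'
cuda_graph_config:
  max_batch_size: 512   # must match --max_batch_size below
EOF

trtllm-serve <model> \
    --max_batch_size 512 \
    --max_num_tokens 8192 \
    --extra_llm_api_options extra-llm-api-config.yml
```

For reference on the Attention DP note: with a per-rank `max_batch_size` of 512 and a `dp_size` of 8, the whole system's effective max batch size is 512 * 8 = 4096.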
### Troubleshooting