diff --git a/docs/source/release-notes.md b/docs/source/release-notes.md
index f5653b5d49..d86ef18afa 100644
--- a/docs/source/release-notes.md
+++ b/docs/source/release-notes.md
@@ -778,7 +778,7 @@ Refer to the {ref}`support-matrix-software` section for a list of supported mode
 - System prompt caching
 - Enabled split-k for weight-only cutlass kernels
 - FP8 KV cache support for XQA kernel
-- New Python builder API and `trtllm-build` command (already applied to [blip2](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/contrib/blip2) and [OPT](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/contrib/opt#3-build-tensorrt-engines))
+- New Python builder API and `trtllm-build` command (already applied to [blip2](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/blip2) and [OPT](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/opt#3-build-tensorrt-engines))
 - Support `StoppingCriteria` and `LogitsProcessor` in Python generate API
 - FHMA support for chunked attention and paged KV cache
 - Performance enhancements include:
diff --git a/examples/auto_deploy/README.md b/examples/auto_deploy/README.md
index 0fdb5b483b..2ac6d7fa78 100644
--- a/examples/auto_deploy/README.md
+++ b/examples/auto_deploy/README.md
@@ -250,7 +250,7 @@ llm = LLM(
 
-For more examples on TRT-LLM LLM API, visit [`this page`](https://nvidia.github.io/TensorRT-LLM/llm-api-examples/).
+For more examples on TRT-LLM LLM API, visit [`this page`](https://nvidia.github.io/TensorRT-LLM/examples/llm_api_examples.html).
 
 ______________________________________________________________________