docs:update 0.19 docs (#3986)

Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
nv-guomingz 2025-04-30 19:25:26 +08:00 committed by GitHub
parent cd9c7498d0
commit 034f6f2d91
GPG Key ID: B5690EEEBB952194
2 changed files with 2 additions and 2 deletions

@@ -778,7 +778,7 @@ Refer to the {ref}`support-matrix-software` section for a list of supported mode
 - System prompt caching
 - Enabled split-k for weight-only cutlass kernels
 - FP8 KV cache support for XQA kernel
-- New Python builder API and `trtllm-build` command (already applied to [blip2](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/contrib/blip2) and [OPT](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/contrib/opt#3-build-tensorrt-engines))
+- New Python builder API and `trtllm-build` command (already applied to [blip2](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/blip2) and [OPT](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/opt#3-build-tensorrt-engines))
 - Support `StoppingCriteria` and `LogitsProcessor` in Python generate API
 - FHMA support for chunked attention and paged KV cache
 - Performance enhancements include:

@@ -250,7 +250,7 @@ llm = LLM(
 </details>
-For more examples on TRT-LLM LLM API, visit [`this page`](https://nvidia.github.io/TensorRT-LLM/llm-api-examples/).
+For more examples on TRT-LLM LLM API, visit [`this page`](https://nvidia.github.io/TensorRT-LLM/examples/llm_api_examples.html).
 ______________________________________________________________________