# Key Features

This document lists the key features supported in TensorRT-LLM. A brief usage sketch follows the list.

- [Quantization](../source/reference/precision.md)
- [Inflight Batching](../source/advanced/gpt-attention.md#in-flight-batching)
- [Chunked Context](../source/advanced/gpt-attention.md#chunked-context)
- [LoRA](../source/advanced/lora.md)
- [KV Cache Reuse](./kv_cache_reuse.md)
- [Speculative Sampling](./speculative_decoding.md)
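
The sketch below is a minimal illustration, not an official example, of how one of these features (KV cache reuse) can be switched on through the Python `LLM` API. The model name, prompt, and sampling values are placeholders, and option names such as `enable_block_reuse` may differ between releases, so refer to the linked pages for the exact API.

```python
# Minimal sketch (assumed API usage): generation with KV cache block reuse
# enabled via the tensorrt_llm LLM API.
from tensorrt_llm import LLM, SamplingParams
from tensorrt_llm.llmapi import KvCacheConfig

# KV Cache Reuse: let requests that share a common prefix (e.g. a fixed
# system prompt) reuse previously computed KV cache blocks.
kv_cache_config = KvCacheConfig(enable_block_reuse=True)

llm = LLM(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # placeholder model
    kv_cache_config=kv_cache_config,
)

sampling_params = SamplingParams(max_tokens=64, temperature=0.8, top_p=0.95)

outputs = llm.generate(["TensorRT-LLM is"], sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```

The other listed features (quantization, in-flight batching, chunked context, LoRA, and speculative sampling) are configured through additional build-time and runtime options described in the pages linked above.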