mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Signed-off-by: leslie-fang25 <leslief@nvidia.com>

2025-08-19 07:42:52 +08:00

Feature Combination Matrix

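| Feature | Overlap Scheduler | CUDA Graph | Attention Data Parallelism | Disaggregated Serving | Chunked Prefill | MTP | EAGLE-3(One Model Engine) | EAGLE-3(Two Model Engine) | Torch Sampler | TLLM C++ Sampler | KV Cache Reuse | Slide Window Attention | Logits Post Processor | Guided Decoding |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Overlap Scheduler | --- | | | | | | | | | | | | | |
| CUDA Graph | Yes | --- | | | | | | | | | | | | |
| Attention Data Parallelism | Yes | Yes | --- | | | | | | | | | | | |
| Disaggregated Serving | Yes | Yes | Yes | --- | | | | | | | | | | |
| Chunked Prefill | Yes | Yes | Yes | Untested | --- | | | | | | | | | |
| MTP | Yes | Yes | Yes | Yes | Yes | --- | | | | | | | | |
| EAGLE-3(One Model Engine) | Yes | Yes | Yes | Yes | Yes | No | --- | | | | | | | |
| EAGLE-3(Two Model Engine) | No | Yes | Yes | Yes | Yes | No | No | --- | | | | | | |
| Torch Sampler | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | --- | | | | | |
| TLLM C++ Sampler | Yes | Yes | Yes | Yes | Yes | No | No | No | No | --- | | | | |
| KV Cache Reuse | Yes | Yes | Yes | Untested | Yes | Untested | Yes | No | Yes | Yes | --- | | | |
| Slide Window Attention | Yes | Yes | Yes | Untested | No | Untested | Untested | Untested | Yes | Yes | WIP | --- | | |
| Logits Post Processor | No | Yes | Yes | No | Yes | No | No | No | Yes | Yes | Yes | Yes | --- | |
| Guided Decoding | Yes | Yes | Yes | Yes | Yes | No | No | Yes | Yes | Yes | Yes | Yes | Yes | --- |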
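
The matrix is symmetric, so only the lower triangle is filled in: to check a pair, find the row of whichever feature appears later in the header and read the column of the other. The sketch below transcribes the table into plain Python and shows one way to look up a pair or screen a planned set of features. The names `FEATURES`, `LOWER`, `status`, and `blockers` are hypothetical helpers written for this illustration; they are not part of the TensorRT-LLM API.

```python
from itertools import combinations

# Feature order matches the header row of the matrix above.
FEATURES = [
    "Overlap Scheduler", "CUDA Graph", "Attention Data Parallelism",
    "Disaggregated Serving", "Chunked Prefill", "MTP",
    "EAGLE-3(One Model Engine)", "EAGLE-3(Two Model Engine)",
    "Torch Sampler", "TLLM C++ Sampler", "KV Cache Reuse",
    "Slide Window Attention", "Logits Post Processor", "Guided Decoding",
]

Y, N, U, W = "Yes", "No", "Untested", "WIP"

# Lower triangle transcribed from the table: LOWER[i][j] is the status of
# FEATURES[i] combined with FEATURES[j] for j < i.
LOWER = [
    [],
    [Y],
    [Y, Y],
    [Y, Y, Y],
    [Y, Y, Y, U],
    [Y, Y, Y, Y, Y],
    [Y, Y, Y, Y, Y, N],
    [N, Y, Y, Y, Y, N, N],
    [Y, Y, Y, Y, Y, Y, Y, Y],
    [Y, Y, Y, Y, Y, N, N, N, N],
    [Y, Y, Y, U, Y, U, Y, N, Y, Y],
    [Y, Y, Y, U, N, U, U, U, Y, Y, W],
    [N, Y, Y, N, Y, N, N, N, Y, Y, Y, Y],
    [Y, Y, Y, Y, Y, N, N, Y, Y, Y, Y, Y, Y],
]


def status(a: str, b: str) -> str:
    """Return the matrix entry for two distinct features, in either order."""
    i, j = FEATURES.index(a), FEATURES.index(b)
    if i == j:
        raise ValueError("the diagonal has no entry; pass two different features")
    if i < j:
        i, j = j, i  # the table only stores the lower triangle
    return LOWER[i][j]


def blockers(selected: list[str]) -> list[tuple[str, str, str]]:
    """List every pair in `selected` whose entry is anything other than 'Yes'."""
    return [
        (a, b, status(a, b))
        for a, b in combinations(selected, 2)
        if status(a, b) != "Yes"
    ]


if __name__ == "__main__":
    print(status("MTP", "TLLM C++ Sampler"))
    print(blockers(["Overlap Scheduler", "CUDA Graph", "Logits Post Processor"]))
```

Treating `Untested` and `WIP` as blockers is a conservative choice made for this sketch; a deployment that is willing to validate a combination itself could filter only on `No`.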