mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Yueh-Ting (eop) Chen 85088dce05

Signed-off-by: eopXD <yuehtingc@nvidia.com>

2025-10-21 04:41:44 -04:00

Feature Combination Matrix

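| Feature | Overlap Scheduler | CUDA Graph | Attention Data Parallelism | Disaggregated Serving | Chunked Prefill | MTP | EAGLE-3(One Model Engine) | EAGLE-3(Two Model Engine) | Torch Sampler | TLLM C++ Sampler | KV Cache Reuse | Slide Window Attention | Logits Post Processor | Guided Decoding | LoRA |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Overlap Scheduler | --- | | | | | | | | | | | | | | |
| CUDA Graph | Yes | --- | | | | | | | | | | | | | |
| Attention Data Parallelism | Yes | Yes | --- | | | | | | | | | | | | |
| Disaggregated Serving | Yes | Yes | Yes | --- | | | | | | | | | | | |
| Chunked Prefill | Yes | Yes | Yes | Yes | --- | | | | | | | | | | |
| MTP | Yes | Yes | Yes | Yes | Yes | --- | | | | | | | | | |
| EAGLE-3(One Model Engine) | Yes | Yes | Yes | Yes | Yes | No | --- | | | | | | | | |
| EAGLE-3(Two Model Engine) | Yes | Yes | Yes | Yes | Yes | No | No | --- | | | | | | | |
| Torch Sampler | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | --- | | | | | | |
| TLLM C++ Sampler | Yes | Yes | Yes | Yes | Yes | No | No | No | No | --- | | | | | |
| KV Cache Reuse | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | --- | | | | |
| Slide Window Attention | Yes | Yes | Yes | Yes | Yes | No | Untested | Untested | Yes | Yes | Yes | --- | | | |
| Logits Post Processor | Yes | Yes | Yes | No | Yes | No | No | No | Yes | Yes | Yes | Yes | --- | | |
| Guided Decoding | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | --- | |
| LoRA | Yes | No | Untested | Untested | Untested | Untested | Untested | Untested | Yes | Yes | Yes | Yes | Yes | Untested | --- |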
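
The matrix is lower-triangular: each pair of features appears once, in the row of whichever feature is listed later. The following minimal Python sketch shows one way to look up a pair regardless of argument order. It is only an illustration of how to read the table. The names `FEATURES`, `LOWER`, `combination_status`, and `problem_pairs` are hypothetical and are not part of the TensorRT-LLM API; the data is transcribed from the table above and would need to be updated by hand if the table changes.

```python
from itertools import combinations

# Hypothetical helper data, not part of TensorRT-LLM: feature names in table order.
FEATURES = [
    "Overlap Scheduler",
    "CUDA Graph",
    "Attention Data Parallelism",
    "Disaggregated Serving",
    "Chunked Prefill",
    "MTP",
    "EAGLE-3(One Model Engine)",
    "EAGLE-3(Two Model Engine)",
    "Torch Sampler",
    "TLLM C++ Sampler",
    "KV Cache Reuse",
    "Slide Window Attention",
    "Logits Post Processor",
    "Guided Decoding",
    "LoRA",
]

# Lower triangle of the table, one string per row (Y = Yes, N = No, U = Untested).
# Row i holds the entries for columns 0..i-1; the diagonal ("---") is omitted.
LOWER = [
    "",
    "Y",
    "YY",
    "YYY",
    "YYYY",
    "YYYYY",
    "YYYYYN",
    "YYYYYNN",
    "YYYYYYYY",
    "YYYYYNNNN",
    "YYYYYYYYYY",
    "YYYYYNUUYYY",
    "YYYNYNNNYYYY",
    "YYYYYYYYYYYYY",
    "YNUUUUUUYYYYYU",
]

STATUS = {"Y": "Yes", "N": "No", "U": "Untested"}

# Sanity check on the transcription: row i must have exactly i entries.
assert all(len(row) == i for i, row in enumerate(LOWER))


def combination_status(a: str, b: str) -> str:
    """Return "Yes", "No", or "Untested" for the pair (a, b), in either order."""
    i, j = FEATURES.index(a), FEATURES.index(b)
    if i == j:
        raise ValueError("the table has no entry for a feature paired with itself")
    row, col = max(i, j), min(i, j)  # the entry lives in the later feature's row
    return STATUS[LOWER[row][col]]


def problem_pairs(features: list[str]) -> list[tuple[str, str, str]]:
    """List every pair among `features` whose table entry is not "Yes"."""
    result = []
    for a, b in combinations(features, 2):
        status = combination_status(a, b)
        if status != "Yes":
            result.append((a, b, status))
    return result


print(combination_status("CUDA Graph", "LoRA"))  # No
print(problem_pairs(["CUDA Graph", "MTP", "Torch Sampler"]))  # []
```

Because the table only records pairwise results, an empty list from `problem_pairs` means that no pair in the set is marked "No" or "Untested"; it does not by itself show that the full combination has been tested together.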