mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Guoming Zhang f53fb4c803 [TRTLLM-5930][doc] 1.0 Documentation. (#6696)

Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>

2025-09-09 12:16:03 +08:00

Feature Combination Matrix

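| Feature | Overlap Scheduler | CUDA Graph | Attention Data Parallelism | Disaggregated Serving | Chunked Prefill | MTP | EAGLE-3(One Model Engine) | EAGLE-3(Two Model Engine) | Torch Sampler | TLLM C++ Sampler | KV Cache Reuse | Slide Window Attention | Logits Post Processor | Guided Decoding | LoRA |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Overlap Scheduler | --- | | | | | | | | | | | | | | |
| CUDA Graph | Yes | --- | | | | | | | | | | | | | |
| Attention Data Parallelism | Yes | Yes | --- | | | | | | | | | | | | |
| Disaggregated Serving | Yes | Yes | Yes | --- | | | | | | | | | | | |
| Chunked Prefill | Yes | Yes | Yes | Yes | --- | | | | | | | | | | |
| MTP | Yes | Yes | Yes | Yes | Yes | --- | | | | | | | | | |
| EAGLE-3(One Model Engine) | Yes | Yes | Yes | Yes | Yes | No | --- | | | | | | | | |
| EAGLE-3(Two Model Engine) | No | Yes | Yes | Yes | Yes | No | No | --- | | | | | | | |
| Torch Sampler | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | --- | | | | | | |
| TLLM C++ Sampler | Yes | Yes | Yes | Yes | Yes | No | No | No | No | --- | | | | | |
| KV Cache Reuse | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | --- | | | | |
| Slide Window Attention | Yes | Yes | Yes | Yes | Yes | No | Untested | Untested | Yes | Yes | WIP | --- | | | |
| Logits Post Processor | Yes | Yes | Yes | No | Yes | No | No | No | Yes | Yes | Yes | Yes | --- | | |
| Guided Decoding | Yes | Yes | Yes | Yes | Yes | No | No | Yes | Yes | Yes | Yes | Yes | Yes | --- | |
| LoRA | Yes | No | Untested | Untested | Untested | Untested | Untested | Untested | Yes | Yes | Yes | Yes | Yes | Untested | --- |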
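
The matrix is lower-triangular: each pair of features appears once, in the row of whichever feature is listed later. Below is a minimal sketch of how a pairwise lookup over it might be encoded, assuming the status of a pair does not depend on the order in which the two features are named. The names `MATRIX_SUBSET` and `combination_status` are hypothetical and not part of TensorRT-LLM, and only a handful of cells are copied from the table for illustration.

```python
# Hypothetical helper for illustration; not part of the TensorRT-LLM API.
# Only a few cells of the matrix are copied here, keyed by unordered pairs.
MATRIX_SUBSET = {
    frozenset({"CUDA Graph", "Overlap Scheduler"}): "Yes",
    frozenset({"EAGLE-3(Two Model Engine)", "Overlap Scheduler"}): "No",
    frozenset({"TLLM C++ Sampler", "MTP"}): "No",
    frozenset({"Slide Window Attention", "KV Cache Reuse"}): "WIP",
    frozenset({"LoRA", "CUDA Graph"}): "No",
    frozenset({"LoRA", "Guided Decoding"}): "Untested",
}


def combination_status(feature_a: str, feature_b: str) -> str:
    """Return the matrix cell for a pair of features, in either order."""
    if feature_a == feature_b:
        return "---"  # diagonal cell: a feature paired with itself
    # Pairs that were not copied into the subset are reported as such
    # rather than guessed.
    return MATRIX_SUBSET.get(frozenset({feature_a, feature_b}), "not in subset")
```

Keying on `frozenset` means `combination_status("CUDA Graph", "LoRA")` and `combination_status("LoRA", "CUDA Graph")` read the same cell, which mirrors how the triangular table is meant to be read.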
