tensorrt_llm
Quantization
class tensorrt_llm.quantization.QuantMode(value)
Bases: IntFlag
An enumeration.
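Because QuantMode derives from IntFlag, its members should behave like bit flags that can be combined and tested with bitwise operators. The sketch below illustrates that general IntFlag behavior using only the Python standard library. `DemoMode`, its member names, and the `has_flag` helper are hypothetical stand-ins chosen for illustration; they are not the actual members or methods of QuantMode, which this page does not list.

```python
from enum import IntFlag


class DemoMode(IntFlag):
    """Hypothetical stand-in enum; member names are NOT taken from QuantMode."""

    FEATURE_A = 1 << 0  # bit 0
    FEATURE_B = 1 << 1  # bit 1
    FEATURE_C = 1 << 2  # bit 2


def has_flag(mode: DemoMode, flag: DemoMode) -> bool:
    """Return True if every bit of `flag` is set in `mode`."""
    return (mode & flag) == flag


# Combine two flags with bitwise OR; the result is still a DemoMode and an int.
combined = DemoMode.FEATURE_A | DemoMode.FEATURE_C

# Reconstruct the same combination from its integer value (1 | 4 == 5).
restored = DemoMode(int(combined))
```

Storing a mode as a plain integer and rebuilding it with the enum constructor, as `restored` does, is what the `(value)` argument in the class signature above suggests; whether QuantMode adds its own helper methods on top of this is not stated here.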