Welcome to TensorRT-LLM’s documentation!
Contents:
- TensorRT-LLM Architecture
- C++ GPT Runtime
- The Batch Manager in TensorRT-LLM
- Multi-head, Multi-query and Group-query Attention
- Numerical Precision
- TensorRT-LLM Installation
- Performance of TensorRT-LLM
- How to debug
- How to add a new model
- Graph Rewriting Module
- Memory Usage of TensorRT-LLM
- New Workflow