mirror of https://github.com/ggml-org/llama.cpp.git
synced 2026-07-01 16:50:20 +00:00
ad8207af77
* cuda : enable CUDA graphs for MMID BS <= 4

* cont : add stream capture check

Co-authored-by: Oliver Simons <osimons@nvidia.com>

* cont : add MMVQ_MMID_MAX_BATCH_SIZE

---------

Co-authored-by: Oliver Simons <osimons@nvidia.com>