Mirror of https://github.com/NVIDIA/TensorRT-LLM.git (synced 2026-01-26 21:53:30 +08:00)
Latest commit:

* add passing E2E LoRA flow
* add experimental feature
* fix llma_args definition
* decreased manually size of max loras to address OOM

Signed-off-by: Shahar Mor <smor@nvidia.com>

| File |
|---|
| batched_logits_processor.yaml |
| calib_config.yaml |
| completion_output.yaml |
| guided_decoding_params.yaml |
| llm.yaml |
| logits_processor.yaml |
| quant_config.yaml |
| request_output.yaml |
| sampling_params.yaml |