TensorRT-LLMs/tensorrt_llm/commands
2025-07-25 18:10:40 -04:00
__init__.py Update TensorRT-LLM (#613) 2023-12-08 17:49:24 +08:00
bench.py Update TensorRT-LLM (#2849) 2025-03-04 18:44:00 +08:00
build.py feat: nanobind bindings (#6185) 2025-07-21 08:56:57 +01:00
eval.py [TRTLLM-5208][BREAKING CHANGE] chore: make pytorch LLM the default (#5312) 2025-06-20 03:01:10 +08:00
prune.py Update TensorRT-LLM (#2008) 2024-07-23 23:05:09 +08:00
refit.py Update TensorRT-LLM (#2532) 2024-12-04 21:16:56 +08:00
serve.py [nvbug/5374773] chore: Add a runtime flag to enable fail fast when attn window is too large to fit at least one sequence in KV cache (#5974) 2025-07-25 18:10:40 -04:00