| File | Last commit | Last updated |
| --- | --- | --- |
| llm_auto_parallel.py | Update TensorRT-LLM (#2562) | 2024-12-11 00:31:05 -08:00 |
| llm_eagle_decoding.py | Update TensorRT-LLM (#2849) | 2025-03-04 18:44:00 +08:00 |
| llm_guided_decoding.py | Update TensorRT-LLM (#2562) | 2024-12-11 00:31:05 -08:00 |
| llm_inference_async_streaming.py | Update TensorRT-LLM (#2562) | 2024-12-11 00:31:05 -08:00 |
| llm_inference_async.py | Update TensorRT-LLM (#2562) | 2024-12-11 00:31:05 -08:00 |
| llm_inference_customize.py | Update TensorRT-LLM (#2755) | 2025-02-11 03:01:00 +00:00 |
| llm_inference_distributed.py | Update TensorRT-LLM (#2562) | 2024-12-11 00:31:05 -08:00 |
| llm_inference_kv_events.py | test: add kv cache event tests for disagg workers (#3602) | 2025-04-18 18:30:19 +08:00 |
| llm_inference.py | Update TensorRT-LLM (#2755) | 2025-02-11 03:01:00 +00:00 |
| llm_logits_processor.py | fix: LLM API logits processor example comments (#2962) | 2025-03-24 12:22:12 +08:00 |
| llm_lookahead_decoding.py | Update TensorRT-LLM (#2755) | 2025-02-11 03:01:00 +00:00 |
| llm_medusa_decoding.py | Update TensorRT-LLM (#2849) | 2025-03-04 18:44:00 +08:00 |
| llm_mgmn_llm_distributed.sh | make LLM-API slurm examples executable (#3402) | 2025-04-13 21:42:45 +08:00 |
| llm_mgmn_trtllm_bench.sh | make LLM-API slurm examples executable (#3402) | 2025-04-13 21:42:45 +08:00 |
| llm_mgmn_trtllm_serve.sh | make LLM-API slurm examples executable (#3402) | 2025-04-13 21:42:45 +08:00 |
| llm_multilora.py | Update TensorRT-LLM (#2562) | 2024-12-11 00:31:05 -08:00 |
| llm_quantization.py | feat: use cudaMalloc to allocate kvCache (#3303) | 2025-04-08 10:59:14 +08:00 |
| quickstart_example.py | Update TensorRT-LLM (#2562) | 2024-12-11 00:31:05 -08:00 |
| README.md | docs:update llm api examples and customizations sections' links. (#3566) | 2025-04-15 13:55:22 +08:00 |
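
All of these scripts build on the TensorRT-LLM `LLM` API. For orientation, below is a minimal sketch of the pattern that `quickstart_example.py` demonstrates: construct an `LLM`, define `SamplingParams`, and call `generate`. The model name is illustrative, not taken from the script; see the files above for the exact, maintained code.

```python
from tensorrt_llm import LLM, SamplingParams

# Prompts to batch through the engine in a single generate() call.
prompts = [
    "Hello, my name is",
    "The capital of France is",
]

# Sampling configuration for generation.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Load (or build) the model; the Hugging Face model name here is illustrative.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# Each result carries the original prompt and the generated completion(s).
for output in llm.generate(prompts, sampling_params):
    print(f"Prompt: {output.prompt!r}, Generated: {output.outputs[0].text!r}")
```

The other examples vary this skeleton: the `llm_inference_async*` scripts drive `generate` asynchronously, the `llm_mgmn_*.sh` scripts wrap the same API in multi-GPU multi-node Slurm jobs, and the decoding examples (Eagle, Medusa, lookahead, guided) swap in alternative decoding configurations.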