TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

shaharmor98 49262a62a5 add passing E2E LoRA flow (#3788) Signed-off-by: Shahar Mor <smor@nvidia.com>		2025-04-23 18:38:06 +03:00
__init__.py	chore: refactor the LlmArgs with Pydantic and migrate remaining pybinding configs to python (#3025)	2025-04-05 13:31:48 +08:00
_perf_evaluator.py	Add thread leak check and fix thread/memory leak issues. (#3270)	2025-04-08 19:03:18 +08:00
build_cache.py	Update TensorRT-LLM (#2413)	2024-11-05 16:27:06 +08:00
disagg_utils.py	feat: Disaggregated router class (#3584)	2025-04-19 00:34:12 +08:00
llm_args.py	Add smart router for moe (#3641)	2025-04-23 12:21:59 +08:00
llm_utils.py	fix: LLM API _hf_model_dir for non-cached case (#3562)	2025-04-16 10:39:34 +08:00
llm.py	add passing E2E LoRA flow (#3788)	2025-04-23 18:38:06 +03:00
mgmn_leader_node.py	Update (#2978)	2025-03-23 16:39:35 +08:00
mgmn_worker_node.py	Update TensorRT-LLM (#2333)	2024-10-15 15:28:40 +08:00
mpi_session.py	fix hmac in remote mpi session (#3649)	2025-04-18 17:47:51 +08:00
tokenizer.py	test [TRTLLM-4477,TRTLLM-4481]: Accuracy test improvement (Part 3.5): Support GSM8K and GPQA (#3483)	2025-04-22 07:38:16 +08:00
tracer.py	Update TensorRT-LLM (#2413)	2024-11-05 16:27:06 +08:00
trtllm-llmapi-launch	fix hmac in remote mpi session (#3649)	2025-04-18 17:47:51 +08:00
utils.py	chore: Unify Python NVTX call (#3450)	2025-04-15 23:25:36 +08:00