TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-23 20:23:08 +08:00


Zheng Duan c9e2a963e0 feat: add kv cache aware router (#3831)	2025-05-12 07:23:57 -04:00
* kv cache aware router
* add tests
* router config
* eviction test
  add test
* eviction detect in worker test
* move worker tests to single gpu
* reduce memory fraction
* fix partial block
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
_llmapi_perf_evaluator	Update (#2978)	2025-03-23 16:39:35 +08:00
accuracy	[https://nvbugspro.nvidia.com/bug/5270564][test] skip per-hopper for llama4 (#4211)	2025-05-12 15:27:15 +08:00
cpp	Refactor: Restructure C++ tests for better modularisation of non-shared code (#4027)	2025-05-09 19:16:51 +01:00
deterministic	chore: Cleanup deprecated APIs from LLM-API (part 1/2) (#3732)	2025-05-07 13:20:25 +08:00
disaggregated	feat: add kv cache aware router (#3831)	2025-05-12 07:23:57 -04:00
examples	tests: https://nvbugs/5219534 remove failed tests from test list (#4113)	2025-05-12 14:13:40 +08:00
llmapi	chore: refactor llmapi e2e tests (#3803)	2025-05-05 07:37:24 +08:00
perf	test: add llama_3.2_1B model and fix for test lora script issue (#4139)	2025-05-12 14:51:59 +08:00
stress_test	fix: trtllm-serve hang in stress test and ds v3 stress parameter update (#3836)	2025-05-06 16:52:30 +08:00
sysinfo	Update (#2978)	2025-03-23 16:39:35 +08:00
__init__.py	Update (#2978)	2025-03-23 16:39:35 +08:00
.test_durations	Refactor: Restructure C++ tests for better modularisation of non-shared code (#4027)	2025-05-09 19:16:51 +01:00
agg_unit_mem_df.csv	test: reorganize tests folder hierarchy (#2996)	2025-03-27 12:07:53 +08:00
ci_profiler.py	Update (#2978)	2025-03-23 16:39:35 +08:00
common.py	[TRTLLM-4763][test] Accuracy test improvement (Part 3.6): Deprecate mmlu_llmapi.py (#3802)	2025-04-23 23:05:13 +08:00
conftest.py	chore: Deprecate evaltool (#4173)	2025-05-09 20:31:53 +08:00
local_venv.py	tests: https://nvbugs/5219534 remove failed tests from test list (#4113)	2025-05-12 14:13:40 +08:00
pytest.ini	chore: Refine attention backend interface. (#3271)	2025-04-09 02:34:53 +08:00
runner_interface.py	Update (#2978)	2025-03-23 16:39:35 +08:00
test_cache.py	chore: clean some ci of qa test (#3083)	2025-03-31 14:30:41 +08:00
test_cases.yml	Update (#2978)	2025-03-23 16:39:35 +08:00
test_e2e.py	feat: Support the Structural Tag in guided decoding (#4066)	2025-05-12 17:24:50 +08:00
test_list_parser.py	infra: Add test list name check (#3097)	2025-04-20 23:02:16 +08:00
test_list_validation.py	Update (#2978)	2025-03-23 16:39:35 +08:00
test_mlpf_results.py	Update (#2978)	2025-03-23 16:39:35 +08:00
test_sanity.py	Update (#2978)	2025-03-23 16:39:35 +08:00
test_unittests.py	test: reorganize tests folder hierarchy (#2996)	2025-03-27 12:07:53 +08:00
trt_test_alternative.py	Add thread leak check and fix thread/memory leak issues. (#3270)	2025-04-08 19:03:18 +08:00