TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

2ez4bz ccb62ef97e [TRTLLM-5252][feat] Add fp8 support for Mistral Small 3.1 (#6731)	2025-08-13 21:25:55 -04:00

This commit adds some level of FP8 support to Mistral Small 3.1 by:
* disabling quantization for the vision sub-model, since `modelopt` does not support quantizing it (yet).
* extending existing accuracy tests to use a modelopt-produced FP8 checkpoint.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>

accuracy	[TRTLLM-5252][feat] Add fp8 support for Mistral Small 3.1 (#6731)	2025-08-13 21:25:55 -04:00
cpp	[TRTLLM-6471] Infra: unwaive nixl tests and some disagg-serve tests (#6095)	2025-07-19 00:48:44 +08:00
deterministic	chore: Cleanup deprecated APIs from LLM-API (part 1/2) (#3732)	2025-05-07 13:20:25 +08:00
disaggregated	[https://nvbugs/5431127][fix] Run test_disaggregated_deepseek_v3_lite_fp8_nixl[DeepSeek-V3-Lite-fp8] only on hopper (#6737)	2025-08-11 13:29:11 +08:00
examples	[https://nvbugs/5410279][test] resubmit timeout refactor (#6337)	2025-08-05 16:39:25 +08:00
llmapi	[TRTLLM-4721][test] Add qa test for llm-api (#6727)	2025-08-11 08:03:16 +08:00
perf	[https://nvbugs/5442608][fix] Update CUDA graph config for get_model_yaml_config. (#6693)	2025-08-10 01:48:55 -04:00
stress_test	[BREAKING CHANGE]: change default backend to PyTorch in trtllm-serve (#5717)	2025-07-21 21:09:43 +08:00
sysinfo	Update (#2978)	2025-03-23 16:39:35 +08:00
triton_server	[TRTLLM-6224][infra] Upgrade dependencies to DLFW 25.06 and CUDA 12.9.1 (#5678)	2025-08-03 11:18:59 +08:00
utils	[https://nvbugs/5410279][test] resubmit timeout refactor (#6337)	2025-08-05 16:39:25 +08:00
__init__.py	[fix] Remove SpecConfig and fix thread leak issues (#5931)	2025-07-12 21:03:24 +09:00
.test_durations	[https://nvbugs/5340941][https://nvbugs/5375785] - fix: Wrap attentio… (#6355)	2025-08-01 07:38:06 -04:00
agg_unit_mem_df.csv	test: reorganize tests folder hierarchy (#2996)	2025-03-27 12:07:53 +08:00
ci_profiler.py	Update (#2978)	2025-03-23 16:39:35 +08:00
common.py	[https://nvbugs/5410279][test] resubmit timeout refactor (#6337)	2025-08-05 16:39:25 +08:00
conftest.py	[https://nvbugs/5410279][test] resubmit timeout refactor (#6337)	2025-08-05 16:39:25 +08:00
local_venv.py	[fix] Fix Mistral3VLM weight-loading & enable in pre-merge (#6105)	2025-07-17 11:04:17 -07:00
pytest.ini	test: fix PytestUnknownMarkWarning: Unknown pytest.mark.timeout (#6115)	2025-07-17 20:55:04 +10:00
runner_interface.py	Update (#2978)	2025-03-23 16:39:35 +08:00
test_cache.py	chore: clean some ci of qa test (#3083)	2025-03-31 14:30:41 +08:00
test_cases.yml	Update (#2978)	2025-03-23 16:39:35 +08:00
test_e2e.py	[TRTLLM-6975][test] Add multi-turn test cases for VLM models (#6749)	2025-08-13 13:10:13 +08:00
test_list_parser.py	[TRTLLM-4535][infra]: Add marker TIMEOUT for test level (#3905)	2025-05-25 23:30:40 -07:00
test_list_validation.py	[Infra]Remove some old keyword (#4552)	2025-05-31 13:50:45 +08:00
test_mlpf_results.py	Update (#2978)	2025-03-23 16:39:35 +08:00
test_sanity.py	Update (#2978)	2025-03-23 16:39:35 +08:00
test_unittests.py	[Infra] - Increase unittest execution time since some test exceeds 1600 (#6277)	2025-07-24 10:02:28 +08:00
trt_test_alternative.py	[TRTLLM-5508][feat] check input tokens + improve error handling (#5170)	2025-08-05 18:27:43 +01:00