TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-26 13:43:38 +08:00

Author	SHA1	Message	Date
ruodil	907c180eb2	[None][test] align kv_frac in perf test with perflab and add more cases for 4 gpus GB200 (#6632 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-08-06 02:25:57 -04:00
ruodil	0bd99b5d6d	[TRTLLM-6764][test] add new feature cases in cluster(B200/GB200) and sanity test (#6650 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-08-06 01:45:13 -04:00
Pengbo Wang @ NVIDIA	c289880afb	[None][fix] fix kimi k2 serving and add test for Kimi-K2 (#6589 ) Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>	2025-08-05 18:05:33 +08:00
Ivy Zhang	08ed9d7305	[None][doc] add introduction doc on qa test (#6535 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-08-05 17:02:17 +08:00
Ivy Zhang	d101a6cebc	[https://nvbugs/5410279 ][test] resubmit timeout refactor (#6337 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-08-05 16:39:25 +08:00
ruodil	7625845365	test: add README_release_test.md for perf test (#6443 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-08-05 02:07:42 -04:00
xinhe-nv	a178cea324	[TRTLLM-6856][feat] add disaggregated serving tests to QA list (#6536 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-08-05 12:47:53 +10:00
Ivy Zhang	f3651adea8	[None][test] update invalid test name (#6596 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-08-04 08:01:05 -04:00
brb-nv	87e4e9f468	[None][chore] Add unit test for Gemma3 lora (#6560 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-08-04 04:56:57 -04:00
ruodil	6459725bf9	test: move ministral_8b_fp8 to fp8_specific gpu list(exclude Ampere) (#6533 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-04 15:22:39 +10:00
Ivy Zhang	5eefdf2c75	tests: Add llama4 functional cases (#6392 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-08-04 11:19:58 +08:00
Yechan Kim	ee6ab5be96	chore: add EXAONE4 accuracy test (#6397 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-08-04 10:14:16 +08:00
Ivy Zhang	7547a7d0a2	[TRTLLM-6473][test] add speculative decoding and ep load balance cases into QA test list (#6436 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-08-03 22:11:26 -04:00
Jhao-Ting Chen	4da5cfc511	[None][infra] add eagle3 one model accuracy tests (#6264 ) Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>	2025-08-02 16:07:46 -07:00
Lizhi Zhou	6f34f3489b	[TRTLLM-6357][test] Add accuracy tests for Qwen3 (#6177 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-08-01 13:33:34 -04:00
brb-nv	7447d6ed85	[TRTLLM-6657][feat] Add LoRA support for Gemma3 (#6371 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-08-01 09:19:54 -04:00
liji-nv	1daa8c3232	[https://nvbugs/5340941 ][https://nvbugs/5375785 ] - fix: Wrap attentio… (#6355 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-08-01 07:38:06 -04:00
Wanli Jiang	9632dba02e	feat: TRTLLM-6450 update long rope for phi3.5/phi4-mini/phi4-mm (#6353 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-07-30 09:20:16 -07:00
xinhe-nv	d9ab3fd35e	tests: add TestNemotronH cuda graph tests (#6390 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-07-30 18:45:58 +10:00
Yechan Kim	d6eb8e2366	fix: support mixture of text & multimodal prompts (#6345 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-07-30 08:52:31 +08:00
ruodil	e11255e9d0	test:[nvbug 5415268] add kv_cache_free_gpu_mem_fraction param and llama4 rcca cases (#6430 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-07-29 15:52:45 +10:00
Michal Guzek	2573bb729d	feat: Add Phi-4-Mini-Instruct in Pytorch backend for LLM API accuracy tests (#6303 ) Signed-off-by: moraxu <mguzek@nvidia.com>	2025-07-28 14:02:14 -07:00
ruodil	03632a679f	test: organize perf cases and add missing perflab cases in qa test list (#6283 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-07-28 20:33:32 +10:00
xinhe-nv	470544cf17	test: [CI] Add failed cases into waives.txt (#6333 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-07-25 17:18:06 +10:00
xinhe-nv	6268a60ab3	tests: add test_chunked_prefill for llama4 (#5549 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-07-24 23:02:00 -04:00
Iman Tabrizian	5fceaa6153	Revert "tests: add timeout_manager to tensorrt flow test cases (#5942 )" (#6309 )	2025-07-23 23:58:10 -04:00
Stanley Sun	04f2d4b2eb	test: update test list for RTX6KD (#6213 ) Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>	2025-07-22 18:55:24 +08:00
Ivy Zhang	eb5cb5b642	tests: add timeout_manager to tensorrt flow test cases (#5942 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-07-22 10:23:41 +08:00
ruodil	6a3c9f8061	test: add phi-4 multimodel and bielik-11b-v2.2 models for perf test (#5826 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-07-21 11:29:19 +10:00
wili	82d3587bb8	[refactor] Unify name of NGram speculative decoding (#5937 ) Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>	2025-07-19 12:59:57 +08:00
Bo Deng	2c6fa145ee	[TRTLLM-6471] Infra: unwaive nixl tests and some disagg-serve tests (#6095 ) Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-07-19 00:48:44 +08:00
Chuang Zhu	44c70c88f9	chore:[BREAKING CHANGE] use cacheTransceiverConfig as knobs for disagg service (#5234 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-07-17 17:42:07 +08:00
chenfeiz0326	fe070a0168	test: Update Llama4 Scout FP4 & FP8 accuracy tests (#5901 ) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>	2025-07-17 09:41:18 +08:00
Wanli Jiang	2d2b8bae32	feat: TRTLLM-5574 Add phi-4-multimodal pytorch-backend support (#5644 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-07-17 06:30:58 +08:00
Ivy Zhang	dda91b5117	tests: add QA test cases (#5959 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-07-16 16:14:25 +08:00
Ivy Zhang	763012a88a	[nvbug/5359218][tests] add test llm api test case on lookahead with chunked prefill (#6051 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-07-16 16:04:08 +08:00
Wanli Jiang	8679a058a3	fix: Unable to load phi4-model with tp_size>1 (#5962 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-07-16 11:39:41 +08:00
ruodil	2a147c4d01	test: add llama_v3.3_70b_cases in perf test (#6035 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-07-15 17:53:59 +10:00
brb-nv	1a2d96919c	feat: Update Gemma3 Vision Encoder (#5973 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-14 22:38:10 +08:00
ruodil	347520494b	test: remove duplicate cases in perf sanity test (#5870 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-07-14 17:17:30 +08:00
ruodil	278a1a7df3	test: fix some test failure and add llama_nemotron models in perf sanity test, add more torch cases (#5693 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-07-14 17:17:30 +08:00
xinhe-nv	509363d858	tests: update sanity tests & fix tests (#5906 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-07-11 19:48:19 +10:00
brb-nv	0385f89abc	test: Fix Gemma3 unit tests due to transformers upgrade (#5921 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-10 17:24:10 -07:00
2ez4bz	87fe44fd29	feat(models): Mistral3.1 VLM pytorch backend support (#5529 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-07-09 13:17:40 -07:00
DylanChen-NV	74dca0aa7b	[NVBUG-5304516/5319741]Qwen2.5VL FP8 support (#5029 ) Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>	2025-07-09 23:16:42 +08:00
Venky	e27215ca03	test: Validate and add accuracy& perf tests for Ministral-8B-Instruct[-FP8](pytorch only) (#5654 ) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2025-07-08 18:16:21 -07:00
Pamela Peng	da8c7372d4	[TRTLLM-5366][feat]Add support for sm121 (#5524 ) Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com> Initial CI run failed a single step A30-CPP-3 due to timeout. Rerunning that step succeeded.	2025-07-08 14:27:00 -07:00
xinhe-nv	ff2dd72df4	tests: waive tests (#5458 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-26 14:53:55 +08:00
Enwei Zhu	fc7a81ceb0	test: Add LLGuidance test and refine guided decoding (#5348 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-06-25 14:12:56 +08:00
xinhe-nv	658fb5b54e	tests: update benchmark test lists (#5365 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-06-24 15:23:38 +08:00

173 Commits