TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
Robin Kobus	5f77d212ef	test: Reduce number of C++ test cases (#5437 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-07-01 09:40:49 +02:00
xinhe-nv	19c56f0374	test: [CI] Add failed cases into waives.txt (#5582 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-07-01 14:57:03 +08:00
Stanley Sun	7135b27284	rcca: test default kv_cache_reuse option for pytorch multimodal (#5544 ) Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>	2025-07-01 12:12:48 +08:00
xinhe-nv	a8cf611baa	test: [CI] Add failed cases into waives.txt (#5569 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-07-01 11:02:56 +08:00
xinhe-nv	9b17b29b6e	test: [CI] remove closed bugs (#5572 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-07-01 10:15:43 +08:00
Yi Zhang	7cf1209a19	[fix]: Fix main test skip issue (#5503 ) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-06-30 21:39:49 -04:00
nv-guomingz	6e48ac25a6	chore: remove cuda_graph_ prefix from cuda_graph_config filed members. (#5585 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-06-30 12:23:14 -04:00
Omer Ullman Argov	42134b8b84	[ci] move eagle1 and medusa tests to post-merge (#5604 ) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-06-30 19:32:28 +08:00
Fanrong Li	6cbc9a5297	[nvbug/5354946][fix] Fix mtp vanilla draft inputs (#5568 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-06-30 15:59:12 +08:00
Yiqing Yan	4fef14da56	Deduplicate waive list (#5546 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-06-30 11:12:26 +08:00
nv-guomingz	578430e64c	[TRTLLM-5530][BREAKING CHANGE]: enhance the llm args pytorch config part 1(cuda_graph_config) (#5014 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-06-30 11:05:40 +08:00
Omer Ullman Argov	2780fc27a7	[ci] remove MMLU if followed by GSM8K (#5578 ) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-06-30 05:29:54 +03:00
Talor Abramovich	70e34a3291	[TRTLLM-5831][feat] Add LoRA support for pytorch backend in trtllm-serve (#5376 ) Signed-off-by: Talor Abramovich <talora@nvidia.com>	2025-06-29 12:46:30 +00:00
amirkl94	a985c0b7e6	tests: Move stress tests to be Post-Merge only (#5166 ) Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>	2025-06-29 09:44:47 +03:00
Li Min	6021a439ab	Make moe permute and final as custom op (#5412 ) Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>	2025-06-27 15:48:33 -07:00
Iman Tabrizian	26b953e29a	[nvbugs/5309940] Add support for input output token counts (#5445 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-06-28 04:39:39 +08:00
Darragh Hanley	5437075def	ReDrafter support for Qwen (#4875 ) Signed-off-by: darraghdog <darragh.hanley@gmail.com> Signed-off-by: Darragh Hanley <darragh.hanley@gmail.com> Co-authored-by: rakib-hasan <rhasan@nvidia.com>	2025-06-28 02:33:10 +08:00
wili	56cdfe5c6c	[TRTLLM-5000][feat] NGrams V2 (#4569 ) Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>	2025-06-27 23:00:17 +08:00
Omer Ullman Argov	6fc1c6fd7b	[fix][ci] correct unittests test prefix (#5547 ) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-06-27 20:34:44 +08:00
Iman Tabrizian	49af791f66	Add testing for trtllm-llmapi-launch with tritonserver (#5528 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-06-27 11:19:52 +08:00
xinhe-nv	a3494bebec	tests: waive failed tests on main (#5512 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-27 10:13:22 +08:00
Frank	aa6e015ef8	Update trtllm-bench to support new Pytorch default. (#5491 ) Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>	2025-06-26 17:05:43 -07:00
jmydurant	8836990bde	[TRTLLM-3602][feat] support nvfp4 model and fp8 kv cache for MLA chunked prefill (Blackwell) (#5475 ) Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>	2025-06-26 22:18:08 +08:00
Omer Ullman Argov	6bae76d7ca	[fix][ci] move torch tests to run under torch stage (#5473 ) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-06-26 14:31:38 +03:00
Omer Ullman Argov	1633bd2bef	[CI] move flashinfer llama tests to post merge (#5506 ) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-06-26 19:27:32 +08:00
xinhe-nv	ff2dd72df4	tests: waive tests (#5458 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-26 14:53:55 +08:00
Emma Qiao	32d1573c43	[Infra] - Add timeout setting for long tests found in post-merge (#5501 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-06-26 11:31:39 +08:00
Venky	d9b75f83fd	[CI] Waive `test_fp8_block_scales_4gpus[ep4-mtp_nextn=0-fp8kv=True-attention_dp=True-cuda_graph=True-overlap_scheduler=True-torch_compile=False]` (#5494 ) Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>	2025-06-25 20:17:12 -07:00
jmydurant	578dbc8d9a	feat: chunked prefill for MLA (Blackwell) (#4651 ) Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>	2025-06-26 09:01:00 +08:00
HuiGao-NV	74ae15a26b	CI: enable test cases on single device type (#5484 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-06-26 08:03:44 +08:00
QI JUN	feaf789342	CI: reduce BF16 test cases in B200 (#5482 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-06-26 07:18:20 +08:00
Omer Ullman Argov	bdc8dfebc3	[fix][ci] dont build wheel for cpp tests (#5443 ) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-06-26 00:13:47 +03:00
Daniel Cámpora	205c97a4ae	[TRTLLM-5974][feat] Support disaggregated serving in TRTLLM Sampler (#5328 ) Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-06-25 17:41:36 +02:00
HuiGao-NV	cc3c2b3be2	Move 3 disaggregated cases from 4 GPUs devices to 1 GPU device (#5457 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-06-25 21:38:14 +08:00
Kaiyu Xie	d6ada5ffce	[nvbug/5354956] fix: unexpected keyword argument 'streaming' (#5436 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-06-25 20:37:24 +08:00
Netanel Haber	3ca2f6ac51	start OAIServer with `max_beam_width=1` for TorchSampler (#5427 ) Signed-off-by: Netanel Haber <nhaber@nvidia.com>	2025-06-25 15:52:06 +08:00
Enwei Zhu	fc7a81ceb0	test: Add LLGuidance test and refine guided decoding (#5348 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-06-25 14:12:56 +08:00
Enwei Zhu	76da7fed86	fix (NvBug 5354925): Fix static EPLB (#5411 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-06-25 13:14:40 +08:00
HuiGao-NV	da98e03747	tests: Set kv cache free memory fraction in test case (#5433 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-06-25 12:31:58 +08:00
dongxuy04	699520082b	Add MTP support for Online EPLB (#5213 ) Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>	2025-06-25 07:58:13 +08:00
Emma Qiao	475272046a	[Infra] - Waive failed tests in post-merge and increase some timeout setting (#5424 ) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-06-24 17:19:31 +08:00
xinhe-nv	658fb5b54e	tests: update benchmark test lists (#5365 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-06-24 15:23:38 +08:00
xinhe-nv	4b32a3f1a7	test: [CI] remove closed bugs (#5400 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-06-24 13:39:57 +08:00
Fanrong Li	5d4ab47d5b	fix: refactor and fix mtp vanilla (#4762 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-06-20 05:23:39 +08:00
Yan Chunwei	9bd42ecf9b	[TRTLLM-5208][BREAKING CHANGE] chore: make pytorch LLM the default (#5312 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-06-20 03:01:10 +08:00
Kaiyu Xie	7246fd75d1	feat: Support stream_interval (#5284 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-06-19 21:57:10 +08:00
Enwei Zhu	bca758fce1	fix: Fix DS-R1 nvfp4 test case naming (#5361 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-06-19 15:50:43 +08:00
Emma Qiao	493f268b1c	[Infra]Fix l0_sanity_check.yml which also has gb202 and gb203 (#5360 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-06-19 15:05:57 +08:00
ruodil	e22e884b02	test: amend test case name in perf cluster test (#5356 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-06-19 14:50:12 +08:00
ruodil	21ce9b6749	test: add qwen3 cases (#5302 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-19 14:38:36 +08:00

520 Commits