TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-06 11:11:36 +08:00

Author	SHA1	Message	Date
Ivy Zhang	61213e3562	tests: fix typos in qa test (#5421 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-07-01 20:12:55 +08:00
xinhe-nv	19c56f0374	test: [CI] Add failed cases into waives.txt (#5582 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-07-01 14:57:03 +08:00
Yi Zhang	7cf1209a19	[fix]: Fix main test skip issue (#5503 ) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-06-30 21:39:49 -04:00
nv-guomingz	6e48ac25a6	chore: remove cuda_graph_ prefix from cuda_graph_config filed members. (#5585 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-06-30 12:23:14 -04:00
nv-guomingz	578430e64c	[TRTLLM-5530][BREAKING CHANGE]: enhance the llm args pytorch config part 1(cuda_graph_config) (#5014 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-06-30 11:05:40 +08:00
Omer Ullman Argov	2780fc27a7	[ci] remove MMLU if followed by GSM8K (#5578 ) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-06-30 05:29:54 +03:00
Li Min	6021a439ab	Make moe permute and final as custom op (#5412 ) Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>	2025-06-27 15:48:33 -07:00
jmydurant	8836990bde	[TRTLLM-3602][feat] support nvfp4 model and fp8 kv cache for MLA chunked prefill (Blackwell) (#5475 ) Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>	2025-06-26 22:18:08 +08:00
jmydurant	578dbc8d9a	feat: chunked prefill for MLA (Blackwell) (#4651 ) Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>	2025-06-26 09:01:00 +08:00
Kaiyu Xie	d6ada5ffce	[nvbug/5354956] fix: unexpected keyword argument 'streaming' (#5436 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-06-25 20:37:24 +08:00
Enwei Zhu	fc7a81ceb0	test: Add LLGuidance test and refine guided decoding (#5348 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-06-25 14:12:56 +08:00
HuiGao-NV	da98e03747	tests: Set kv cache free memory fraction in test case (#5433 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-06-25 12:31:58 +08:00
dongxuy04	699520082b	Add MTP support for Online EPLB (#5213 ) Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>	2025-06-25 07:58:13 +08:00
xinhe-nv	658fb5b54e	tests: update benchmark test lists (#5365 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-06-24 15:23:38 +08:00
Fanrong Li	5d4ab47d5b	fix: refactor and fix mtp vanilla (#4762 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-06-20 05:23:39 +08:00
Yan Chunwei	9bd42ecf9b	[TRTLLM-5208][BREAKING CHANGE] chore: make pytorch LLM the default (#5312 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-06-20 03:01:10 +08:00
Kaiyu Xie	7246fd75d1	feat: Support stream_interval (#5284 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-06-19 21:57:10 +08:00
bhsueh_NV	dce8620013	chore: enable moe_backend on Qwen3 test (#5230 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-06-19 13:40:45 +08:00
xinhe-nv	e5400eeae0	tests: add ds r1 tp4 test (#5197 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-06-19 12:48:33 +08:00
Fanrong Li	6c3210a8be	[test] add nvfp4 DeepSeek-V3-Lite-mtp tests (#5125 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-06-19 09:48:22 +08:00
nv-guomingz	6a388b105a	chore: remove torch_compile prefix for TorchCompileConfig field members (#5261 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-06-19 09:21:51 +08:00
Iman Tabrizian	e5ee5c5352	Unwaive disaggregated serving accuracy tests (#5095 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>	2025-06-19 00:41:15 +08:00
xinhe-nv	610a49f117	tests: add multi nodes tests (#5196 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-06-18 18:08:04 +08:00
Yi Zhang	375dd0b971	Waive L0 (#5311 ) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-06-18 16:40:41 +08:00
Wanli Jiang	3a02489e86	[TRTLLM-5758] test: Add Bielik-11B-v2.2 Model Support (#5159 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-06-18 15:12:49 +08:00
QI JUN	9ea7bb67a4	CI: fix TensorRT H200 tests (#5301 ) Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-06-18 14:40:57 +08:00
liji-nv	13eef642e6	[feat] Piecewise cuda graph support for MLA (#4467 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-06-17 18:58:38 +08:00
Ivy Zhang	2ad8758ecc	[TRTLLM-5786][https://nvbugspro.nvidia.com/bug/5310520 ][test] Add QA test cases (#5073 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-06-17 17:14:01 +08:00
Mike Iovine	c53bc19f5e	[infra] Make test_chunked_prefill faster (#5248 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-06-17 04:19:47 +08:00
Izzy Putterman	e607768e45	Speculation: Draft Target in new FW (#4558 ) Signed-off-by: Izzy Putterman <iputterman@nvidia.com>	2025-06-17 02:26:08 +08:00
Yi Zhang	9b616db13b	test: Add fixture to skip tests based on MPI world size (#5028 ) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-06-16 11:25:01 +08:00
Enwei Zhu	babdd9ce06	test: Add json_mode_eval for guided decoding evaluation (#5179 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-06-16 10:03:55 +08:00
nv-guomingz	3b7b5a5ad5	refactor [BREAKING CHANGE]: enhance the llm args pytorch config part 3(torch_compile_config) (#5032 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-06-14 14:23:13 +08:00
xinhe-nv	d9be419f45	tests: update tests for b200 (#5180 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-13 11:25:33 +08:00
Michal Guzek	53983ad273	[TRTLLM-4932] Add Llama-3.1-Nemotron-Nano-8B-v1-FP8 accuracy tests (#4933 ) Signed-off-by: moraxu <mguzek@nvidia.com>	2025-06-12 15:06:28 +08:00
bhsueh_NV	505678a286	update the free_gpu_mem_fraction for H100 qwen3 qa test (#5114 ) Signed-off-by: root <root@eos0274.eos.clusters.nvidia.com> Co-authored-by: root <root@eos0274.eos.clusters.nvidia.com>	2025-06-12 14:40:57 +08:00
Michal Guzek	0daa70999a	Fix Llama-3_3-Nemotron-Super-49B-v1 FP8 accuracy threshold configs (#4961 ) Signed-off-by: moraxu <mguzek@nvidia.com>	2025-06-12 14:32:04 +08:00
xinhe-nv	11b94feff8	test: skip disaggregated tests on arm (#5070 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-06-11 17:00:10 +08:00
Stanley Sun	74b0e71ef4	test: add more disaggregated serving tests into QA testlist (#5036 ) Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>	2025-06-10 09:24:53 +08:00
liji-nv	1d4f748773	[fix] Fix illegal mem access and possible accuracy lose. Cherry-pick … (#5017 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-06-09 17:50:57 +08:00
Omer Ullman Argov	8731f5f14f	chore: Mass integration of release/0.20 (#4898 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Signed-off-by: Hui Gao <huig@nvidia.com> Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com> Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com> Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com> Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com> Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com> Signed-off-by: moraxu <mguzek@nvidia.com> Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Co-authored-by: Yiqing Yan <yiqingy@nvidia.com> Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Co-authored-by: HuiGao-NV <huig@nvidia.com> Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com> Co-authored-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Co-authored-by: Bo Li <22713281+bobboli@users.noreply.github.com> Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com> Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com> Co-authored-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com> Co-authored-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com> Co-authored-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Faraz <58580514+farazkh80@users.noreply.github.com> Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com> Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com>	2025-06-08 23:26:26 +08:00
Ivy Zhang	7dce328ad6	[TRTLLM-5692][tests] Add speculative decoding test cases on torch flow (#4940 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Ruodi Lu <ruodil@nvidia.com> Co-authored-by: Ruodi Lu <ruodil@nvidia.com>	2025-06-07 11:18:32 +08:00
Fanrong Li	75d020cf07	fix: fix cuda graph padding for spec decoding (#4853 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-06-06 22:21:42 +08:00
Anthony Chang	eeb555e37b	chore: memoize weight shuffle index to speed up weight preproc in moe_backend=TRTLLM (#4826 ) Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>	2025-06-06 16:13:54 +08:00
ixlmar	a1526356aa	[TRTLLM-5630] restore free_gpu_memory_fraction=0.9 in tests (#4859 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-06-05 10:46:29 +01:00
Yi Zhang	1fca654bfd	tests: Update gb200 test case (#4754 ) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-06-04 18:49:20 +08:00
Nikita Korobov	8043d7a03c	feat: update DeepSeek FP8 TRT-LLM Gen cubins (#4643 ) Signed-off-by: Nikita Korobov <nkorobov@nvidia.com>	2025-06-03 14:07:54 -07:00
Robin Kobus	b9263a8e10	fix: max_num_sequences calculation with overlap scheduling (#4532 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> Co-authored-by: Daniel Campora <961215+dcampora@users.noreply.github.com>	2025-06-03 09:31:22 +02:00
Fanrong Li	380a5d1690	[https://nvbugs/5271281 ][fix] fix a pd+mtp accuracy issue (#4536 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-06-03 10:03:34 +08:00
Jhao-Ting Chen	fcadce9f8d	[fix] Eagle-2 LLMAPI pybind argument fix. (#3967 ) Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com> Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>	2025-05-29 12:23:25 -07:00

1 2 3

122 Commits