TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
dominicshanshan	3c0fecbf42	CI: extend model weights load time for dsv3 in stress test. (#5275 ) Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-06-18 11:51:48 +08:00
Ivy Zhang	41cfcaa964	test: update qa test list (#5305 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-06-18 11:29:11 +08:00
Emma Qiao	ff32caf4d7	[Infra] - Update dependencies with NGC PyTorch 25.05 and TRT 10.11 (#4885 ) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> Signed-off-by: Emma Qiao <qqiao@nvidia.com> Co-authored-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-06-17 23:48:34 +08:00
QI JUN	f899c4d294	Re-implement LlmResponse in Python to reduce host overhead of pybind (#5224 ) Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-06-17 21:28:09 +08:00
Yanchao Lu	f4cdbfcdf0	None - Some clean-ups for the automation pipeline (#5245 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-06-17 21:08:24 +08:00
Dom Brown	44fb3c1673	[TRTLLM-5770] feat: Integrate TRT-LLM Gen FP8 block scale MoE with Pytorch workflow kernel autotuner (#5207 ) - Adds a new Python custom op (fp8_block_scale_moe_runner) and a FP8BlockScaleMoERunner class for autotuning. - Updates C++ MoE and batched GEMM kernels to accept a configIndex for workspace sizing and execution. - Extends the unit test to run both autotuned and non-autotuned code paths. Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2025-06-17 21:01:56 +08:00
amirkl94	8451a87742	chore: Mass integration of release/0.20 (#5082 ) Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com> Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com> Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Co-authored-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com> Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Co-authored-by: Erin <14718778+hchings@users.noreply.github.com> Co-authored-by: Frank <3429989+FrankD412@users.noreply.github.com> Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com> Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-06-17 14:32:02 +03:00
liji-nv	13eef642e6	[feat] Piecewise cuda graph support for MLA (#4467 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-06-17 18:58:38 +08:00
QI JUN	ccd9adbe33	CI: move multi-gpu test cases of tensorrt backend to h200 (#5272 ) Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-06-17 17:37:37 +08:00
Ivy Zhang	2ad8758ecc	[TRTLLM-5786][https://nvbugspro.nvidia.com/bug/5310520 ][test] Add QA test cases (#5073 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-06-17 17:14:01 +08:00
QI JUN	517c1ecf72	move some test cases of TensorRT backend back (#5232 ) Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-06-17 17:03:11 +08:00
qsang-nv	134cb66a53	fix mla test (#5240 ) Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>	2025-06-17 15:26:25 +08:00
xinhe-nv	a49ad790b3	test: [CI] remove closed bugs (#5218 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-17 13:13:23 +08:00
QI JUN	546274d40e	fix ci (#5259 ) Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-06-17 12:03:09 +08:00
ruodil	bb2348372c	test: add more pytorch cases in perf test (#5237 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-06-17 11:11:28 +08:00
Mike Iovine	c53bc19f5e	[infra] Make test_chunked_prefill faster (#5248 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-06-17 04:19:47 +08:00
Simeng Liu	5c18160d27	chore: Waive CI failure. (#5252 ) Signed-off-by: Simeng Liu <simengl@nvidia.com>	2025-06-16 20:47:05 +02:00
Izzy Putterman	e607768e45	Speculation: Draft Target in new FW (#4558 ) Signed-off-by: Izzy Putterman <iputterman@nvidia.com>	2025-06-17 02:26:08 +08:00
Yilin Fan	dd29063538	[feat] Add llm args to tune python gc threshold (#5141 ) Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com>	2025-06-16 17:45:22 +08:00
Ivy Zhang	64b7f04fdc	[test] split nemotron test cases from examples_test_list (#5238 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-06-16 16:36:33 +08:00
xinhe-nv	802f22cd12	test: [CI] Add failed cases into waives.txt (#5221 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-06-16 16:11:53 +08:00
Yiqing Yan	8445416c39	Waive L0 tests (#5233 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-06-16 15:19:03 +08:00
Anthony Chang	4f9fa9f21d	feat: MoE trtllm backend kernel update (#5183 ) Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>	2025-06-16 14:46:13 +08:00
Wanli Jiang	0acf23185e	[Stress test] Add DeepSeek-R1 stress test (#5033 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-06-16 11:54:31 +08:00
Tracin	ef3fdc8051	feat: Add w4a8_mxfp4_fp8 quantization recipe. (#4867 ) Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>	2025-06-16 11:30:57 +08:00
Yi Zhang	9b616db13b	test: Add fixture to skip tests based on MPI world size (#5028 ) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-06-16 11:25:01 +08:00
ruodil	2848e012ae	test: add llama4 models for perf test (#5187 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-16 11:24:35 +08:00
ruodil	3d22f27063	test: add more cases for llama_v3.3/3.1 70b fp8 and set enable_attention_dp to false to non-deepseek models (#5155 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-06-16 11:23:20 +08:00
Enwei Zhu	babdd9ce06	test: Add json_mode_eval for guided decoding evaluation (#5179 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-06-16 10:03:55 +08:00
Yan Chunwei	c84e41fd9d	fix: build_config in TorchLlmArgs and avoid arbitrary args (#4972 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-06-15 17:51:56 -07:00
amitz-nv	109c426077	Enable trtllm-bench to run LoRA and add basic e2e perf testing capability for LoRA in PyT flow (#5130 )	2025-06-15 18:54:04 +03:00
Omer Ullman Argov	4eade3ae33	[fix][test] Speedup Nemotron NAS unittests (#5202 ) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-06-15 11:26:03 +03:00
Kaiyu Xie	dce1dcc4f9	feat: Support post_proc for bench (#5122 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-06-15 13:02:38 +08:00
ixlmar	e055af1bc9	chore: improve disagg test failure detection (#4738 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-06-15 01:28:26 +08:00
Aurelien Chartier	1389f5a4d3	feat: Add support for fp8 rowwise quantization (#4876 ) Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com> Co-authored-by: aikitoria <151776613+aikitoria@users.noreply.github.com>	2025-06-14 06:37:48 -07:00
Tailing Yuan	0b60da2c45	feat: large-scale EP(part 7: DeepEP integration) (#4792 ) Signed-off-by: Tailing Yuan <yuantailing@gmail.com> Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-06-14 19:12:38 +08:00
yunruis	b99c5ce8c1	Feat/ds r1 min latency opt round3, add router gemm, fused a gemm, PDL (#4560 ) Signed-off-by: yunruis <yunruis@nvidia.com> Signed-off-by: kduan <176893526+Kefeng-Duan@users.noreply.github.com> Signed-off-by: Kefeng-Duan <176893526+Kefeng-Duan@users.noreply.github.com> Co-authored-by: kduan <176893526+Kefeng-Duan@users.noreply.github.com>	2025-06-14 17:36:22 +08:00
nv-guomingz	3b7b5a5ad5	refactor [BREAKING CHANGE]: enhance the llm args pytorch config part 3(torch_compile_config) (#5032 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-06-14 14:23:13 +08:00
Enwei Zhu	5f2785fb90	fix: Fix waive list (#5205 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-06-13 23:33:23 +08:00
Mike Iovine	25aa3881d7	[nvbug/5319281][fix] Stop drafting when we hit the draft model's max seq len (#4879 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-06-13 11:06:36 -04:00
QI JUN	952f33dcad	CI: move all test cases of TensorRT backend into post merge (#5186 ) Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-06-13 20:48:48 +08:00
xinhe-nv	30d9d0fa71	test: [CI] Add failed cases into waives.txt (#5178 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-13 16:38:51 +08:00
Zheng Duan	4d0a5ad384	chore: gracefully exit disagg process in tests; better startup and logging (#5109 ) Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>	2025-06-13 14:03:55 +08:00
Ivy Zhang	28cd536bd6	[test] Update timeout params in QA test list (#5124 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-06-13 13:40:03 +08:00
Iman Tabrizian	01bd4c00b4	Add two MTP disaggregated test (#4546 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-06-13 12:17:45 +08:00
Daniel Cámpora	dec326ba7d	[fix] Reenable test return logits (#5160 ) Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>	2025-06-13 06:07:22 +02:00
Yibin Li	b79eb34bfe	[fix]: Fall back to HMAC to Avoid IPC Serialization Churn (#5074 ) Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>	2025-06-13 11:37:50 +08:00
xinhe-nv	d9be419f45	tests: update tests for b200 (#5180 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-13 11:25:33 +08:00
ruodil	fa582cbe9a	test: add more cases for rtx_pro_6000_se and add option kv_cache_dtype in perf test (#5083 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-06-13 11:09:15 +08:00
Yuxian Qiu	4ae46b6714	fix: [nvbugs/5324229] Fix broken WInt4AFP8FusedMoEMethod since FusedMoE refactor. (#4930 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-06-13 10:21:32 +08:00

735 Commits