TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
Pengyun Lin	3fcaa8a310	[nvbug 5327706][fix] fix mgmn postprocess error (#5835 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-07-14 17:17:30 +08:00
ruodil	347520494b	test: remove duplicate cases in perf sanity test (#5870 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-07-14 17:17:30 +08:00
Bo Li	6d79559f3e	fix: [https://nvbugs/5351130 ][https://nvbugs/5333654 ] Unwaive for bug 5351130 and 5333654. (#5821 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-07-14 17:17:30 +08:00
Bo Li	2991cf4b80	fix: [https://nvbugspro.nvidia.com/bug/5345215 ] Unwaive for bug 5345215. (#5606 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-07-14 17:17:30 +08:00
Yan Chunwei	3e1fd983c3	[nvbug5266240] chore: unwaive test_llm_with_dummy_weights (#5744 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-14 17:17:30 +08:00
Pengyun Lin	388b4919b8	[nvbug 5304752][fix] enhance _check_arguments to filter illegal requests for pytorch backend (#5541 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-07-14 17:17:30 +08:00
Pengyun Lin	6992616c1f	[nvbug 5004744][fix] rewrite completion API to avoid repetitive tokens (#5201 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-07-14 17:17:30 +08:00
ruodil	278a1a7df3	test: fix some test failure and add llama_nemotron models in perf sanity test, add more torch cases (#5693 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-07-14 17:17:30 +08:00
Iman Tabrizian	c8874a7f94	[nvbug/5337601][fix] Fix disagg + speculative decoding (#5558 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com> Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> Co-authored-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-07-14 17:17:30 +08:00
Yi Zhang	9cc4e5d50e	[nvbugs/5336321][fix] Enable attention dp = False test case, Fix TRTLLM Gen Moe workspace allocation (#5463 ) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com> Signed-off-by: yizhan <187001205+yizhang-nv@users.noreply.github.com>	2025-07-14 17:17:30 +08:00
Yi Zhang	e5e87ecf34	test: Move some of the test from post merge to pre-merge, update dgx b200 test case (#5640 ) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-07-14 17:17:30 +08:00
brb-nv	869e88304a	[nvbug/5341178][fix] Fix OOM in Llama 4 accuracy test (#5735 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-14 17:17:30 +08:00
dominicshanshan	c9e7f831dc	Breaking change: perf: [TRTLLM-4662] Enable cuda graph by default (#5480 ) Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-07-14 16:42:23 +08:00
Yan Chunwei	9c673e9707	[TRTLLM-6160] chore: add sampling examples for pytorch (#5951 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-14 15:28:32 +09:00
Yan Chunwei	c30eead09f	[TRTLLM-6164][TRTLLM-6165] chore: add runtime example for pytorch (#5956 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-14 14:09:39 +08:00
QI JUN	ce39409530	fix cancel request logic (#5800 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-07-14 10:23:20 +08:00
wili	3dfc819849	[BUG5374319][fix] WAR for draft-target-model unit tests error (#5958 ) Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com> Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-07-12 23:48:57 +09:00
Mike Iovine	8950223f6f	[fix] Remove SpecConfig and fix thread leak issues (#5931 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-07-12 21:03:24 +09:00
Enwei Zhu	bc1d4fb5da	[NvBug 5378370] fix: Fix alltoall for llama4 (apply_router_weight_on_input=True) (#5902 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-07-12 15:50:31 +09:00
Chang Liu	308776442a	[nvbug/5308432] fix: extend triton exit time for test_llava (#5971 ) Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-07-12 12:56:37 +09:00
Thor Johnsen	041f1fa513	[TRTLLM-6264] Fix flaky test_e2e.py::test_openai_lora (#5885 ) Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>	2025-07-11 16:20:41 -07:00
xinhe-nv	509363d858	tests: update sanity tests & fix tests (#5906 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-07-11 19:48:19 +10:00
brb-nv	0385f89abc	test: Fix Gemma3 unit tests due to transformers upgrade (#5921 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-10 17:24:10 -07:00
2ez4bz	c19840235d	[fix] Fix mistral unit tests due to transformers upgrade (#5904 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-07-10 10:45:27 -07:00
wili	2e3cf42e03	[refactor] Simplification of Speculative decoding configs (#5639 ) Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>	2025-07-10 11:37:30 -04:00
Yiqing Yan	3aa53ec36c	[None] - Waive L0 tests (#5915 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-07-10 18:33:17 +08:00
Enwei Zhu	055c4a9fe6	[NvBug 5370718, 5371538] fix: Fix incremental detokenization (#5825 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-07-10 16:30:00 +08:00
CarstyYou	dc32f9ae73	[fix] fix tileN cannot % 16==0 & support sm89 deepgemm bmm (#5531 ) Signed-off-by: CarstyYou <186021327+CarstyYou@users.noreply.github.com>	2025-07-10 15:16:18 +08:00
Anthony Chang	7d21b55b5a	[feat] Add TRTLLM MoE nvfp4 cubins for mid-high concurrency; attention_dp for TRTLLM MoE (#5723 ) Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>	2025-07-10 14:06:50 +08:00
Yan Chunwei	07f6da763d	[TRTLLM-5530] chore: rename LLM.autotuner_enabled to enable_autotuner (#5876 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-10 11:31:35 +08:00
Venky	f57b3d6829	Waive unittest failures introduced by PR#5345 (removal of `ScaffoldingOutput` class) (#5886 ) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2025-07-10 09:53:31 +08:00
peaceh-nv	76c3a12bcb	[fix] WAR to fix the illegal memory access issue in moe gemm on SM120 (#5636 ) Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>	2025-07-10 09:20:30 +08:00
brb-nv	3209b31665	feat: Custom masking utils for Gemma3 VLM (#5853 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-10 06:18:04 +09:00
2ez4bz	87fe44fd29	feat(models): Mistral3.1 VLM pytorch backend support (#5529 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-07-09 13:17:40 -07:00
Chang Liu	b61a717275	[1/N][TRTLLM-5195][feat] Share PyTorch tensor between processes (#5396 )	2025-07-10 05:12:53 +09:00
Wanli Jiang	3f7cedec7c	Update transformers to 4.53.0 (#5747 ) Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com> Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-07-09 09:32:24 -07:00
DylanChen-NV	74dca0aa7b	[NVBUG-5304516/5319741]Qwen2.5VL FP8 support (#5029 ) Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>	2025-07-09 23:16:42 +08:00
Omer Ullman Argov	a32f7083b4	[ci] parallelize torch unittests (#5714 ) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-07-09 11:05:57 +03:00
Dom Brown	3e3b1769ad	[TRTLLM-5881] feat: Integrate TRT-LLM Gen FP4 block scale MoE with Pytorch workflow kernel autotuner (#5764 ) Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2025-07-09 08:21:58 +01:00
Erin	e277766f0d	chores: merge examples for v1.0 doc (#5736 ) Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>	2025-07-08 21:00:42 -07:00
Lucas Liebenwein	d14dd2f597	[AutoDeploy] re-enable waive for flaky AD test (#5867 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-07-09 11:47:48 +09:00
Bo Li	9d894bc0cb	fix: [https://nvbugspro.nvidia.com/bug/5375656 ] Unwaive for bug 5375656. (#5842 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-07-09 10:17:05 +08:00
brb-nv	2bd09ed2d4	fix: Skip rope scaling for local layers in Gemma3 VLM (#5857 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-09 10:10:33 +08:00
Venky	e27215ca03	test: Validate and add accuracy& perf tests for Ministral-8B-Instruct[-FP8](pytorch only) (#5654 ) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2025-07-08 18:16:21 -07:00
xavier-nvidia	b6013da198	Fix GEMM+AR fusion on blackwell (#5563 ) Signed-off-by: xsimmons <xsimmons@nvidia.com>	2025-07-09 08:48:47 +08:00
Fridah-nv	a79b73f577	fix: [5376140] [AutoDeploy] Update unit tests: skip all_close assert for dropout in attention, increase tolerance for rope op test (#5855 ) Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>	2025-07-09 09:13:31 +09:00
Yan Chunwei	e50d95c40d	chore [TRTLLM-6161]: add LLM speculative decoding example (#5706 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-09 07:33:11 +08:00
Pamela Peng	da8c7372d4	[TRTLLM-5366][feat]Add support for sm121 (#5524 ) Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com> Initial CI run failed a single step A30-CPP-3 due to timeout. Rerunning that step succeeded.	2025-07-08 14:27:00 -07:00
Chang Liu	08a3dfeb2b	[nvbug/5308432] unwaive test: post-merge-triton_backend-test_llava (#5814 )	2025-07-08 09:53:11 -07:00
Dom Brown	e3ccca06e1	test: reduce redundant test cases for TRTLLM Gen FP8 MoE (#5845 ) Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2025-07-09 00:40:33 +09:00

949 Commits