TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
Yanchao Lu	b4b1185af3	[https://nvbugs/5450855][fix] Cherry pick #6700 and #6702 from main (#6808) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>	2025-08-12 18:11:47 +08:00
Ivy Zhang	94de3c11b0	tests: Add llama4 functional cases (#6392) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-07-29 17:49:43 +10:00
brb-nv	eb157accac	test: Relax Gemma3 unit test thresholds (#6016) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-28 21:24:34 +08:00
Pengyun Lin	ab4e178bef	[fix]: Revert commit `388b491` (#6143) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-07-18 17:38:13 +08:00
pcastonguay	4d0bcbcb2d	fix: Fix triton backend build [nvbug 5396469] (#6098) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-07-16 16:30:16 -04:00
Yiqing Yan	69a15c8c74	[None] - Waive L0 tests (#6082) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-07-16 13:14:16 +08:00
Yi Zhang	332a65b837	[nvbugs/5368410][fix] Disable moe allreduce for multi node (#5918) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-07-14 10:06:29 +08:00
Fanrong Li	4905cac8fd	[nvbugs/5333742] fix MTP illegal memory access in cuda graph warmup (#5947) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-07-12 21:55:44 +08:00
Zheng Duan	e831673f80	fix: timeout and broken pipe in disagg and worker tests (#5827) Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>	2025-07-11 12:42:47 +08:00
Nikita Korobov	aeea5b3a56	fix: [5328141] increase tolerance for test_fp8_block_scale_gemm (#5849) Signed-off-by: Nikita Korobov <14355239+nekorobov@users.noreply.github.com>	2025-07-10 15:44:19 +02:00
Yan Chunwei	bfa917ff9b	fix [nvbug/5351244]: address remote mpi session submit (#5664) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-10 21:22:41 +09:00
Bo Li	8b7422c5b7	fix: [nvbugs/5351130] Adjust DSV3-Lite tests free_gpu_memory_fraction to 0.75 to prevent OOM on CI. (#5896) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-07-10 19:16:38 +08:00
amirkl94	cd7aeec061	tests: Fix lora perf test (#5875) Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>	2025-07-10 10:56:46 +02:00
brb-nv	ff9aabb038	test: Add Gemma3 unit tests to CI in release/0.21 (#5899) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-10 09:47:49 +02:00
Robin Kobus	fd94d3cbf5	[nvbugs/5345391] fix: chunked prefill + overlap scheduling (#5761) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-07-09 17:59:45 +02:00
Pengyun Lin	2e21e3421f	[nvbug 5327706][fix] fix mgmn postprocess error (#5835) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-07-09 17:08:03 +08:00
ruodil	cbcc55e073	test: remove duplicate cases in perf sanity test (#5870) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-07-09 15:36:22 +10:00
Bo Li	6d7a2cb1c5	fix: [https://nvbugs/5351130][https://nvbugs/5333654] Unwaive for bug 5351130 and 5333654. (#5821) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-07-08 18:12:48 +08:00
QI JUN	f8b4077654	[nvbugs/5326453] Avoid nesting NCCL grouping in allgather OP (#5789) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-07-08 15:39:27 +09:00
Bo Li	6062dc675f	fix: [https://nvbugspro.nvidia.com/bug/5345215] Unwaive for bug 5345215. (#5606) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-07-08 13:11:08 +09:00
Yan Chunwei	97f4c9e24f	[nvbug5266240] chore: unwaive test_llm_with_dummy_weights (#5744) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-07 22:13:34 +08:00
Pengyun Lin	0a0ac7b5dc	[nvbug 5304752][fix] enhance _check_arguments to filter illegal requests for pytorch backend (#5541) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-07-07 19:26:13 +08:00
QI JUN	d47ac4e3e5	cherry pick #5416 (#5776) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-07-07 17:19:38 +08:00
QI JUN	4fa9284612	[nvbug/5302638][nvbugs/5310314] fix _handle_cancelled_requests (#5532) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-07-07 16:51:24 +08:00
QI JUN	3a58db88c8	fix _pad_attention_dp_dummy_request (#5583) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-07-07 14:13:54 +08:00
Pengyun Lin	7524c77e1e	[nvbug 5004744][fix] rewrite completion API to avoid repetitive tokens (#5201) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-07-07 15:06:49 +09:00
brb-nv	9106b5d9a5	fix: Skip rope scaling for local layers in Gemma3 VLM (#5773) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-07 13:36:23 +08:00
ruodil	6103466de2	test: fix some test failure and add llama_nemotron models in perf sanity test, add more torch cases (#5693) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-07-07 13:11:41 +10:00
Iman Tabrizian	518915b5c6	[nvbug/5337601][fix] Fix disagg + speculative decoding (#5558) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com> Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> Co-authored-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-07-04 12:52:35 -04:00
Yi Zhang	5ac92bb8ff	[nvbugs/5336321][fix] Enable attention dp = False test case, Fix TRTLLM Gen Moe workspace allocation (#5463) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com> Signed-off-by: yizhan <187001205+yizhang-nv@users.noreply.github.com>	2025-07-04 23:23:41 +09:00
Yiqing Yan	3e44db11c9	[Infra][nvbugs/5370968] - Unwaive l0 test (#5750) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-07-04 15:27:53 +08:00
Yi Zhang	53394e0030	test: Move some of the test from post merge to pre-merge, update dgx b200 test case (#5640) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-07-04 13:26:53 +09:00
brb-nv	2b66fe8fbd	[nvbug/5341178][fix] Fix OOM in Llama 4 accuracy test (#5735) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-04 10:55:34 +08:00
Faraz	8a8d2e9901	[NVBUG:5355009] Modify check for fuse_fp4_quant on SM120 (#5651) Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com> Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com> Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com>	2025-07-03 22:08:15 +09:00
Emma Qiao	2f9d0619c3	[Infra] - Waive failed cases on release/0.21 (#5674) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-07-02 22:23:54 -04:00
brb-nv	a3c0cf02ce	fix: Investigate Gemma3 1B decoder output discrepancy (#5564) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-03 09:55:25 +08:00
bhsueh_NV	d5606b062a	fix: [https://nvbugs/5355219] Fix bug of Qwen3 235B CI on dgx_gb200 (#5602) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-07-02 10:07:01 +08:00
Yi Zhang	aa0b9278d2	test: add more tests for GB200 with 8 GPUs/2 nodes in L0 tests (#5397) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-07-01 01:06:47 -04:00
Zheng Duan	1824c44004	[nvbug 5300551] test: increase block count in eviction test (#5465) Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>	2025-07-01 10:48:25 +08:00
nv-guomingz	9fe1dd6be1	fix:https://nvbugs/5362398 (#5609) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-06-30 13:29:40 -04:00
Yan Chunwei	d6c81bad97	fix [nvbug5351244]: test_mpi_session submit sync/async (#5608) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-01 00:48:59 +08:00
Venky	4fc0666daa	[cherry-pick] [CI] Waive `test_fp8_block_scales_4gpus[ep4-mtp_nextn=0-fp8kv=True-attention_dp=True-cuda_graph=True-overlap_scheduler=True-torch_compile=False]` (#5553) Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>	2025-06-28 01:15:04 +08:00
Yan Chunwei	b78ad754c8	ci: unwaive llmapi launch test (#5281) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-06-27 14:10:45 +08:00
Emma Qiao	e2054bb2aa	[Infra][release/0.21] - waive failed tests (#5537) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-06-27 13:58:13 +08:00
Yan Chunwei	87ead4ecbe	[nvbug 5273941] fix: broken cyclic reference detect (#5417) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-06-26 07:35:35 +08:00
Emma Qiao	b6d23d58c4	[Infra] - Waive failed tests on release/0.21 (#5477) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-06-25 19:01:55 +08:00
HuiGao-NV	5cd87bee41	tests: Set kv cache free memory fraction in test case (#5462) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-06-25 16:27:46 +08:00
ruodil	5e50fcc51b	test: set enable_attention_dp=True in default deepseek settings (#5461) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-25 14:21:14 +08:00
brb-nv	32f50ded17	nvbugs-5331031; nvbugs-5344203 - address intermittent issues with Mistral Small multimodal for BS=8 (#5453) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-06-25 11:45:14 +08:00
Ivy Zhang	9e110b2d11	tests: fix typos in qa test (#5421) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-06-25 10:42:34 +08:00

809 Commits