TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-26 21:53:30 +08:00

Author	SHA1	Message	Date
Mike Iovine	ec0d984656	[nvbug/5280806][fix] Fix 2 model spec decode flow (#4807 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-06-08 07:40:02 -04:00
Yanchao Lu	9e05613679	[Infra] - Update JNLP container config (#5008 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-06-08 16:44:09 +08:00
QI JUN	5ee0de7f2a	Resubmit #4894 (#4969 ) Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-06-08 04:42:15 +08:00
Ivy Zhang	7dce328ad6	[TRTLLM-5692][tests] Add speculative decoding test cases on torch flow (#4940 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Ruodi Lu <ruodil@nvidia.com> Co-authored-by: Ruodi Lu <ruodil@nvidia.com>	2025-06-07 11:18:32 +08:00
Fanrong Li	75d020cf07	fix: fix cuda graph padding for spec decoding (#4853 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-06-06 22:21:42 +08:00
Anthony Chang	eeb555e37b	chore: memoize weight shuffle index to speed up weight preproc in moe_backend=TRTLLM (#4826 ) Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>	2025-06-06 16:13:54 +08:00
xinhe-nv	564472168e	test: [CI] Add failed cases into waives.txt (#4966 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-06-06 10:30:15 +08:00
QI JUN	ec50684d80	Revert "fix a bug of global cuda graph dummy request" (#4970 )	2025-06-06 08:54:45 +08:00
QI JUN	154f7cc40a	fix a bug of global cuda graph dummy request (#4894 ) Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-06-05 19:47:40 +08:00
Yiqing Yan	7e921c78b5	Waive L0 tests (#4953 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-06-05 19:36:48 +08:00
Shunkangz	3eae58ca36	Add disaggregated unittest (#4899 ) Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>	2025-06-05 19:14:31 +08:00
ixlmar	a1526356aa	[TRTLLM-5630] restore free_gpu_memory_fraction=0.9 in tests (#4859 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-06-05 10:46:29 +01:00
QI JUN	b8c5e3892b	Revert "fix: build_config in TorchLlmArgs and avoid invalid args" (#4949 ) Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-06-05 17:43:30 +08:00
QI JUN	d5a8079eb6	Revert "[infra] Unwaive unittests/_torch" (#4950 )	2025-06-05 17:21:07 +08:00
xinhe-nv	1c3091c63b	tests: [TRTQA-2906] add benchmark serving tests (#4901 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-05 14:33:03 +08:00
Yiqing Yan	9ceef983c0	Waive L0 tests (#4927 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-06-05 11:09:01 +08:00
xinhe-nv	50a74a1daa	tests: fix 5273697 (#4685 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-06-05 10:39:21 +08:00
Mike Iovine	8433091630	[infra] Unwaive unittests/_torch (#4919 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-06-05 08:49:37 +08:00
Lucas Liebenwein	f9d45e03a4	[AutoDeploy] deprecate CI post-merge tests and keep them for local testing (#4892 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-06-05 08:27:17 +08:00
Yi Zhang	1fca654bfd	tests: Update gb200 test case (#4754 ) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-06-04 18:49:20 +08:00
Yan Chunwei	ac20159d32	fix: build_config in TorchLlmArgs and avoid invalid args (#4600 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-06-04 13:17:29 +08:00
Shi Xiaowei	b13f8c9cba	Fix: NVBug 5302895 (#4835 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-06-04 09:31:39 +08:00
Nikita Korobov	8043d7a03c	feat: update DeepSeek FP8 TRT-LLM Gen cubins (#4643 ) Signed-off-by: Nikita Korobov <nkorobov@nvidia.com>	2025-06-03 14:07:54 -07:00
rakib-hasan	d0eb47d33a	[TRTLLM-5053] Refactoring and Unifying the Multimodal input preparation (#4506 ) * refactoring the multimodal input prep Signed-off-by: Rakib Hasan <rhasan@nvidia.com> * adding out-of-tree override option Signed-off-by: Rakib Hasan <rhasan@nvidia.com> * adding exceptional case for llava-next Signed-off-by: Rakib Hasan <rhasan@nvidia.com> * fixing typo Signed-off-by: Rakib Hasan <rhasan@nvidia.com> * addressing review comments, adding placement option, handling tokenizer variations Signed-off-by: Rakib Hasan <rhasan@nvidia.com> * addressing pytest-asyncio behavior change Signed-off-by: Rakib Hasan <rhasan@nvidia.com> --------- Signed-off-by: Rakib Hasan <rhasan@nvidia.com>	2025-06-03 12:02:07 -07:00
Simeng Liu	2384655c3a	chore: Waive examples/test_mistral.py::test_llm_mistral_v1_1gpu. (#4873 ) Signed-off-by: Simeng Liu <simengl@nvidia.com>	2025-06-03 14:45:14 -04:00
Shunkangz	ae9a6cf24f	feat: Add integration of etcd (#3738 ) Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co> Signed-off-by: BatshevaBlack <132911331+BatshevaBlack@users.noreply.github.com> Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co> Co-authored-by: Batsheva Black <bblack@login-eos01.eos.clusters.nvidia.com> Co-authored-by: BatshevaBlack <132911331+BatshevaBlack@users.noreply.github.com>	2025-06-03 20:01:44 +08:00
Robin Kobus	b9263a8e10	fix: max_num_sequences calculation with overlap scheduling (#4532 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> Co-authored-by: Daniel Campora <961215+dcampora@users.noreply.github.com>	2025-06-03 09:31:22 +02:00
Iman Tabrizian	141467d4b6	Add pre-merge Triton backend tests (#4842 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-06-03 00:47:58 -04:00
ruodil	fa93eeee84	shorten reqs in con:1 cases and add streaming cases, and add l2 perf … (#4849 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-03 12:28:13 +08:00
Ivy Zhang	8686868531	tests: [TRTQA-2905] improve timeout report for qa test cases (#4753 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-03 12:27:27 +08:00
Robin Kobus	e34a1beb72	[nvbugs/5303555] ci: unwaive test_fp8_block_scales_cuda_graph_padding (#4735 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-06-03 10:40:43 +08:00
Fanrong Li	380a5d1690	[https://nvbugs/5271281 ][fix] fix a pd+mtp accuracy issue (#4536 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-06-03 10:03:34 +08:00
Fanrong Li	13f68338d2	fix: [https://nvbugspro.nvidia.com/bug/5273945 ] Unwaive tests for bug-5273945 (#4832 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-06-02 22:01:57 +08:00
Yanchao Lu	8166649d03	[Infra] - Minor clean-up and test Ubuntu mirrors (#4829 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-06-02 20:18:20 +08:00
Fanrong Li	7d356efc7d	fix: fix accuracy and illegal memory access issues when using mtp + attention dp (#4379 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-06-02 00:35:52 +08:00
amirkl94	8039ef45d3	CI: Performance regression tests update (#3531 )	2025-06-01 09:47:55 +03:00
Emma Qiao	202813f054	Check test names in waive list (#4292 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-06-01 14:39:30 +08:00
Dom Brown	338d6e9f95	[nvbug 5305210] fix: Resolve nvbug 5305210 (#4759 ) Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2025-05-31 19:21:06 +08:00
Yan Chunwei	93c0632ee4	opt: the perormance for dist-agg streaming generation (#4214 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-05-31 17:40:32 +08:00
Emma Qiao	c945e92fdb	[Infra]Remove some old keyword (#4552 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-05-31 13:50:45 +08:00
Zheng Duan	54200ee8ac	fix: random fail of cache router test (#4597 ) Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>	2025-05-30 16:28:19 +08:00
xinhe-nv	53794b26f8	test: skip test_llm_hf_gemma_quantization_1gpu_vswa on A100 (#4779 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-05-30 15:12:12 +08:00
Yilin Fan	31bb650298	Cherry pick feat/llama4 to main (#4739 ) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com> Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com> Co-authored-by: Chenfei Zhang <chenfeiz@nvidia.com>	2025-05-30 05:28:40 +08:00
Jhao-Ting Chen	fcadce9f8d	[fix] Eagle-2 LLMAPI pybind argument fix. (#3967 ) Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com> Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>	2025-05-29 12:23:25 -07:00
yuanjingx87	2c48ff5898	[feat] add b200 support via slurm (#4709 ) Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>	2025-05-29 14:49:46 +08:00
Yan Chunwei	33a9ba55f5	fix: test trtllm-bench mgmn (#4613 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-05-29 14:43:47 +08:00
ruodil	500aca4f44	test: remove perf test l40s/l20 oom test cases and unwaive tests (#4755 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-05-29 13:58:47 +08:00
QI JUN	058f83e47b	CI: move post-merge multi GPU test of PyTorch backend to H200 (#4733 ) Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-05-29 11:15:56 +08:00
Aurelien Chartier	6cf1e4d0a9	chore: add -f to pkill calls (#4711 ) Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>	2025-05-29 02:54:31 +08:00
Ivy Zhang	ed3c67e34a	tests: [https://nvbugspro.nvidia.com/bug/5289908 ] run maverick bf16 on blackwell (#4722 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-05-28 22:05:51 +08:00

384 Commits