TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-23 20:23:08 +08:00

Author	SHA1	Message	Date
xinhe-nv	b8b2bd4a0a	[TRTLLM-7245][feat] add test_multi_nodes_eval tests (#7108 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-08-22 17:17:27 +08:00
Linda	898f37faa0	[None][feat] Enable nanobind as the default binding library (#6608 ) Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>	2025-08-22 09:48:41 +02:00
xinhe-nv	4017f7cd6b	[None][chore] Add failed cases into waives.txt (#7109 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-08-22 10:39:25 +08:00
dominicshanshan	6f245ec78b	[None][chore] Mass integration of release/1.0 (#6864 ) Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com> Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com> Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Bo Deng <deemod@nvidia.com> Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com> Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com> Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com> Signed-off-by: raayandhar <rdhar@nvidia.com> Co-authored-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com> Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Yiqing Yan <yiqingy@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com> Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com> Co-authored-by: Bo Deng <deemod@nvidia.com> Co-authored-by: Guoming Zhang <137257613+nv-guomingz@users.noreply.github.com> Co-authored-by: Stefan Niebler <82932102+stnie@users.noreply.github.com> Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Co-authored-by: Emma Qiao <qqiao@nvidia.com> Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com> Co-authored-by: 2ez4bz <133824995+2ez4bz@users.noreply.github.com> Co-authored-by: Raayan Dhar <58057652+raayandhar@users.noreply.github.com> Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-08-22 09:25:15 +08:00
Emma Qiao	344bc4575d	[None][infra] Waive failed case for main branch (#7129 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-22 00:08:55 +08:00
Dimitrios Bariamis	f49dafe0da	[https://nvbugs/5394409 ][feat] Support Mistral Small 3.1 multimodal in Triton Backend (#6714 ) Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com> Signed-off-by: Dimitrios Bariamis <dbari@users.noreply.github.com> Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com> Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>	2025-08-21 18:08:38 +02:00
bhsueh_NV	ba0a86e0bb	[https://nvbugs/5437405 ][fix] qwen3 235b eagle3 ci (#7000 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-08-21 01:17:32 -04:00
xinhe-nv	21f4434404	[None][chore] waive failed cases on H100 (#7084 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-08-21 11:15:23 +08:00
Yechan Kim	0893afae3d	[TRTLLM-6771][feat] Support MMMU for multimodal models (#6828 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-08-21 08:54:12 +08:00
bhsueh_NV	73d2daa386	[https://nvbugs/5457489 ][fix] unwaive some tests (#6991 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-08-21 08:49:57 +08:00
QI JUN	a918de710a	[None][ci] move some tests of b200 to post merge (#7093 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-08-20 19:43:40 -04:00
Emma Qiao	f84dd64250	[None][infra] Waive failed tests on main branch 8/20 (#7092 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-20 06:33:44 -04:00
Robin Kobus	b95cab2a7c	[None][ci] move unittests to sub-directories (#6635 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-08-20 05:42:22 -04:00
xinhe-nv	9e71b4fda4	[TRTLLM-7205][feat] add llama4 tp4 tests (#6989 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-20 13:22:05 +08:00
Leslie Fang	3f6a9267f1	[None][infra] update feature_combination_matrix of disaggregated and chunked prefill (#6661 ) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2025-08-20 13:14:34 +08:00
Bo Deng	30da5d3cc4	[None][chore] unwaive test_disaggregated_genbs1 (#6944 ) Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-08-20 09:57:35 +08:00
Emma Qiao	8f95f35503	[None][infra] Waive failed tests on main (#7037 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-19 09:31:07 -04:00
Yiqing Yan	07506bccbe	[None][chore] Remove duplicate test waives (#7044 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-08-19 21:04:31 +08:00
Fanrong Li	655d0f48d0	[https://nvbugs/5455140 ][fix] unwaive DSR1-fp4 throughput_tp8 (#7022 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-08-19 20:48:05 +08:00
xinhe-nv	2c86cee38c	[None][chore] Remove closed bugs (#6969 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-08-19 16:01:33 +08:00
Ivy Zhang	bff5fdf6df	[TRTLLM-6541][test] Add NIM Related Cases Part 1 (#6684 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-08-19 13:59:14 +08:00
William Zhang	daa2a65d37	[https://nvbugs/5454875 ][ci] Unwaive Mistral Small 3.1 test (#7011 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-08-19 00:32:14 -04:00
fredricz-20070104	e90280a84d	[TRTLLM-6541][test] Add NIM Related Cases [StarCoder2_7B] and [Codestral_22B_V01] (#6939 ) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>	2025-08-19 00:13:04 -04:00
Fanrong Li	816a120af6	[TRTLLM-6991][chore] add DeepSeek-R1 FP8 accuracy tests on Blackwell (#6710 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-08-19 00:03:03 -04:00
Lizhi Zhou	71e28eab36	[TRTLLM-7014][chore] Add accuracy test for ctx and gen workers with different models (#6741 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-08-19 09:58:22 +08:00
Leslie Fang	e76e5c640f	[None][infra] Enable accuracy test for mtp and chunked prefill (#6314 ) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2025-08-19 07:42:52 +08:00
Yiqing Yan	1ce23545fc	[None][chore] Remove duplicate test waives (#6998 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-08-18 21:15:49 +08:00
Emma Qiao	69ff32f9b1	[None][infra] Waive failed tests on main 0818 (#6992 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-18 20:34:52 +08:00
Shi Xiaowei	5ec15b98f0	[TRTLLM-7030][fix] uppercase def value in pd-config (#6981 ) Signed-off-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2025-08-18 02:33:23 -04:00
Leslie Fang	ce0b13ea02	[None][infra] update feature_combination_matrix of disaggregated and Eagle3 (#6945 ) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2025-08-18 09:18:17 +08:00
Emma Qiao	cc6d763824	[None][infra]Waive failed cases in main branch (#6951 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-17 14:27:59 +03:00
Daniel Cámpora	53312eeebd	[TRTLLM-7157][feat] BREAKING CHANGE Introduce sampler_type, detect sampler according to options (#6831 ) Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>	2025-08-16 00:27:24 -04:00
brb-nv	9505727d31	[https://nvbugs/5401114 ][fix] Unwaive Gemma3 tests (#6952 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-08-15 16:35:02 -07:00
yifeizhang-c	4127d77678	[https://nvbugs/5394392 ][fix] Enlarge scheduler capacity under disagg bs == 1 (#6537 ) Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>	2025-08-15 09:52:06 -07:00
liji-nv	18ccd053d3	[https://nvbugs/5427801 ][fix] Torch compile support for Llama4 and Ea… (#6858 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-08-15 11:14:20 -04:00
xinhe-nv	b23fdfc62f	[None][chore] Add failed cases into waives.txt (#6914 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-08-15 14:00:16 +08:00
Yanchao Lu	3a987891d8	[TRTLLM-7141][infra] Use repo mirrors to avoid intermittent network failures (#6836 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-08-15 11:16:07 +08:00
Bo Li	26f413ad90	[https://nvbugs/5450262 ][fix] Fix unsupported alltoall use case (#6882 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-08-14 17:46:54 -04:00
Emma Qiao	96339c69a9	[None][infra] Waive failed cases on main (#6902 ) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-08-14 23:59:44 +08:00
Pengbo Wang @ NVIDIA	ffc976ceaf	[https://nvbugs/5445466 ][fix] fix deepseek r1 hang by not enabling mnnvl by default (#6860 ) Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com> Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>	2025-08-14 22:36:56 +08:00
NVJiangShao	a700646132	[None][fix] Add FP4 all2all unitest and fix a bug for module WideEPMoE (#6784 ) Signed-off-by: Jiang Shao <91270701+StudyingShao@users.noreply.github.com>	2025-08-14 13:35:37 +08:00
Bo Deng	d8acca495b	[TRTLLM-6675][infra] Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/6623 (#6735 ) Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-08-14 04:36:38 +00:00
jmydurant	4200fa46d1	[None][feat] Add support for Hopper MLA chunked prefill (#6655 ) Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>	2025-08-14 10:39:26 +08:00
Mike Iovine	7cba883932	[https://nvbugs/5410399 ][chore] Unwaive mtp llmapi test (#6833 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-08-13 17:38:45 -04:00
Emma Qiao	c7e6145409	[None][infra] Waive failed cases on main (#6863 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-13 09:50:14 -04:00
Anthony Chang	2198587b35	[https://nvbugs/5378031 ] [feat] Hopper W4A8 MoE supports ModelOpt ckpt for PyT backend (#6200 ) Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>	2025-08-13 21:24:40 +08:00
Yechan Kim	12102e2d48	[TRTLLM-6772][feat] Multimodal benchmark_serving support (#6622 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-08-12 19:34:02 -07:00
Chang Liu	be9dd4713c	[https://nvbugs/5385987 ][fix] Fix Qwen2 quantization issue by pinning transformers version (#6673 ) Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com> Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-08-11 17:16:49 -07:00
Emma Qiao	5145e9d40e	[None][infra] Unwaive an updated case to test (#6791 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-11 06:47:33 -04:00
Emma Qiao	d6ad4a9d5b	[None][infra] Waive failed tests on main 0811 (#6778 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-11 03:16:25 -04:00
xinhe-nv	9c358c26e4	[None][chore] remove closed bugs (#6772 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-11 14:39:58 +08:00
Eran Geva	b3e8fa2960	[None][test] Test trtllm-bench AD vs, PT BEs on H100 single gpu (#6487 ) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com> Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com> Co-authored-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>	2025-08-11 08:33:13 +03:00
Tracin	49bcaa4e95	Add gpt-oss GSM8K test. (#6732 ) Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>	2025-08-10 22:45:43 -04:00
Chuang Zhu	c566a8d2a2	[None][fix] fix same pp disagg (#6730 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-08-10 22:45:15 -04:00
Bo Deng	767879ef85	[https://nvbugs/5431127 ][fix] Run test_disaggregated_deepseek_v3_lite_fp8_nixl[DeepSeek-V3-Lite-fp8] only on hopper (#6736 ) Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-08-11 10:05:10 +08:00
Emma Qiao	ee19ca5e58	[None][infra] Waive test main 0808 (#6751 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-09 23:54:07 -04:00
Ye Zhang	bcf5ec0c9a	[None][feat] Core Metrics Implementation (#5785 ) Signed-off-by: Ye Zhang <zhysishu@gmail.com> Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>	2025-08-09 02:48:53 -04:00
ruodil	b15d6fb145	[None][test] fix yml condition error under qa folder (#6734 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-08-08 15:59:01 +10:00
2ez4bz	064eb7a70f	[TRTLLM-5252][fix] Propagate mapping to intermediate layers (#6611 ) This commit propagates the mapping to intermediate layers to enable tensor parallelism (amongst other things) in them. It also fixes issues with a unit test for TP for pixtral, and adds it to a test list. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-08-08 01:50:36 -04:00
Enwei Zhu	aee828d98a	[TRTLLM-6854][feat] Enable guided decoding with disagg serving (#6704 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-08-08 12:10:36 +08:00
ruodil	22f45a0e19	[TRTLLM-5252][test] add for mistral_small_3.1_24b perf test (#6685 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-08-07 22:57:04 -04:00
xinhe-nv	88ced50ca7	[TRTQA-2920][fix] Add failed cases into waives.txt (#6719 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-08-08 12:54:13 +10:00
Daniel Cámpora	efca359b66	[TRTLLM-6785][feat] BREAKING CHANGE Enable TRTLLM sampler by default (#6216 ) Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>	2025-08-07 22:19:37 -04:00
Raayan Dhar	4055b764db	[None][fix] disagg ctx pp4 + gen pp4 integ test (#6489 ) Signed-off-by: raayandhar <rdhar@nvidia.com> Signed-off-by: Raayan Dhar <58057652+raayandhar@users.noreply.github.com>	2025-08-07 11:18:02 -04:00
pcastonguay	453a06e6ab	[TRTLLM-6881][feat] Include attention dp rank info with KV cache events (#6563 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-08-07 14:17:07 +02:00
Enwei Zhu	1b9781e8e7	[TRTLLM-6409][feat] Enable guided decoding with speculative decoding (part 1: two-model engine) (#6300 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-08-07 05:53:48 -04:00
xinhe-nv	0a467b00cc	[https://nvbugs/5409414 ][fix] fix Not registered specs (#6660 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-07 17:55:53 +10:00
hlu1	8207d5fd39	[None] [feat] Add model gpt-oss (#6645 ) Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>	2025-08-07 03:04:18 -04:00
ruodil	6c1f7d8b91	[None][test] correct test-db context for perf yaml file (#6686 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-08-07 02:47:10 -04:00
YueWeng	157ea77549	[https://nvbugs/5375966 ][chore] Unwaive test_disaggregated_deepseek_v3_lite_fp8_attention_dp_one (#6658 ) Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>	2025-08-07 10:25:17 +08:00
ruodil	780d7507f9	[None][test] remove trt backend cases in release perf test and move NIM cases to llm_perf_nim.yml (#6662 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-07 10:02:13 +10:00
Yan Chunwei	5eae3184fa	[None][chore] add missing tests to test list (#6590 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-08-06 22:12:27 +08:00
Iman Tabrizian	13ecb4aced	[https://nvbugs/5328160 ][fix] Unwaive disaggregated serving tests (#6644 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-08-06 09:08:29 -04:00
ruodil	907c180eb2	[None][test] align kv_frac in perf test with perflab and add more cases for 4 gpus GB200 (#6632 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-08-06 02:25:57 -04:00
ruodil	0bd99b5d6d	[TRTLLM-6764][test] add new feature cases in cluster(B200/GB200) and sanity test (#6650 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-08-06 01:45:13 -04:00
yunruis	3ff4f503ad	[None][opt] ADP schedule balance optimization (#6061 ) Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>	2025-08-06 09:38:02 +08:00
ixlmar	1ebceb790d	[TRTLLM-5508][feat] check input tokens + improve error handling (#5170 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-08-05 18:27:43 +01:00
Venky	61da2daeb4	[TRTLLM-6761][refactor] Replace LogitBiasLogitsProcessor with embedding bias tensor system (#6464 ) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2025-08-05 07:14:24 -07:00
Emma Qiao	78a75c2990	[None][Infra] - Split gb200 stages for each test (#6594 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-05 07:10:00 -04:00
xinhe-nv	c32584125e	[TRTQA-2920][fix] Add failed cases into waives.txt (#6600 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-08-05 20:12:55 +10:00
Pengbo Wang @ NVIDIA	c289880afb	[None][fix] fix kimi k2 serving and add test for Kimi-K2 (#6589 ) Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>	2025-08-05 18:05:33 +08:00
Ivy Zhang	08ed9d7305	[None][doc] add introduction doc on qa test (#6535 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-08-05 17:02:17 +08:00
Ivy Zhang	d101a6cebc	[https://nvbugs/5410279 ][test] resubmit timeout refactor (#6337 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-08-05 16:39:25 +08:00
Haohang Huang	c9eebcb454	[TRTLLM-6674][feat] (Breaking Change) Hopper SWA non-cyclic kernels + KV reuse + Spec Dec (#6379 ) Signed-off-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com> Signed-off-by: symphonylyh <31998628+symphonylyh@users.noreply.github.com>	2025-08-05 07:47:41 +00:00
ruodil	7625845365	test: add README_release_test.md for perf test (#6443 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-08-05 02:07:42 -04:00
xinhe-nv	a178cea324	[TRTLLM-6856][feat] add disaggregated serving tests to QA list (#6536 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-08-05 12:47:53 +10:00
xinhe-nv	fe3d607c4b	[TRTQA-2920][fix] Add failed cases into waives.txt (#6581 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-05 12:41:23 +10:00
Ivy Zhang	f3651adea8	[None][test] update invalid test name (#6596 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-08-04 08:01:05 -04:00
Emma Qiao	5d8a5a0cb8	[None][Infra]Waive failed case in post-merge on main (#6602 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-04 19:39:44 +08:00
brb-nv	87e4e9f468	[None][chore] Add unit test for Gemma3 lora (#6560 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-08-04 04:56:57 -04:00
Pengyun Lin	a15e33351d	[None][fix] Revert commit `48ddc3d` & add test for disagg server with different max_num_tokens (#6259 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-08-04 15:09:51 +08:00
xinhe-nv	a54972e463	[None][fix] remove closed bugs (#6576 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-04 15:52:11 +10:00
Leslie Fang	a60190836c	[None][infra] Enable accuracy test for eagle3 and chunked prefill (#6386 ) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2025-08-04 01:45:24 -04:00
ruodil	6459725bf9	test: move ministral_8b_fp8 to fp8_specific gpu list(exclude Ampere) (#6533 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-04 15:22:39 +10:00
Ivy Zhang	5eefdf2c75	tests: Add llama4 functional cases (#6392 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-08-04 11:19:58 +08:00
Yechan Kim	ee6ab5be96	chore: add EXAONE4 accuracy test (#6397 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-08-04 10:14:16 +08:00
Ivy Zhang	7547a7d0a2	[TRTLLM-6473][test] add speculative decoding and ep load balance cases into QA test list (#6436 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-08-03 22:11:26 -04:00
Jhao-Ting Chen	4da5cfc511	[None][infra] add eagle3 one model accuracy tests (#6264 ) Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>	2025-08-02 16:07:46 -07:00
Lizhi Zhou	6f34f3489b	[TRTLLM-6357][test] Add accuracy tests for Qwen3 (#6177 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-08-01 13:33:34 -04:00
xinhe-nv	263c6c0ad0	test: skip post blackwell (#6357 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-08-01 13:10:14 -04:00
Emma Qiao	16febefee0	[None][Infra] - Skip failed tests in post-merge (#6558 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-01 22:21:23 +08:00
brb-nv	7447d6ed85	[TRTLLM-6657][feat] Add LoRA support for Gemma3 (#6371 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-08-01 09:19:54 -04:00
liji-nv	1daa8c3232	[https://nvbugs/5340941 ][https://nvbugs/5375785 ] - fix: Wrap attentio… (#6355 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-08-01 07:38:06 -04:00
Yukun He	90856bf97d	[https://nvbugs/5419069 ][fix] Fix the mismatched layer name components. (#6417 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-08-01 16:32:39 +08:00
brb-nv	2eca0d5925	fix: Fix poor generation with FP8 Gemma3 1B checkpoint (#6499 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-31 17:18:23 -07:00
Ziyi Xiong	8062e0fe7c	[TRTLLM-6392][feat] Support turning on/off spec decoding dynamically (#6363 ) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-07-31 15:31:39 -04:00
Faraz	8e84df74b5	Fix e2e test failure for RTX6000 Pro (#6420 ) Signed-off-by: list <58580514+farazkh80@users.noreply.github.com> Signed-off-by: Faraz <58580514+farazkh80@users.noreply.github.com>	2025-07-30 23:32:44 -04:00
xinhe-nv	ca534e4798	test: add accuracy reference (#6479 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-07-31 12:27:29 +10:00
bhsueh_NV	ae3a5fc918	[doc][ci][Qwen3][nvbugs 5374145] Add Qwen3 235B eagle3 CI (#6477 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-07-31 09:37:23 +08:00
brb-nv	0e16d1f070	test: Add time logging for lora tests (#6466 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-30 14:02:43 -07:00
Anurag Mukkara	fac186e3b5	[nvbug/5409417] Unwaive llava test case (#6460 ) Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>	2025-07-30 14:38:47 -04:00
brb-nv	f6287e4498	Unwaive Gemma2 LoRA test on H100 (#6461 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-30 12:56:12 -04:00
Bo Deng	24e7f4eece	[nvbug/5410296][fix] Fix OOM in Llama 4 disagg-serve tests (#6439 ) Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-07-31 00:41:37 +08:00
Wanli Jiang	9632dba02e	feat: TRTLLM-6450 update long rope for phi3.5/phi4-mini/phi4-mm (#6353 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-07-30 09:20:16 -07:00
pcastonguay	0f083b9daf	fix: Unwaive triton cpp test [nvbug 5401088] (#6412 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-07-30 11:25:18 -04:00
pcastonguay	e7ae5e2824	feat: Add support for disaggregation with pp with pytorch backend (#6369 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> Signed-off-by: raayandhar <rdhar@nvidia.com> Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com> Signed-off-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com> Co-authored-by: raayandhar <rdhar@nvidia.com> Co-authored-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>	2025-07-30 09:42:13 -04:00
tomeras91	a2514d93fc	[nvbug 5380101][fix] Fix nemotronNAS loading for TP>1 (#6447 ) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>	2025-07-30 07:22:32 -04:00
xinhe-nv	d9ab3fd35e	tests: add TestNemotronH cuda graph tests (#6390 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-07-30 18:45:58 +10:00
xinhe-nv	c00d6763b2	test: [CI] Add failed cases into waives.txt (#6457 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-07-30 12:36:58 +10:00
Yechan Kim	d6eb8e2366	fix: support mixture of text & multimodal prompts (#6345 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-07-30 08:52:31 +08:00
xinhe-nv	f1086e7d4f	test: [CI] remove closed bugs (#6381 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-07-29 19:01:23 +10:00
xinhe-nv	4fbb344caf	test: [CI] Add failed cases into waives.txt (#6423 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-07-29 19:00:30 +10:00
Yukun He	0eee2e2850	[5385981] fix: Update the usage of VisionAttention init API. (#6413 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-07-29 16:41:48 +08:00
ruodil	e11255e9d0	test:[nvbug 5415268] add kv_cache_free_gpu_mem_fraction param and llama4 rcca cases (#6430 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-07-29 15:52:45 +10:00
Michal Guzek	2573bb729d	feat: Add Phi-4-Mini-Instruct in Pytorch backend for LLM API accuracy tests (#6303 ) Signed-off-by: moraxu <mguzek@nvidia.com>	2025-07-28 14:02:14 -07:00
2ez4bz	cdca541148	[test] Unwaive mistral3.1 small E2E test (#6352 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-07-28 14:37:42 -04:00
2ez4bz	60e4d3a9d4	[test] Add accuracy regression test for Mistral3.1 (#6322 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-07-28 09:41:44 -07:00
ruodil	03632a679f	test: organize perf cases and add missing perflab cases in qa test list (#6283 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-07-28 20:33:32 +10:00
xinhe-nv	971be1fe86	test: waive failed cases (#6394 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-07-28 20:31:43 +10:00
Emma Qiao	b3ca159787	[Infa] - waive failed cases and fix a typo (#6384 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-07-28 02:06:57 -04:00
Chang Liu	dc757799e1	[nvbugs/5401156][fix] Avoid import all models when import trtllm._common (#6266 )	2025-07-27 23:29:21 -04:00
Yan Chunwei	908f49a4ad	[nvbug/5320234] fix: test_trtllm_bench_llmapi_launch (#6359 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-28 09:01:10 +08:00
nv-guomingz	b8d4cb8beb	feat: Support JSON Schema in OpenAI-Compatible API (#6321 ) Signed-off-by: noiji <52301388+noiji@users.noreply.github.com>	2025-07-25 12:55:56 -04:00
xiaoqi	a0aecf0476	[feat]: support logit_bias (#5354 ) Signed-off-by: xq25478 <xq25478@qq.com> Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> Signed-off-by: hexiao.xq <hexiao.xq@antgroup.com> Co-authored-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> Co-authored-by: hexiao.xq <hexiao.xq@antgroup.com> Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-07-25 09:37:41 +00:00
xinhe-nv	470544cf17	test: [CI] Add failed cases into waives.txt (#6333 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-07-25 17:18:06 +10:00
xinhe-nv	6268a60ab3	tests: add test_chunked_prefill for llama4 (#5549 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-07-24 23:02:00 -04:00
bhsueh_NV	7b6aadc800	[Fix][nvbug 5401163][nvbug 5404726][Qwen3] Fix bug of MoE on tp > 1 with trtllm moe backend (#6235 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-07-24 21:47:37 +08:00
Emma Qiao	0cc1f8c03d	[Infra] - Wiave failed tests in post-merge (#6331 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-07-24 21:18:06 +08:00
Iman Tabrizian	5fceaa6153	Revert "tests: add timeout_manager to tensorrt flow test cases (#5942 )" (#6309 )	2025-07-23 23:58:10 -04:00
Iman Tabrizian	7740bfa31d	Waive tests (#6312 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-07-23 18:15:07 -07:00
Emma Qiao	cb737a5fcd	[Infra] - Skip failed cases (#6299 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-07-23 21:26:31 +08:00
xinhe-nv	2b0fa24175	test: [CI] Add failed cases into waives.txt (#6289 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-07-23 19:04:21 +10:00
YueWeng	ed62a06eef	[nvbug/5322354] fix PD + MTP + overlap scheduler accuracy issue (#6136 ) Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>	2025-07-23 14:53:37 +08:00
Iman Tabrizian	bc2fb29c5e	[nvbugs/5401261][fix] Fix Triton backend disaggregated serving support (#6224 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-07-23 05:27:16 +08:00
John Calderon	b7c8a672da	[Issue 6193] Fix gemma3vl weight loader (#6233 ) Signed-off-by: John Calderon <johncalesp@gmail.com>	2025-07-22 10:32:18 -07:00
Stanley Sun	04f2d4b2eb	test: update test list for RTX6KD (#6213 ) Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>	2025-07-22 18:55:24 +08:00
Yi Zhang	eb7d0f84b5	[nvbugs/5368410][fix] Disable moe allreduce for multi node (#5918 ) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-07-22 12:48:00 +08:00
Yan Chunwei	f194b65f3e	fix [nvbug/5351244]: address remote mpi session submit (#5664 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-22 12:48:00 +08:00
Ivy Zhang	eb5cb5b642	tests: add timeout_manager to tensorrt flow test cases (#5942 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-07-22 10:23:41 +08:00
Simeng Liu	4a0951f85c	[Chore] Replace MODEL_CACHE_DIR with LLM_MODELS_ROOT and unwaive triton_server/test_triton.py::test_gpt_ib[gpt-ib] (#5859 ) Signed-off-by: Simeng Liu <simengl@nvidia.com>	2025-07-21 15:46:37 -07:00
Yi Zhang	f9b0a911fb	test: Enable GB200 torch compile multi gpu tests (#6145 ) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-07-21 22:17:13 +08:00
Emma Qiao	e41507a253	[Infra] - Waive failed cases on recent post-merge (#6212 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-07-21 21:00:18 +08:00
Linda	3efad2e58c	feat: nanobind bindings (#6185 ) Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>	2025-07-21 08:56:57 +01:00
xinhe-nv	b46fd41026	test: [CI] remove closed bugs (#6201 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-07-21 15:40:30 +08:00
ruodil	6a3c9f8061	test: add phi-4 multimodel and bielik-11b-v2.2 models for perf test (#5826 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-07-21 11:29:19 +10:00
bhsueh_NV	2e14c8f443	[Fix][Chore][Qwen3] fix bug of using fp4 on sm120 (#6065 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-07-20 10:25:25 +08:00
Ziyi Xiong	66030ef815	[TRTLLM-6452][feat]: Two-model engine KV cache reuse support (#6133 ) Signed-off-by: ziyixiong-nv <fxiong@nvidia.com> Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-07-19 13:17:15 +08:00
wili	82d3587bb8	[refactor] Unify name of NGram speculative decoding (#5937 ) Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>	2025-07-19 12:59:57 +08:00
xiaoqi	28858c8711	feat(eagle3):support qwen3 dense model (#5879 ) Signed-off-by: xq25478 <xq25478@qq.com>	2025-07-19 01:24:32 +08:00
Bo Deng	2c6fa145ee	[TRTLLM-6471] Infra: unwaive nixl tests and some disagg-serve tests (#6095 ) Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-07-19 00:48:44 +08:00
Emma Qiao	77acb4f753	[Infra] - Waive failed tests in post-merge (#6176 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-07-18 17:34:34 +08:00
Zhenhuan Chen	992b273045	[https://nvbugs/5387375 ] fix(scaffolding): fix scaffolding aime test in test_e2e (#6140 ) Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>	2025-07-18 10:34:37 +08:00
Iman Tabrizian	b75e53ab69	Revert "feat: nanobind bindings (#5961 )" (#6160 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-07-18 10:12:54 +08:00
2ez4bz	8480c120b1	[fix] Fix Mistral3VLM weight-loading & enable in pre-merge (#6105 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-07-17 11:04:17 -07:00
Linda	5bff317abf	feat: nanobind bindings (#5961 ) Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>	2025-07-17 22:42:52 +08:00
Yi Zhang	a718486900	fix: Fix DeepSeek R1 CI (#6129 ) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-07-17 18:24:49 +08:00
Chuang Zhu	44c70c88f9	chore:[BREAKING CHANGE] use cacheTransceiverConfig as knobs for disagg service (#5234 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-07-17 17:42:07 +08:00
Iman Tabrizian	d4d21a106e	[fix] Release slots with spec decode + disagg (#5975 ) (#6032 ) Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com> Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com> Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-07-17 12:58:18 +08:00
chenfeiz0326	fe070a0168	test: Update Llama4 Scout FP4 & FP8 accuracy tests (#5901 ) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>	2025-07-17 09:41:18 +08:00
Wanli Jiang	2d2b8bae32	feat: TRTLLM-5574 Add phi-4-multimodal pytorch-backend support (#5644 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-07-17 06:30:58 +08:00
qixiang-99	e09e409dfb	Fix: Enhance ModelConfig for kv cache size calculations (#5868 ) Signed-off-by: qixiang-99 <203170375+qixiang-99@users.noreply.github.com>	2025-07-16 14:41:31 -07:00
Emma Qiao	e30d7bec38	[Infra] - Waive failed cases in post-merge on main (#6096 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-07-16 22:41:18 +08:00
Ivy Zhang	dda91b5117	tests: add QA test cases (#5959 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-07-16 16:14:25 +08:00
Ivy Zhang	763012a88a	[nvbug/5359218][tests] add test llm api test case on lookahead with chunked prefill (#6051 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-07-16 16:04:08 +08:00
peaceh-nv	f5f31beee1	feat: Add deepseek-lite tests for RTX pro 6000 (#5903 ) Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>	2025-07-16 15:51:45 +08:00
Wanli Jiang	8679a058a3	fix: Unable to load phi4-model with tp_size>1 (#5962 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-07-16 11:39:41 +08:00
brb-nv	9214ac662a	test: Add regression tests for Gemma3 VLM (#6033 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-15 11:37:56 -07:00
Fanrong Li	7a1af1c738	Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/5947 (#5989 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-07-16 01:33:12 +09:00
MinaHuai	9ebc3ab9c4	[nvbugs/5385972][nvbugs/5387423][Fix] Minor fix for llava_next/llava_onevision (#5998 ) Signed-off-by: Mina Huai <121143971+MinaHuai@users.noreply.github.com>	2025-07-15 10:01:35 -04:00
ruodil	2a147c4d01	test: add llama_v3.3_70b_cases in perf test (#6035 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-07-15 17:53:59 +10:00
ixlmar	f225f5cd2e	[nvbugs-5318143] fix: restrict PyTorch memory usage to avoid OOMs (#5964 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-07-15 06:49:42 +08:00
brb-nv	1a2d96919c	feat: Update Gemma3 Vision Encoder (#5973 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-14 22:38:10 +08:00
Zhenhuan Chen	30608a5e6d	[https://nvbugs/5355316 ] fix: update torch.compile option to fix triton store_cubin error (#5865 ) Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>	2025-07-14 17:17:30 +08:00
ruodil	347520494b	test: remove duplicate cases in perf sanity test (#5870 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-07-14 17:17:30 +08:00
Bo Li	6d79559f3e	fix: [https://nvbugs/5351130 ][https://nvbugs/5333654 ] Unwaive for bug 5351130 and 5333654. (#5821 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-07-14 17:17:30 +08:00
Bo Li	2991cf4b80	fix: [https://nvbugspro.nvidia.com/bug/5345215 ] Unwaive for bug 5345215. (#5606 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-07-14 17:17:30 +08:00
Pengyun Lin	6992616c1f	[nvbug 5004744][fix] rewrite completion API to avoid repetitive tokens (#5201 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-07-14 17:17:30 +08:00
ruodil	278a1a7df3	test: fix some test failure and add llama_nemotron models in perf sanity test, add more torch cases (#5693 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-07-14 17:17:30 +08:00
Iman Tabrizian	c8874a7f94	[nvbug/5337601][fix] Fix disagg + speculative decoding (#5558 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com> Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> Co-authored-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-07-14 17:17:30 +08:00
Yi Zhang	e5e87ecf34	test: Move some of the test from post merge to pre-merge, update dgx b200 test case (#5640 ) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-07-14 17:17:30 +08:00
Yan Chunwei	9c673e9707	[TRTLLM-6160] chore: add sampling examples for pytorch (#5951 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-14 15:28:32 +09:00
Yan Chunwei	c30eead09f	[TRTLLM-6164][TRTLLM-6165] chore: add runtime example for pytorch (#5956 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-14 14:09:39 +08:00
Thor Johnsen	041f1fa513	[TRTLLM-6264] Fix flaky test_e2e.py::test_openai_lora (#5885 ) Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>	2025-07-11 16:20:41 -07:00
xinhe-nv	509363d858	tests: update sanity tests & fix tests (#5906 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-07-11 19:48:19 +10:00
brb-nv	0385f89abc	test: Fix Gemma3 unit tests due to transformers upgrade (#5921 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-10 17:24:10 -07:00
2ez4bz	c19840235d	[fix] Fix mistral unit tests due to transformers upgrade (#5904 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-07-10 10:45:27 -07:00
Yiqing Yan	3aa53ec36c	[None] - Waive L0 tests (#5915 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-07-10 18:33:17 +08:00
Enwei Zhu	055c4a9fe6	[NvBug 5370718, 5371538] fix: Fix incremental detokenization (#5825 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-07-10 16:30:00 +08:00
Anthony Chang	7d21b55b5a	[feat] Add TRTLLM MoE nvfp4 cubins for mid-high concurrency; attention_dp for TRTLLM MoE (#5723 ) Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>	2025-07-10 14:06:50 +08:00
peaceh-nv	76c3a12bcb	[fix] WAR to fix the illegal memory access issue in moe gemm on SM120 (#5636 ) Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>	2025-07-10 09:20:30 +08:00
2ez4bz	87fe44fd29	feat(models): Mistral3.1 VLM pytorch backend support (#5529 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-07-09 13:17:40 -07:00
DylanChen-NV	74dca0aa7b	[NVBUG-5304516/5319741]Qwen2.5VL FP8 support (#5029 ) Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>	2025-07-09 23:16:42 +08:00
Bo Li	9d894bc0cb	fix: [https://nvbugspro.nvidia.com/bug/5375656 ] Unwaive for bug 5375656. (#5842 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-07-09 10:17:05 +08:00
Venky	e27215ca03	test: Validate and add accuracy& perf tests for Ministral-8B-Instruct[-FP8](pytorch only) (#5654 ) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2025-07-08 18:16:21 -07:00
xavier-nvidia	b6013da198	Fix GEMM+AR fusion on blackwell (#5563 ) Signed-off-by: xsimmons <xsimmons@nvidia.com>	2025-07-09 08:48:47 +08:00
Yan Chunwei	e50d95c40d	chore [TRTLLM-6161]: add LLM speculative decoding example (#5706 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-09 07:33:11 +08:00
Pamela Peng	da8c7372d4	[TRTLLM-5366][feat]Add support for sm121 (#5524 ) Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com> Initial CI run failed a single step A30-CPP-3 due to timeout. Rerunning that step succeeded.	2025-07-08 14:27:00 -07:00
Chang Liu	08a3dfeb2b	[nvbug/5308432] unwaive test: post-merge-triton_backend-test_llava (#5814 )	2025-07-08 09:53:11 -07:00
Raayan Dhar	e3268a4221	[TRTLLM-5847][feat] Support n-gram speculative decoding with disagg (#5732 ) Signed-off-by: raayandhar <rdhar@nvidia.com>	2025-07-08 09:39:58 -04:00
xinhe-nv	89bbb230cc	tests: waive failed cases on main (#5781 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-07-08 19:44:12 +10:00
liji-nv	95978e3044	[fix] https://nvbugs/5333654 Unwaive to check ci status and improve torch compile multi-gpu coverage (#5700 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-07-08 12:42:15 +08:00
Robin Kobus	30a19fcf7c	[TRTLLM-6291] feat: Add user-provided speculative decoding support (#5204 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-07-07 16:30:43 +02:00
xinhe-nv	ded38ebdbd	test: [CI] remove closed bugs (#5770 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-07-07 18:06:07 +10:00
Yanchao Lu	2013034948	[Test] - Waive or fix few known test failures (#5769 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-07-06 21:14:16 +08:00
Stefan Niebler	d1112aac37	[TRTLLM-3442] feat: added beam search support to the PyTorch Workflow (#5333 ) Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>	2025-07-05 01:35:13 +09:00
Chuang Zhu	ffc0b8f5da	Cache transceiver support VSWA (#5505 ) Signed-off-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com> Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> Co-authored-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2025-07-05 01:18:42 +09:00
Yiqing Yan	7f3ea058f0	[Infra] - Waive L0 flaky test (#5759 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-07-04 19:25:12 +09:00
xinhe-nv	3869b969a6	test: [CI] Add failed cases into waives.txt (#5718 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-07-04 17:24:48 +09:00
Faraz	81c0764012	Cherry pick "[NVBUG:5355009] Modify check for fuse_fp4_quant on SM120 (#5724 ) Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com> Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com> Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com>	2025-07-04 16:53:20 +09:00
Yiqing Yan	b8fef809ae	[Infra] - Waive L0 test (#5748 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-07-04 15:04:49 +08:00
Yi Zhang	73d30a23c7	test: add more tests for GB200 with 8 GPUs/2 nodes in L0 tests (#5397 ) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-07-04 13:14:13 +08:00
Zheng Duan	cb9f596dbe	[nvbug 5300551] test: increase block count in eviction test (#5465 ) Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>	2025-07-04 13:14:13 +08:00
xinhe-nv	7f837b6e8b	tests: waive failures on main (#5704 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-07-04 12:39:12 +09:00
Venky	4762e0b244	Waive tests : test_openai_lora, test_trtllm_serve_lora_example and test_openai_chat_structural_tag_example (#5740 ) Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>	2025-07-04 11:01:08 +09:00
Netanel Haber	f91379b7e8	delete duplicate eagle3 and ngram tests (#5711 ) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>	2025-07-03 15:47:26 +03:00
Omer Ullman Argov	c72856188c	[ci] small multigpu speedups (#5643 ) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-07-03 08:06:10 -04:00
Emma Qiao	530897388c	[Infra] - Waive a failed case on main (#5702 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-07-03 06:09:27 -04:00
Emma Qiao	2a5fdebf10	[Infra] - Waive failed tests for main 0702 (#5671 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-07-02 22:05:07 -04:00
Emma Qiao	31699cbeb1	[Infra] - Set default timeout to 1hr and remove some specific settings (#5667 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-07-02 08:37:54 -04:00
Kaiyu Xie	f9a455651b	perf: Use tokenizers API to optimize incremental detokenization perf (#5574 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-07-01 09:35:25 -04:00
Yan Chunwei	3bc703d450	ci: unwaive llmapi launch test (#5281 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-01 20:12:55 +08:00
brb-nv	4ef60d5fbb	nvbugs-5331031; nvbugs-5344203 - address intermittent issues with Mistral Small multimodal for BS=8 (#5453 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-01 20:12:55 +08:00
Yan Chunwei	a5eff139f1	[TRTLLM-5277] chore: refine llmapi examples for 1.0 (part1) (#5431 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>	2025-07-01 19:06:41 +08:00
Emma Qiao	65c2b93284	[Infra] - Add some timeout and unwaive a test which dev fixed (#5631 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-07-01 05:01:32 -04:00
Pamela Peng	071ad758c4	[https://nvbugs/5318059 ][test] Unwaive test (#5624 ) Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>	2025-07-01 04:54:44 -04:00
xinhe-nv	19c56f0374	test: [CI] Add failed cases into waives.txt (#5582 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-07-01 14:57:03 +08:00
xinhe-nv	a8cf611baa	test: [CI] Add failed cases into waives.txt (#5569 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-07-01 11:02:56 +08:00
xinhe-nv	9b17b29b6e	test: [CI] remove closed bugs (#5572 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-07-01 10:15:43 +08:00
Omer Ullman Argov	42134b8b84	[ci] move eagle1 and medusa tests to post-merge (#5604 ) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-06-30 19:32:28 +08:00
Fanrong Li	6cbc9a5297	[nvbug/5354946][fix] Fix mtp vanilla draft inputs (#5568 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-06-30 15:59:12 +08:00
Yiqing Yan	4fef14da56	Deduplicate waive list (#5546 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-06-30 11:12:26 +08:00
Talor Abramovich	70e34a3291	[TRTLLM-5831][feat] Add LoRA support for pytorch backend in trtllm-serve (#5376 ) Signed-off-by: Talor Abramovich <talora@nvidia.com>	2025-06-29 12:46:30 +00:00
amirkl94	a985c0b7e6	tests: Move stress tests to be Post-Merge only (#5166 ) Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>	2025-06-29 09:44:47 +03:00
Iman Tabrizian	26b953e29a	[nvbugs/5309940] Add support for input output token counts (#5445 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-06-28 04:39:39 +08:00
wili	56cdfe5c6c	[TRTLLM-5000][feat] NGrams V2 (#4569 ) Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>	2025-06-27 23:00:17 +08:00
Iman Tabrizian	49af791f66	Add testing for trtllm-llmapi-launch with tritonserver (#5528 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-06-27 11:19:52 +08:00
xinhe-nv	a3494bebec	tests: waive failed tests on main (#5512 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-27 10:13:22 +08:00
Frank	aa6e015ef8	Update trtllm-bench to support new Pytorch default. (#5491 ) Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>	2025-06-26 17:05:43 -07:00
jmydurant	8836990bde	[TRTLLM-3602][feat] support nvfp4 model and fp8 kv cache for MLA chunked prefill (Blackwell) (#5475 ) Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>	2025-06-26 22:18:08 +08:00
Omer Ullman Argov	6bae76d7ca	[fix][ci] move torch tests to run under torch stage (#5473 ) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-06-26 14:31:38 +03:00
Omer Ullman Argov	1633bd2bef	[CI] move flashinfer llama tests to post merge (#5506 ) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-06-26 19:27:32 +08:00
xinhe-nv	ff2dd72df4	tests: waive tests (#5458 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-26 14:53:55 +08:00
Emma Qiao	32d1573c43	[Infra] - Add timeout setting for long tests found in post-merge (#5501 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-06-26 11:31:39 +08:00
Venky	d9b75f83fd	[CI] Waive `test_fp8_block_scales_4gpus[ep4-mtp_nextn=0-fp8kv=True-attention_dp=True-cuda_graph=True-overlap_scheduler=True-torch_compile=False]` (#5494 ) Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>	2025-06-25 20:17:12 -07:00
jmydurant	578dbc8d9a	feat: chunked prefill for MLA (Blackwell) (#4651 ) Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>	2025-06-26 09:01:00 +08:00
HuiGao-NV	74ae15a26b	CI: enable test cases on single device type (#5484 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-06-26 08:03:44 +08:00
QI JUN	feaf789342	CI: reduce BF16 test cases in B200 (#5482 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-06-26 07:18:20 +08:00
HuiGao-NV	cc3c2b3be2	Move 3 disaggregated cases from 4 GPUs devices to 1 GPU device (#5457 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-06-25 21:38:14 +08:00
Kaiyu Xie	d6ada5ffce	[nvbug/5354956] fix: unexpected keyword argument 'streaming' (#5436 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-06-25 20:37:24 +08:00
Netanel Haber	3ca2f6ac51	start OAIServer with `max_beam_width=1` for TorchSampler (#5427 ) Signed-off-by: Netanel Haber <nhaber@nvidia.com>	2025-06-25 15:52:06 +08:00
Enwei Zhu	fc7a81ceb0	test: Add LLGuidance test and refine guided decoding (#5348 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-06-25 14:12:56 +08:00
Enwei Zhu	76da7fed86	fix (NvBug 5354925): Fix static EPLB (#5411 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-06-25 13:14:40 +08:00
dongxuy04	699520082b	Add MTP support for Online EPLB (#5213 ) Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>	2025-06-25 07:58:13 +08:00
Emma Qiao	475272046a	[Infra] - Waive failed tests in post-merge and increase some timeout setting (#5424 ) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-06-24 17:19:31 +08:00
xinhe-nv	658fb5b54e	tests: update benchmark test lists (#5365 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-06-24 15:23:38 +08:00
xinhe-nv	4b32a3f1a7	test: [CI] remove closed bugs (#5400 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-06-24 13:39:57 +08:00
Fanrong Li	5d4ab47d5b	fix: refactor and fix mtp vanilla (#4762 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-06-20 05:23:39 +08:00
Kaiyu Xie	7246fd75d1	feat: Support stream_interval (#5284 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-06-19 21:57:10 +08:00
Enwei Zhu	bca758fce1	fix: Fix DS-R1 nvfp4 test case naming (#5361 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-06-19 15:50:43 +08:00
Emma Qiao	493f268b1c	[Infra]Fix l0_sanity_check.yml which also has gb202 and gb203 (#5360 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-06-19 15:05:57 +08:00
ruodil	e22e884b02	test: amend test case name in perf cluster test (#5356 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-06-19 14:50:12 +08:00
ruodil	21ce9b6749	test: add qwen3 cases (#5302 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-19 14:38:36 +08:00
amitz-nv	1753202b61	[TRTLLM-5825][fix] Fix torch LoRA TP (#5338 ) Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>	2025-06-19 09:12:00 +03:00
Emma Qiao	7f68de3e3f	Refactor test timeout for individual long case (#4757 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-06-19 13:52:11 +08:00
bhsueh_NV	dce8620013	chore: enable moe_backend on Qwen3 test (#5230 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-06-19 13:40:45 +08:00
xinhe-nv	e5400eeae0	tests: add ds r1 tp4 test (#5197 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-06-19 12:48:33 +08:00
Yiqing Yan	da576bcafa	Waive L0 test (#5349 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-06-19 12:01:11 +08:00
Fanrong Li	6c3210a8be	[test] add nvfp4 DeepSeek-V3-Lite-mtp tests (#5125 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-06-19 09:48:22 +08:00
Omer Ullman Argov	5010f8719d	[fix][test] remove duplicate test runs (#5241 ) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-06-19 01:59:54 +08:00
Omer Ullman Argov	a28a152001	[fix][test] remove some cpp test cases from h100 (#5335 ) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-06-18 20:40:26 +03:00
yuanjingx87	a1c5704055	[feat] Multi-node CI testing support via Slurm (#4771 ) Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com> Signed-off-by: yuanjingx87 <197832395+yuanjingx87@users.noreply.github.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-06-19 01:11:12 +08:00
Iman Tabrizian	e5ee5c5352	Unwaive disaggregated serving accuracy tests (#5095 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>	2025-06-19 00:41:15 +08:00
HuiGao-NV	d13d2f460d	Remove duplicated test cases (#5323 ) Signed-off-by: Hui Gao <huig@nvidia.com> Signed-off-by: Hui Gao <huig@nvidia.com> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-06-18 21:20:20 +08:00
Emma Qiao	b29ac5b561	[Infra] Update 5080 and 5090 case condition due to the driver update (#5317 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-06-18 20:01:36 +08:00
xinhe-nv	610a49f117	tests: add multi nodes tests (#5196 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-06-18 18:08:04 +08:00
Yi Zhang	375dd0b971	Waive L0 (#5311 ) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-06-18 16:40:41 +08:00
Wanli Jiang	3a02489e86	[TRTLLM-5758] test: Add Bielik-11B-v2.2 Model Support (#5159 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-06-18 15:12:49 +08:00
ruodil	3b5d916250	test: cherry-pick deepseek rcca cases in main branch (#5307 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-18 14:26:26 +08:00
Yiqing Yan	8f67e3604d	Waive L0 tests (#5308 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-06-18 12:43:45 +08:00
Omer Ullman Argov	f501ce57b1	[fix][test] move deepseek single gpu tests to post merge (#5280 ) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-06-18 06:59:39 +03:00
Ivy Zhang	41cfcaa964	test: update qa test list (#5305 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-06-18 11:29:11 +08:00
Emma Qiao	ff32caf4d7	[Infra] - Update dependencies with NGC PyTorch 25.05 and TRT 10.11 (#4885 ) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> Signed-off-by: Emma Qiao <qqiao@nvidia.com> Co-authored-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-06-17 23:48:34 +08:00
Yanchao Lu	f4cdbfcdf0	None - Some clean-ups for the automation pipeline (#5245 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-06-17 21:08:24 +08:00
QI JUN	ccd9adbe33	CI: move multi-gpu test cases of tensorrt backend to h200 (#5272 ) Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-06-17 17:37:37 +08:00
Ivy Zhang	2ad8758ecc	[TRTLLM-5786][https://nvbugspro.nvidia.com/bug/5310520 ][test] Add QA test cases (#5073 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-06-17 17:14:01 +08:00
QI JUN	517c1ecf72	move some test cases of TensorRT backend back (#5232 ) Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-06-17 17:03:11 +08:00
xinhe-nv	a49ad790b3	test: [CI] remove closed bugs (#5218 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-17 13:13:23 +08:00
QI JUN	546274d40e	fix ci (#5259 ) Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-06-17 12:03:09 +08:00
ruodil	bb2348372c	test: add more pytorch cases in perf test (#5237 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-06-17 11:11:28 +08:00
Simeng Liu	5c18160d27	chore: Waive CI failure. (#5252 ) Signed-off-by: Simeng Liu <simengl@nvidia.com>	2025-06-16 20:47:05 +02:00
Ivy Zhang	64b7f04fdc	[test] split nemotron test cases from examples_test_list (#5238 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-06-16 16:36:33 +08:00
xinhe-nv	802f22cd12	test: [CI] Add failed cases into waives.txt (#5221 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-06-16 16:11:53 +08:00
Yiqing Yan	8445416c39	Waive L0 tests (#5233 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-06-16 15:19:03 +08:00
ruodil	2848e012ae	test: add llama4 models for perf test (#5187 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-16 11:24:35 +08:00
ruodil	3d22f27063	test: add more cases for llama_v3.3/3.1 70b fp8 and set enable_attention_dp to false to non-deepseek models (#5155 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-06-16 11:23:20 +08:00
Enwei Zhu	babdd9ce06	test: Add json_mode_eval for guided decoding evaluation (#5179 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-06-16 10:03:55 +08:00
amitz-nv	109c426077	Enable trtllm-bench to run LoRA and add basic e2e perf testing capability for LoRA in PyT flow (#5130 )	2025-06-15 18:54:04 +03:00
Tailing Yuan	0b60da2c45	feat: large-scale EP(part 7: DeepEP integration) (#4792 ) Signed-off-by: Tailing Yuan <yuantailing@gmail.com> Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-06-14 19:12:38 +08:00
Enwei Zhu	5f2785fb90	fix: Fix waive list (#5205 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-06-13 23:33:23 +08:00
QI JUN	952f33dcad	CI: move all test cases of TensorRT backend into post merge (#5186 ) Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-06-13 20:48:48 +08:00
xinhe-nv	30d9d0fa71	test: [CI] Add failed cases into waives.txt (#5178 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-13 16:38:51 +08:00
Ivy Zhang	28cd536bd6	[test] Update timeout params in QA test list (#5124 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-06-13 13:40:03 +08:00
Iman Tabrizian	01bd4c00b4	Add two MTP disaggregated test (#4546 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-06-13 12:17:45 +08:00
xinhe-nv	d9be419f45	tests: update tests for b200 (#5180 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-13 11:25:33 +08:00
ruodil	fa582cbe9a	test: add more cases for rtx_pro_6000_se and add option kv_cache_dtype in perf test (#5083 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-06-13 11:09:15 +08:00
nv-guomingz	cf35a079f9	fix:https://nvbugs/5298661 (#5022 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-06-12 20:41:44 +08:00
Shi Xiaowei	88cba5f354	test: waive the NIXL related tests (#5153 ) Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2025-06-12 17:02:27 +08:00
Fanrong Li	4d070d3862	chore: fix typo in tests (#5092 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-06-12 15:11:26 +08:00
Michal Guzek	53983ad273	[TRTLLM-4932] Add Llama-3.1-Nemotron-Nano-8B-v1-FP8 accuracy tests (#4933 ) Signed-off-by: moraxu <mguzek@nvidia.com>	2025-06-12 15:06:28 +08:00
ruodil	d021cc5126	test: set enable_attention_dp to False for non-deepseek models and add more cases for llama_v3.1/3.3 70b fp8 models (#5149 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-12 14:59:16 +08:00
Venky	c3b2eb6dab	test(perf): Add remaining Llama-Nemotron perftests (nano, super, ultra) + extras ✨ (#5066 ) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>	2025-06-12 14:19:15 +08:00
xinhe-nv	11b94feff8	test: skip disaggregated tests on arm (#5070 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-06-11 17:00:10 +08:00
ruodil	56abae0835	test: add more llama_v3.3_70b cases in perf test (#4979 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-11 15:44:22 +08:00
Yiqing Yan	0a9f105931	Waive L0 tests (#5111 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-06-11 11:53:15 +08:00
Zheng Duan	580a92521e	test: conditional disagg and cache aware balancing for deepseek v3 (#4522 ) Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>	2025-06-11 09:44:29 +08:00
liji-nv	f6a49a9343	[CI] waive failing L0 test (#5089 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-06-10 20:40:44 +08:00
Yiqing Yan	8ec8e4559d	Waive L0 test (#5077 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-06-10 16:23:49 +08:00
Yiqing Yan	fdfc711261	Waive L0 test (#5067 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-06-10 15:40:57 +08:00
Stanley Sun	74b0e71ef4	test: add more disaggregated serving tests into QA testlist (#5036 ) Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>	2025-06-10 09:24:53 +08:00
pcastonguay	5b84fd9201	[nvbug 5283506] fix: Fix spec decode triton test (#4845 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-06-09 08:40:17 -04:00
Yukun He	137fe35539	fix: Fix warmup phase batch size out of range. (#4986 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-06-09 19:19:16 +08:00
Yuxian Qiu	88480197da	ci: [nvbugs/5280806] Unwaive unittests/_torch. (#4951 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-06-09 19:04:11 +08:00
liji-nv	1d4f748773	[fix] Fix illegal mem access and possible accuracy lose. Cherry-pick … (#5017 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-06-09 17:50:57 +08:00
Yiqing Yan	6b17dff2f1	Waive L0 test (#5024 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-06-09 16:03:15 +08:00
Yan Chunwei	f4bfb8e49d	ci: unwaive llmapi launch test (#4991 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-06-09 13:25:43 +08:00
Omer Ullman Argov	8731f5f14f	chore: Mass integration of release/0.20 (#4898 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Signed-off-by: Hui Gao <huig@nvidia.com> Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com> Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com> Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com> Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com> Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com> Signed-off-by: moraxu <mguzek@nvidia.com> Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Co-authored-by: Yiqing Yan <yiqingy@nvidia.com> Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Co-authored-by: HuiGao-NV <huig@nvidia.com> Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com> Co-authored-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Co-authored-by: Bo Li <22713281+bobboli@users.noreply.github.com> Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com> Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com> Co-authored-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com> Co-authored-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com> Co-authored-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Faraz <58580514+farazkh80@users.noreply.github.com> Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com> Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com>	2025-06-08 23:26:26 +08:00
Mike Iovine	ec0d984656	[nvbug/5280806][fix] Fix 2 model spec decode flow (#4807 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-06-08 07:40:02 -04:00
Yanchao Lu	9e05613679	[Infra] - Update JNLP container config (#5008 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-06-08 16:44:09 +08:00
QI JUN	5ee0de7f2a	Resubmit #4894 (#4969 ) Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-06-08 04:42:15 +08:00
Ivy Zhang	7dce328ad6	[TRTLLM-5692][tests] Add speculative decoding test cases on torch flow (#4940 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Ruodi Lu <ruodil@nvidia.com> Co-authored-by: Ruodi Lu <ruodil@nvidia.com>	2025-06-07 11:18:32 +08:00
Fanrong Li	75d020cf07	fix: fix cuda graph padding for spec decoding (#4853 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-06-06 22:21:42 +08:00
Anthony Chang	eeb555e37b	chore: memoize weight shuffle index to speed up weight preproc in moe_backend=TRTLLM (#4826 ) Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>	2025-06-06 16:13:54 +08:00
xinhe-nv	564472168e	test: [CI] Add failed cases into waives.txt (#4966 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-06-06 10:30:15 +08:00
QI JUN	ec50684d80	Revert "fix a bug of global cuda graph dummy request" (#4970 )	2025-06-06 08:54:45 +08:00
QI JUN	154f7cc40a	fix a bug of global cuda graph dummy request (#4894 ) Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-06-05 19:47:40 +08:00
Yiqing Yan	7e921c78b5	Waive L0 tests (#4953 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-06-05 19:36:48 +08:00
Shunkangz	3eae58ca36	Add disaggregated unittest (#4899 ) Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>	2025-06-05 19:14:31 +08:00
QI JUN	d5a8079eb6	Revert "[infra] Unwaive unittests/_torch" (#4950 )	2025-06-05 17:21:07 +08:00
xinhe-nv	1c3091c63b	tests: [TRTQA-2906] add benchmark serving tests (#4901 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-05 14:33:03 +08:00
Yiqing Yan	9ceef983c0	Waive L0 tests (#4927 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-06-05 11:09:01 +08:00
xinhe-nv	50a74a1daa	tests: fix 5273697 (#4685 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-06-05 10:39:21 +08:00
Mike Iovine	8433091630	[infra] Unwaive unittests/_torch (#4919 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-06-05 08:49:37 +08:00
Lucas Liebenwein	f9d45e03a4	[AutoDeploy] deprecate CI post-merge tests and keep them for local testing (#4892 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-06-05 08:27:17 +08:00
Yi Zhang	1fca654bfd	tests: Update gb200 test case (#4754 ) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-06-04 18:49:20 +08:00
Shi Xiaowei	b13f8c9cba	Fix: NVBug 5302895 (#4835 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-06-04 09:31:39 +08:00
Simeng Liu	2384655c3a	chore: Waive examples/test_mistral.py::test_llm_mistral_v1_1gpu. (#4873 ) Signed-off-by: Simeng Liu <simengl@nvidia.com>	2025-06-03 14:45:14 -04:00
Iman Tabrizian	141467d4b6	Add pre-merge Triton backend tests (#4842 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-06-03 00:47:58 -04:00
ruodil	fa93eeee84	shorten reqs in con:1 cases and add streaming cases, and add l2 perf … (#4849 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-03 12:28:13 +08:00
Ivy Zhang	8686868531	tests: [TRTQA-2905] improve timeout report for qa test cases (#4753 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-03 12:27:27 +08:00
Robin Kobus	e34a1beb72	[nvbugs/5303555] ci: unwaive test_fp8_block_scales_cuda_graph_padding (#4735 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-06-03 10:40:43 +08:00
Fanrong Li	380a5d1690	[https://nvbugs/5271281 ][fix] fix a pd+mtp accuracy issue (#4536 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-06-03 10:03:34 +08:00
Fanrong Li	13f68338d2	fix: [https://nvbugspro.nvidia.com/bug/5273945 ] Unwaive tests for bug-5273945 (#4832 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-06-02 22:01:57 +08:00
Yanchao Lu	8166649d03	[Infra] - Minor clean-up and test Ubuntu mirrors (#4829 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-06-02 20:18:20 +08:00
Fanrong Li	7d356efc7d	fix: fix accuracy and illegal memory access issues when using mtp + attention dp (#4379 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-06-02 00:35:52 +08:00
amirkl94	8039ef45d3	CI: Performance regression tests update (#3531 )	2025-06-01 09:47:55 +03:00
Emma Qiao	202813f054	Check test names in waive list (#4292 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-06-01 14:39:30 +08:00
Dom Brown	338d6e9f95	[nvbug 5305210] fix: Resolve nvbug 5305210 (#4759 ) Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2025-05-31 19:21:06 +08:00
Emma Qiao	c945e92fdb	[Infra]Remove some old keyword (#4552 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-05-31 13:50:45 +08:00
Jhao-Ting Chen	fcadce9f8d	[fix] Eagle-2 LLMAPI pybind argument fix. (#3967 ) Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com> Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>	2025-05-29 12:23:25 -07:00
yuanjingx87	2c48ff5898	[feat] add b200 support via slurm (#4709 ) Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>	2025-05-29 14:49:46 +08:00
Yan Chunwei	33a9ba55f5	fix: test trtllm-bench mgmn (#4613 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-05-29 14:43:47 +08:00
ruodil	500aca4f44	test: remove perf test l40s/l20 oom test cases and unwaive tests (#4755 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-05-29 13:58:47 +08:00
QI JUN	058f83e47b	CI: move post-merge multi GPU test of PyTorch backend to H200 (#4733 ) Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-05-29 11:15:56 +08:00
xinhe-nv	93283484c2	test: [CI] Add failed cases into waives.txt (#4688 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-05-28 22:04:35 +08:00
amirkl94	fbec0c3552	Release 0.20 to main (#4577 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com> Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com> Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com> Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com> Signed-off-by: Simeng Liu <simengl@nvidia.com> Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com> Signed-off-by: moraxu <mguzek@nvidia.com> Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com> Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Co-authored-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com> Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com> Co-authored-by: Yukun He <23156053+hyukn@users.noreply.github.com> Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Co-authored-by: Venky <23023424+venkywonka@users.noreply.github.com> Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: stnie <82932102+stnie@users.noreply.github.com> Co-authored-by: Simeng Liu <109828133+SimengLiu-nv@users.noreply.github.com> Co-authored-by: Faraz <58580514+farazkh80@users.noreply.github.com> Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com> Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com> Co-authored-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>	2025-05-28 16:25:33 +08:00
xinhe-nv	bb3d998eb1	test: [CI] remove closed bugs (#4638 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-05-27 18:07:59 +08:00
Yiqing Yan	92a7984945	Waive L0 tests (#4686 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-05-27 15:07:02 +08:00
xinhe-nv	59f7622281	test: rcca https://nvbugs/5223130 (#4510 ) * add rcca tests Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> * skip tests on blackwell Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> --------- Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-05-27 09:59:47 +08:00
yuanjingx87	732d92ff62	[Infra] - Multi-GPU testing support with Slurm (#4454 ) Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-05-26 19:44:19 +08:00
Enwei Zhu	88190faa34	feat: large-scale EP(part 4: Static EP load balancer integration) (#4615 ) * MoeLoadBalancerConfig Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * MoeLoadBalancer integration Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * config file Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * test Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * test Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> --------- Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-05-26 18:25:11 +08:00
Yiqing Yan	2fee408536	Waive L0 tests (#4645 ) * Waive L0 tests Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> * Apply suggestions from code review Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> --------- Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-05-26 11:05:01 +08:00
Yanchao Lu	20c15fc04f	Fix invalid testcase name (#4626 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-05-24 00:40:00 +08:00
Anthony Chang	bbea2647b1	Qwen3 supports TRTLLM FP4 MoE backend (#4530 ) * MoE TRTLLM backend for Qwen3 Signed-off-by: Anthony Chang <anchengc@nvidia.com> * add extra moe_backend to test Signed-off-by: Anthony Chang <anchengc@nvidia.com> * address comments Signed-off-by: Anthony Chang <anchengc@nvidia.com> * conditionally compile kernels on newer archs Signed-off-by: Anthony Chang <anchengc@nvidia.com> * missing positional arg Signed-off-by: Anthony Chang <anchengc@nvidia.com> * Update the routing kernels Signed-off-by: Christina Zhang <christinaz@nvidia.com> * Revise usage of TLLM_LOG_ERROR Signed-off-by: Christina Zhang <christinaz@nvidia.com> * Add unit test for Qwen3 moe (trtllm_gen backend) Signed-off-by: Christina Zhang <christinaz@nvidia.com> * improve weight processing speed of moe_backend=TRTLLM; roughly 2x Signed-off-by: Anthony Chang <anchengc@nvidia.com> * tidy and minor fix Signed-off-by: Anthony Chang <anchengc@nvidia.com> * temporarily disable accuracy test that has known issue Signed-off-by: Anthony Chang <anchengc@nvidia.com> --------- Signed-off-by: Anthony Chang <anchengc@nvidia.com> Signed-off-by: Christina Zhang <christinaz@nvidia.com> Co-authored-by: Christina Zhang <christinaz@nvidia.com>	2025-05-23 18:31:08 +08:00
Enwei Zhu	d7443b6068	[https://nvbugspro.nvidia.com/bug/5181262 ] [test] Unwaive Mistral Nemo test (#4515 ) unwaive Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-05-23 10:14:00 +08:00
pcastonguay	d7d455e7ea	[feat][TRTLLM-5018] Dis serving python runtime trt backend (#4243 ) * feat: Enabling dis serving with TRT backend with Python runtime Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Fixing formatting Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Fixing disagg mtp test Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> --------- Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-05-22 22:01:06 -04:00
Mike Iovine	14fc48ada7	[nvbug/5285881][fix] Fix chunked prefill + overlap scheduler (#4402 ) [fix] Fix chunked prefill + overlap scheduler Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-05-23 04:38:22 +08:00
Venky	c713eb5799	test(perf): Add `Llama-3_1-Nemotron-Ultra-253B-v1` perf tests (cpp) (#4446 ) ultra Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2025-05-22 13:07:33 -07:00
xinhe-nv	22c01d5b21	test: [CI] Add failed cases into waives.txt (#4549 ) * update waive list Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> * fix test issues Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> --------- Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-05-22 17:18:53 +08:00
ruodil	1a45890dae	test: waive hanging cases for perf test (#4562 ) waive hanging cases Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>	2025-05-22 15:50:05 +08:00
HuiGao-NV	bc9f1dbede	fix[nvbug-5228840]: Remove test cases of feature not supported anymore (#3972 ) * Remove waived cases * Remove test cases of not supported feature Signed-off-by: Hui Gao <huig@nvidia.com>	2025-05-22 11:18:58 +08:00
Michal Guzek	9033dd987d	[TRTLLM-4932] Add CLI accuracy tests for Phi-4-mini-instruct (#4415 ) Add phi-4-mini CLI acc test Signed-off-by: moraxu <mguzek@nvidia.com>	2025-05-22 09:56:48 +08:00
Chuang Zhu	44cfd757b2	Agent interface impl for NIXL (#4125 ) * agentConnection Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> recv Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> agentState Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> NIXL interfaces Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com> update cmakelists Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com> nixl improve Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> remove cppzmq Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> fix Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> transferAgent remove register Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> work for cache Test Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> reduce sleep time Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> fix test Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> intergarte Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> nixl env Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> fix rebase error Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> cpp test Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> stash for send metaData Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> loadRemoteMD after fetchRemoteMD Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> workaround for mixed gen and context Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> test_env Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> avoid port conflict in test Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> * format Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> * use std::string Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> * typo Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> * fix transferAgentTest Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> --------- Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-05-22 09:09:41 +08:00
Dom Brown	1cffa99792	test: Split test_simple into mpi_utils and cache transceiver tests for DGX (#4451 ) Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2025-05-22 04:26:21 +08:00
Venky	0a8461d54c	test(perf): Pt.2 Add `Llama-3_3-Nemotron-Super-49B-v1` integration-perf-tests (cpp) (#4499 ) add low concurrency perf tests Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>	2025-05-21 10:46:48 -07:00
xinhe-nv	407ef08662	tests: add qwene fp4 tests into QA test list & update sanity test list (#4478 ) * update sanity test list Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> * update test list Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> --------- Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Larry <197874197+LarryXFly@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-05-21 16:52:02 +08:00
ruodil	83f1933f0c	test: add failed case in waive list and fix some test script issue for perf test (#4527 ) add failed case in waive list and fix some test script issue Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>	2025-05-21 16:37:25 +08:00
QI JUN	15317ece5a	CI: waive test_fp8_block_scales_4gpus of deepseek v3 lite (#4520 ) waive test_fp8_block_scales_4gpus of deepseek v3 lite Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-05-21 13:19:43 +08:00
xinhe-nv	750f412b8f	tests: add llama 3.3 70b 2 nodes tests (#4391 ) * add llama 3.3 70b 2 nodes tests Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> * remove enable_overlap_scheduler parameter Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> --------- Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-05-21 12:42:45 +08:00
Chuang Zhu	ab5bea957d	unwaive some disagg test (#4476 ) * unwaive some disagg test Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> * pytest.mark.skip_less_device(4) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> --------- Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-05-21 11:45:11 +08:00
Yan Chunwei	9199793848	fix: llmapi-launch add add trtllm-bench test with engine building (#4091 ) * add trtllm-bench mgmn test Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-05-21 10:18:01 +08:00

980 Commits