Zongfei Jing
3c5f97bf57
fix(custom_ops): update candidates for MMA tiling and cluster shapes
...
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2026-01-12 22:13:50 -08:00
Zongfei Jing
45b468c66e
fix(custom_ops): refactor kernel handling
...
- Increased tune_max_num_tokens from 2 to 256 for improved performance.
- Refactored kernel handling: kernel arguments are now passed via pointers, and compiled kernels are cached so each configuration is compiled only once (see the sketch below).
- Adjusted tensor handling in unit tests to ensure compatibility with the updated kernel interface.
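A minimal sketch of the caching pattern, assuming a hypothetical compile_kernel entry point; the real cache key and compile call live inside the custom op:
```python
# Hypothetical stand-in for the real CuteDSL compile entry point.
def compile_kernel(mma_tiler_mn, cluster_shape_mn):
    return lambda *arg_ptrs: None  # placeholder for a compiled kernel handle

_kernel_cache = {}

def get_compiled_kernel(mma_tiler_mn, cluster_shape_mn):
    """Compile each (tile, cluster) configuration once and reuse it."""
    key = (mma_tiler_mn, cluster_shape_mn)
    kernel = _kernel_cache.get(key)
    if kernel is None:
        kernel = compile_kernel(mma_tiler_mn, cluster_shape_mn)
        _kernel_cache[key] = kernel
    return kernel
```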
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2026-01-12 22:13:49 -08:00
Zongfei Jing
b6b7aa3592
fix(moe): reshape fc1_output_sf for compatibility in DenseGEMMFusedMoE
...
- Reshaped fc1_output_sf so its dimensions match what the dense GEMM operation expects.
- This keeps the FC2 kernel integration supplied with correctly shaped scale-factor tensors (illustrated below).
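A rough illustration of the reshape, with assumed sizes (one uint8 scale per 16 FP4 elements, the usual NVFP4 block size); the actual layout is defined by the kernel:
```python
import torch

m, n, sf_vec_size = 256, 4096, 16  # assumed example sizes

# FC1 may hand back the scale factors as a flat buffer ...
fc1_output_sf = torch.empty(m * n // sf_vec_size, dtype=torch.uint8)
# ... while the FC2 dense GEMM expects a 2D (m, n // sf_vec_size) view.
fc1_output_sf = fc1_output_sf.reshape(m, n // sf_vec_size)
```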
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2026-01-12 22:13:49 -08:00
Zongfei Jing
a0f523c628
refactor(moe): improve FC1/FC2 get_valid_tactics and tuning config
...
- Refactor get_valid_tactics for FC1 and FC2 runners to define
mma_tiler_mn_candidates and cluster_shape_mn_candidates together
- Use itertools.product for a cleaner iteration pattern (sketched below)
- Update get_tuning_config for FC1 and FC2 to use DynamicTensorSpec
and ConstraintSpec for proper dynamic tensor handling
- FC1: Add constraint for input scale factor shape inference
- FC2: Add constraints for input scale factor and alpha_scale shape
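A sketch of the refactored enumeration, with made-up candidate values; the real lists are constrained by the Blackwell MMA and cluster-shape rules:
```python
import itertools

# Candidate values here are illustrative, not the shipped lists.
mma_tiler_mn_candidates = [(128, 64), (128, 128), (256, 128)]
cluster_shape_mn_candidates = [(1, 1), (2, 1), (2, 2)]

def get_valid_tactics_sketch():
    # Enumerate every (tile, cluster) pairing in one pass.
    return [
        (mma_tiler_mn, cluster_shape_mn)
        for mma_tiler_mn, cluster_shape_mn in itertools.product(
            mma_tiler_mn_candidates, cluster_shape_mn_candidates)
    ]
```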
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2026-01-12 22:13:46 -08:00
Zongfei Jing
672df6a422
Rename test file to test_moe_densegemm.py
...
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2026-01-12 22:11:19 -08:00
Zongfei Jing
41f81093e2
Add FC2 kernel integration and unit tests for MoE dense GEMM
...
- Add wrapper method to FC2 kernel for pointer-based API
- Add CuteDSLNVFP4DenseGemmFC2Runner in custom_ops
- Register trtllm::cute_dsl_nvfp4_dense_gemm_fc2_blackwell custom op
- FC2 supports a per-token-per-expert alpha_scale of shape (m, expert_count); see the sketch below
- Add nvfp4_dense_gemm_fc2_ref reference implementation
- Add test_nvfp4_dense_gemm_fc2_blackwell parametrized test
- Fix FC2 kernel k_tile_cnt to use Int32 for cutlass.range
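A plain-PyTorch sketch of what per-token-per-expert alpha scaling can look like in a reference path; the function name and the token_expert_ids argument are assumptions, not the actual nvfp4_dense_gemm_fc2_ref signature:
```python
import torch

def fc2_alpha_ref_sketch(gemm_out: torch.Tensor,
                         alpha_scale: torch.Tensor,
                         token_expert_ids: torch.Tensor) -> torch.Tensor:
    """gemm_out: (m, n); alpha_scale: (m, expert_count);
    token_expert_ids: (m,) long tensor mapping each row to its expert."""
    # Pick each row's alpha from its expert column, then scale the row.
    alpha = alpha_scale.gather(1, token_expert_ids.unsqueeze(1))  # (m, 1)
    return gemm_out * alpha
```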
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2026-01-12 22:11:18 -08:00
Zongfei Jing
35631a37ad
Optimize gen_fc2_alpha with fused kernel
...
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2026-01-12 22:11:18 -08:00
Zongfei Jing
1df9e8a0fc
Add missing __init__.py to moe_as_dense_gemm package
...
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2026-01-12 22:11:17 -08:00
Zongfei Jing
49d887f521
Fix dense GEMM integration and add scale factor validation
...
- Fix c_sf shape calculation: use pad_up(m, 128) // 128 for non-128-aligned m (see the arithmetic below)
- Change c_sf dtype to uint8 to match fp4_utils.py SF_DTYPE
- Add scale factor shape and value validation in unit test
- Fix test to handle padded scale factors correctly
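The padding arithmetic from the first bullet, spelled out with a pad_up helper as commonly defined (the example numbers are arbitrary):
```python
def pad_up(x: int, align: int) -> int:
    """Round x up to the next multiple of align."""
    return (x + align - 1) // align * align

# For a non-128-aligned m, the swizzled layout still allocates whole
# 128-row tiles, so the tile count must round up:
m, n, sf_vec_size = 200, 4096, 16          # arbitrary example sizes
num_m_tiles = pad_up(m, 128) // 128        # 2, whereas m // 128 == 1
c_sf_numel = num_m_tiles * 128 * (n // sf_vec_size)
```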
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2026-01-12 22:11:17 -08:00
Zongfei Jing
84dbc447f4
Add NVFP4 dense GEMM with SwiGLU fusion integration and unit tests
...
- Add CuteDSLNVFP4DenseGemmSwigluRunner to custom_ops for FC1 kernel
- Support FP4 output dtype in dense GEMM kernel
- Add unit test for dense GEMM with SwiGLU fusion
- Fix per-expert SwiGLU application in the reference calculation (sketched below)
- Use Int64 for dimensions in kernel to avoid overflow
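A hedged sketch of the per-expert SwiGLU reference fix; the [gate | up] ordering inside each expert's slab is an assumption based on the surrounding commits:
```python
import torch
import torch.nn.functional as F

def per_expert_swiglu_ref(fc1_out: torch.Tensor, expert_count: int) -> torch.Tensor:
    """fc1_out: (m, n), where each expert's slab holds its gate and up halves."""
    m, n = fc1_out.shape
    weight_per_expert = n // expert_count
    half = weight_per_expert // 2
    outs = []
    for e in range(expert_count):
        slab = fc1_out[:, e * weight_per_expert:(e + 1) * weight_per_expert]
        gate, up = slab[:, :half], slab[:, half:]
        outs.append(up * F.silu(gate))  # SwiGLU applied per expert, not globally
    return torch.cat(outs, dim=1)      # (m, n // 2)
```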
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2026-01-12 22:11:13 -08:00
Zongfei Jing
250ad4ebde
refactor(fc1): remove num_fused_gemm and compute weight_per_expert as n // expert_count
...
- Remove global variable num_fused_gemm (always 1)
- Remove --weight_per_expert CLI argument
- Compute weight_per_expert = n // expert_count in run()
- Add check n == expert_count * weight_per_expert in can_implement() (spelled out below)
- Update create_alpha_scale_tensor to use computed weight_per_expert
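The arithmetic and validity check from the bullets above, as a minimal sketch:
```python
def compute_weight_per_expert(n: int, expert_count: int) -> int:
    weight_per_expert, rem = divmod(n, expert_count)
    # Mirrors the can_implement() check: n must split evenly across experts.
    assert rem == 0, f"n={n} not divisible by expert_count={expert_count}"
    return weight_per_expert
```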
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2026-01-12 21:58:14 -08:00
Zongfei Jing
4ddfa08a28
Add SwiGLU + FP4 quantization fusion with SFC verification for fc1 kernel
...
- Add SwiGLU activation fusion in epilogue (up * silu(gate))
- Add FP4 (Float4E2M1FN) quantization with Scale Factor C (SFC) generation
- Add comprehensive FP4 verification:
  - Compute reference SFC from SwiGLU output (see the sketch below)
  - Simulate f8 quantization for SFC comparison
  - Unswizzle kernel SFC from MMA layout to linear layout using cvt_sf_M32x4xrm_K4xrk_L_to_MKL
  - Compare kernel output with reference after nvfp4 quantization
- Support both vectorized_f32 and scalar paths
- Default sf_dtype changed to Float8E4M3FN for NVFP4 compatibility
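A sketch of the reference SFC computation under common NVFP4 conventions (16-element blocks along N, e4m3 scales, FP4 max magnitude 6.0); it assumes a PyTorch build with float8_e4m3fn, and the kernel's exact rounding may differ:
```python
import torch

FP4_MAX = 6.0  # largest magnitude representable in Float4E2M1

def compute_ref_sfc(x: torch.Tensor, sf_vec_size: int = 16) -> torch.Tensor:
    """One e4m3 scale per sf_vec_size consecutive values along N."""
    m, n = x.shape
    blocks = x.float().reshape(m, n // sf_vec_size, sf_vec_size)
    amax = blocks.abs().amax(dim=-1)          # (m, n // sf_vec_size)
    # Simulate the f8 rounding the kernel applies to its scale factors.
    return (amax / FP4_MAX).to(torch.float8_e4m3fn)
```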
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2026-01-12 21:58:13 -08:00
Zongfei Jing
12e7f117c5
Add SwiGLU activation and quantization fusion to FC1 kernel
...
- Add SwiGLU fusion in epilogue: output = up * silu(gate) (scalar form spelled out below)
- Add optional FP4 quantization with Scale Factor C (SFC) generation
- Support vectorized f32x2 operations for better performance
- Add epilogue warp specialization with separate up/gate accumulator handling
- Use hardcoded epi_tile (128, 64) for bf16 output compatibility
- Note: SFC currently only supports leading dim N (not M)
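For reference, the fused activation in scalar form; the kernel applies the same formula, optionally on packed f32x2 pairs:
```python
import torch
import torch.nn.functional as F

def swiglu(up: torch.Tensor, gate: torch.Tensor) -> torch.Tensor:
    # silu(x) = x * sigmoid(x); SwiGLU gates the "up" half with silu(gate).
    return up * F.silu(gate)
```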
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2026-01-12 21:58:13 -08:00
Zongfei Jing
dad843fba6
Add MoE as dense GEMM kernels for Blackwell
...
Signed-off-by: Zongfei Jing <zongfeij@nvidia.com>
2026-01-12 21:58:12 -08:00
Zongfei Jing
578c0a8e28
Overlap gen_fc2_alpha with fc1 using multistream in DenseGEMMFusedMoE
...
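A generic PyTorch multi-stream sketch of the overlap; run_fc1 and run_gen_fc2_alpha are placeholders for the actual launches in DenseGEMMFusedMoE:
```python
import torch

def overlapped_forward(run_fc1, run_gen_fc2_alpha, inputs):
    """Launch gen_fc2_alpha on a side stream so it overlaps with FC1."""
    side = torch.cuda.Stream()
    side.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(side):
        fc2_alpha = run_gen_fc2_alpha(inputs)
    fc1_out = run_fc1(inputs)
    # FC2 consumes both results, so rejoin the streams before continuing.
    torch.cuda.current_stream().wait_stream(side)
    return fc1_out, fc2_alpha
```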
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2026-01-12 21:58:12 -08:00
Zongfei Jing
5ddbe3ca76
Add DenseGEMM backend for MoE
...
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2026-01-12 21:58:11 -08:00
JennyLiu
2967d299fb
[TRTLLM-10271][test] Add Spark QA functional and performance cases (#10564)
...
Signed-off-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>
Co-authored-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>
2026-01-13 13:20:15 +08:00
TensorRT LLM
ba1cb6831d
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-01-13 03:08:08 +00:00
fredricz-20070104
bbe535fddf
[None][chore] Fix disagg assert (#10596)
...
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2026-01-12 21:39:57 -05:00
xxi
ba1037ca4a
[https://nvbugs/5762336][fix] Support parsing the keyword modules_to_not_convert of the HF model config (#10527)
...
Signed-off-by: xxi <xxi@nvidia.com>
2026-01-12 20:21:01 -05:00
Iman Tabrizian
48b09e5a25
[https://nvbugs/5689235][fix] Fix cancellation+chunked prefill+disagg (#10111)
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2026-01-12 18:23:26 -05:00
Gal Hubara-Agam
18a33764b5
[None][chore] Print correct backend name in benchmark report (#10597)
...
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
2026-01-12 14:46:00 -05:00
Anish Shanbhag
dacc881993
[https://nvbugs/5761391][fix] Use correct model names for config database regression tests (#10192)
...
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2026-01-12 10:55:07 -08:00
Suyog Gupta
a1385243e1
[#10580][fix] Re-enable NemotronH MOE MMLU test (#10594)
...
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2026-01-12 09:26:07 -08:00
Emma Qiao
9f044b9dd9
[None][infra] Waive failed tests for main 01/12 (#10604)
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2026-01-12 10:24:54 -05:00
mpikulski
bf7998f1b8
[TRTLLM-9522][test] Cover LLM API multi_modal_embeddings (#9963)
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-01-12 11:38:22 +01:00
Wanli Jiang
11da7e3605
[None][fix] Solve pillow version conflict (#10537)
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2026-01-12 04:05:54 -05:00
Zhenhuan Chen
3bd319dc8e
[https://nvbugs/5794796][chore] Waive test blocking premerge (#10593)
...
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
2026-01-12 15:39:07 +08:00
yufeiwu-nv
8e806abac3
[None][test] Remove most TRT-backend test cases in llm_perf_nim.yml (#10572)
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2026-01-12 15:34:55 +08:00
yingguo-trt
c5914f9085
[None][chore] Update deepseekv3.2 test parameter (#10595)
...
Signed-off-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com>
2026-01-12 01:43:22 -05:00
chenfeiz0326
54459377d2
[TRTLLM-10248][feat] Support bot to send perf regression messages to Slack channel (#10489)
...
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2026-01-12 14:23:23 +08:00
Xianjie Qiao
3a9a00b544
[None][feat] Add ExpertStatistic and DUMMY_ALLREDUCE for configurable_moe (#10401)
...
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
2026-01-12 14:10:31 +08:00
Jie Li
5e0dbba0c9
[None][chore] Update waive list (#10577)
...
Signed-off-by: Jie Li <lijie@nvidia.com>
2026-01-11 22:18:04 -05:00
TensorRT LLM
2de22f1a70
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-01-12 03:09:53 +00:00
Pengbo Wang
c0e25e5418
[TRTLLM-10022][feat] Add Hopper XQA decode support for skip-softmax attention (#10264)
...
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2026-01-11 19:26:10 -05:00
Eran Geva
c5d5af9e7f
[#8391][chore] Removed llama and added deepseek to AutoDeploy's L0 perf test (#10585)
...
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2026-01-11 16:31:24 -05:00
Ivy Zhang
7f018c89e9
[None][test] Update core test list (#10538)
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2026-01-11 14:08:20 -05:00
Yechan Kim
8e0d20d901
[TRTLLM-10195][feat] K-EXAONE support (#10355)
...
Signed-off-by: Jaedeok Kim <jaedeokk@nvidia.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: Jaedeok Kim <jaedeokk@nvidia.com>
2026-01-12 00:29:51 +09:00
Yanchao Lu
80649a8b78
[None][ci] Work around OCI-NRT slowdown issue (#10587)
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2026-01-11 22:08:19 +08:00
Guoming Zhang
0371cbfd88
[None][doc] Update Qwen3-Next doc by adding a known issues section (#10582)
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2026-01-11 14:47:47 +08:00
TensorRT LLM
b2e2538fcd
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-01-11 03:07:48 +00:00
HuiGao-NV
3c65ec3c55
[None][chore] Waive test case (#10581)
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2026-01-10 18:53:36 -05:00
fredricz-20070104
f6045fac09
[None][chore] Fix GitLab CI termination issues (#10576)
...
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
Co-authored-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
2026-01-10 07:51:18 -05:00
tcherckez-nvidia
f6c4dd885f
[None][chore] Update AutoDeploy model list (#10505)
...
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
2026-01-10 08:47:37 +02:00
TensorRT LLM
6ab996d635
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-01-10 03:09:09 +00:00
William Zhang
ff7eb93f31
[https://nvbugs/5669097][tests] Add MMMU test for Mistral Small (#10530)
...
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2026-01-09 16:09:28 -08:00
Chenghao Zhang
38f249b479
[https://nvbugs/5548861][fix] AutoDeploy: Fix the test (#10521)
...
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2026-01-09 13:30:24 -08:00
Linda
82dfef2e56
[https://nvbugs/5628848][fix] Fix nanobind stub generation (#10516)
...
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2026-01-09 11:32:21 -08:00
Faraz
fdbdbba540
[https://nvbugs/5752687][fix] Choose register model config over root config for VLM (#10553)
...
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
2026-01-09 12:10:52 -05:00
yingguo-trt
d80f01d205
[None][feat] Add support for DeepSeek v3.2 tests (#10561)
...
Signed-off-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com>
2026-01-09 10:20:29 -05:00