Zongfei Jing
3c5f97bf57
fix(custom_ops): update candidates for MMA tiling and cluster shapes
...
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2026-01-12 22:13:50 -08:00
Zongfei Jing
45b468c66e
fix(custom_ops): refactor kernel handling
...
- Increased tune_max_num_tokens from 2 to 256 for improved performance.
- Refactored kernel handling: kernel arguments are now passed via pointers, and compiled kernels are cached so each configuration is compiled only once (see the sketch below).
- Adjusted tensor handling in unit tests to ensure compatibility with the updated kernel interface.
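A minimal sketch of the caching pattern, assuming a hypothetical compile_kernel entry point; the real cache key and compile call live inside the custom op:
```python
# Hypothetical stand-in for the real CuteDSL compile entry point.
def compile_kernel(mma_tiler_mn, cluster_shape_mn):
    return lambda *arg_ptrs: None  # placeholder for a compiled kernel handle

_kernel_cache = {}

def get_compiled_kernel(mma_tiler_mn, cluster_shape_mn):
    """Compile each (tile, cluster) configuration once and reuse it."""
    key = (mma_tiler_mn, cluster_shape_mn)
    kernel = _kernel_cache.get(key)
    if kernel is None:
        kernel = compile_kernel(mma_tiler_mn, cluster_shape_mn)
        _kernel_cache[key] = kernel
    return kernel
```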
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2026-01-12 22:13:49 -08:00
Zongfei Jing
b6b7aa3592
fix(moe): reshape fc1_output_sf for compatibility in DenseGEMMFusedMoE
...
- Reshaped fc1_output_sf so its dimensions match what the dense GEMM operation expects.
- This keeps the FC2 kernel integration supplied with correctly shaped scale-factor tensors (illustrated below).
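A rough illustration of the reshape, with assumed sizes (one uint8 scale per 16 FP4 elements, the usual NVFP4 block size); the actual layout is defined by the kernel:
```python
import torch

m, n, sf_vec_size = 256, 4096, 16  # assumed example sizes

# FC1 may hand back the scale factors as a flat buffer ...
fc1_output_sf = torch.empty(m * n // sf_vec_size, dtype=torch.uint8)
# ... while the FC2 dense GEMM expects a 2D (m, n // sf_vec_size) view.
fc1_output_sf = fc1_output_sf.reshape(m, n // sf_vec_size)
```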
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2026-01-12 22:13:49 -08:00
Zongfei Jing
a0f523c628
refactor(moe): improve FC1/FC2 get_valid_tactics and tuning config
...
- Refactor get_valid_tactics for FC1 and FC2 runners to define
mma_tiler_mn_candidates and cluster_shape_mn_candidates together
- Use itertools.product for a cleaner iteration pattern (sketched below)
- Update get_tuning_config for FC1 and FC2 to use DynamicTensorSpec
and ConstraintSpec for proper dynamic tensor handling
- FC1: Add constraint for input scale factor shape inference
- FC2: Add constraints for input scale factor and alpha_scale shape
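A sketch of the refactored enumeration, with made-up candidate values; the real lists are constrained by the Blackwell MMA and cluster-shape rules:
```python
import itertools

# Candidate values here are illustrative, not the shipped lists.
mma_tiler_mn_candidates = [(128, 64), (128, 128), (256, 128)]
cluster_shape_mn_candidates = [(1, 1), (2, 1), (2, 2)]

def get_valid_tactics_sketch():
    # Enumerate every (tile, cluster) pairing in one pass.
    return [
        (mma_tiler_mn, cluster_shape_mn)
        for mma_tiler_mn, cluster_shape_mn in itertools.product(
            mma_tiler_mn_candidates, cluster_shape_mn_candidates)
    ]
```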
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2026-01-12 22:13:46 -08:00
Zongfei Jing
672df6a422
Rename test file to test_moe_densegemm.py
...
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2026-01-12 22:11:19 -08:00
Zongfei Jing
41f81093e2
Add FC2 kernel integration and unit tests for MoE dense GEMM
...
- Add wrapper method to FC2 kernel for pointer-based API
- Add CuteDSLNVFP4DenseGemmFC2Runner in custom_ops
- Register trtllm::cute_dsl_nvfp4_dense_gemm_fc2_blackwell custom op
- FC2 supports a per-token-per-expert alpha_scale of shape (m, expert_count); see the sketch below
- Add nvfp4_dense_gemm_fc2_ref reference implementation
- Add test_nvfp4_dense_gemm_fc2_blackwell parametrized test
- Fix FC2 kernel k_tile_cnt to use Int32 for cutlass.range
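A plain-PyTorch sketch of what per-token-per-expert alpha scaling can look like in a reference path; the function name and the token_expert_ids argument are assumptions, not the actual nvfp4_dense_gemm_fc2_ref signature:
```python
import torch

def fc2_alpha_ref_sketch(gemm_out: torch.Tensor,
                         alpha_scale: torch.Tensor,
                         token_expert_ids: torch.Tensor) -> torch.Tensor:
    """gemm_out: (m, n); alpha_scale: (m, expert_count);
    token_expert_ids: (m,) long tensor mapping each row to its expert."""
    # Pick each row's alpha from its expert column, then scale the row.
    alpha = alpha_scale.gather(1, token_expert_ids.unsqueeze(1))  # (m, 1)
    return gemm_out * alpha
```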
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2026-01-12 22:11:18 -08:00
Zongfei Jing
35631a37ad
Optimize gen_fc2_alpha with fused kernel
...
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2026-01-12 22:11:18 -08:00
Zongfei Jing
1df9e8a0fc
Add missing __init__.py to moe_as_dense_gemm package
...
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2026-01-12 22:11:17 -08:00
Zongfei Jing
49d887f521
Fix dense GEMM integration and add scale factor validation
...
- Fix c_sf shape calculation: use pad_up(m, 128) // 128 for non-128-aligned m (see the arithmetic below)
- Change c_sf dtype to uint8 to match fp4_utils.py SF_DTYPE
- Add scale factor shape and value validation in unit test
- Fix test to handle padded scale factors correctly
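The padding arithmetic from the first bullet, spelled out with a pad_up helper as commonly defined (the example numbers are arbitrary):
```python
def pad_up(x: int, align: int) -> int:
    """Round x up to the next multiple of align."""
    return (x + align - 1) // align * align

# For a non-128-aligned m, the swizzled layout still allocates whole
# 128-row tiles, so the tile count must round up:
m, n, sf_vec_size = 200, 4096, 16          # arbitrary example sizes
num_m_tiles = pad_up(m, 128) // 128        # 2, whereas m // 128 == 1
c_sf_numel = num_m_tiles * 128 * (n // sf_vec_size)
```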
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2026-01-12 22:11:17 -08:00
Zongfei Jing
84dbc447f4
Add NVFP4 dense GEMM with SwiGLU fusion integration and unit tests
...
- Add CuteDSLNVFP4DenseGemmSwigluRunner to custom_ops for FC1 kernel
- Support FP4 output dtype in dense GEMM kernel
- Add unit test for dense GEMM with SwiGLU fusion
- Fix per-expert SwiGLU application in the reference calculation (sketched below)
- Use Int64 for dimensions in kernel to avoid overflow
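A hedged sketch of the per-expert SwiGLU reference fix; the [gate | up] ordering inside each expert's slab is an assumption based on the surrounding commits:
```python
import torch
import torch.nn.functional as F

def per_expert_swiglu_ref(fc1_out: torch.Tensor, expert_count: int) -> torch.Tensor:
    """fc1_out: (m, n), where each expert's slab holds its gate and up halves."""
    m, n = fc1_out.shape
    weight_per_expert = n // expert_count
    half = weight_per_expert // 2
    outs = []
    for e in range(expert_count):
        slab = fc1_out[:, e * weight_per_expert:(e + 1) * weight_per_expert]
        gate, up = slab[:, :half], slab[:, half:]
        outs.append(up * F.silu(gate))  # SwiGLU applied per expert, not globally
    return torch.cat(outs, dim=1)      # (m, n // 2)
```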
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2026-01-12 22:11:13 -08:00
Zongfei Jing
250ad4ebde
refactor(fc1): remove num_fused_gemm and compute weight_per_expert as n // expert_count
...
- Remove global variable num_fused_gemm (always 1)
- Remove --weight_per_expert CLI argument
- Compute weight_per_expert = n // expert_count in run()
- Add check n == expert_count * weight_per_expert in can_implement() (spelled out below)
- Update create_alpha_scale_tensor to use computed weight_per_expert
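The arithmetic and validity check from the bullets above, as a minimal sketch:
```python
def compute_weight_per_expert(n: int, expert_count: int) -> int:
    weight_per_expert, rem = divmod(n, expert_count)
    # Mirrors the can_implement() check: n must split evenly across experts.
    assert rem == 0, f"n={n} not divisible by expert_count={expert_count}"
    return weight_per_expert
```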
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2026-01-12 21:58:14 -08:00
Zongfei Jing
4ddfa08a28
Add SwiGLU + FP4 quantization fusion with SFC verification for fc1 kernel
...
- Add SwiGLU activation fusion in epilogue (up * silu(gate))
- Add FP4 (Float4E2M1FN) quantization with Scale Factor C (SFC) generation
- Add comprehensive FP4 verification:
  - Compute reference SFC from SwiGLU output (see the sketch below)
  - Simulate f8 quantization for SFC comparison
  - Unswizzle kernel SFC from MMA layout to linear layout using cvt_sf_M32x4xrm_K4xrk_L_to_MKL
  - Compare kernel output with reference after nvfp4 quantization
- Support both vectorized_f32 and scalar paths
- Default sf_dtype changed to Float8E4M3FN for NVFP4 compatibility
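A sketch of the reference SFC computation under common NVFP4 conventions (16-element blocks along N, e4m3 scales, FP4 max magnitude 6.0); it assumes a PyTorch build with float8_e4m3fn, and the kernel's exact rounding may differ:
```python
import torch

FP4_MAX = 6.0  # largest magnitude representable in Float4E2M1

def compute_ref_sfc(x: torch.Tensor, sf_vec_size: int = 16) -> torch.Tensor:
    """One e4m3 scale per sf_vec_size consecutive values along N."""
    m, n = x.shape
    blocks = x.float().reshape(m, n // sf_vec_size, sf_vec_size)
    amax = blocks.abs().amax(dim=-1)          # (m, n // sf_vec_size)
    # Simulate the f8 rounding the kernel applies to its scale factors.
    return (amax / FP4_MAX).to(torch.float8_e4m3fn)
```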
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2026-01-12 21:58:13 -08:00
Zongfei Jing
12e7f117c5
Add SwiGLU activation and quantization fusion to FC1 kernel
...
- Add SwiGLU fusion in epilogue: output = up * silu(gate) (scalar form spelled out below)
- Add optional FP4 quantization with Scale Factor C (SFC) generation
- Support vectorized f32x2 operations for better performance
- Add epilogue warp specialization with separate up/gate accumulator handling
- Use hardcoded epi_tile (128, 64) for bf16 output compatibility
- Note: SFC currently only supports leading dim N (not M)
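For reference, the fused activation in scalar form; the kernel applies the same formula, optionally on packed f32x2 pairs:
```python
import torch
import torch.nn.functional as F

def swiglu(up: torch.Tensor, gate: torch.Tensor) -> torch.Tensor:
    # silu(x) = x * sigmoid(x); SwiGLU gates the "up" half with silu(gate).
    return up * F.silu(gate)
```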
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2026-01-12 21:58:13 -08:00
Zongfei Jing
dad843fba6
Add MoE as dense GEMM kernels for Blackwell
...
Signed-off-by: Zongfei Jing <zongfeij@nvidia.com>
2026-01-12 21:58:12 -08:00
Zongfei Jing
578c0a8e28
Overlap gen_fc2_alpha with fc1 using multistream in DenseGEMMFusedMoE
...
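A generic PyTorch multi-stream sketch of the overlap; run_fc1 and run_gen_fc2_alpha are placeholders for the actual launches in DenseGEMMFusedMoE:
```python
import torch

def overlapped_forward(run_fc1, run_gen_fc2_alpha, inputs):
    """Launch gen_fc2_alpha on a side stream so it overlaps with FC1."""
    side = torch.cuda.Stream()
    side.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(side):
        fc2_alpha = run_gen_fc2_alpha(inputs)
    fc1_out = run_fc1(inputs)
    # FC2 consumes both results, so rejoin the streams before continuing.
    torch.cuda.current_stream().wait_stream(side)
    return fc1_out, fc2_alpha
```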
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2026-01-12 21:58:12 -08:00
Zongfei Jing
5ddbe3ca76
Add DenseGEMM backend for MoE
...
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2026-01-12 21:58:11 -08:00
JennyLiu
2967d299fb
[TRTLLM-10271][test] Add Spark QA functional and performance cases (#10564)
...
Signed-off-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>
Co-authored-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>
2026-01-13 13:20:15 +08:00
TensorRT LLM
ba1cb6831d
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-01-13 03:08:08 +00:00
fredricz-20070104
bbe535fddf
[None][chore] Fix disagg assert (#10596)
...
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2026-01-12 21:39:57 -05:00
xxi
ba1037ca4a
[https://nvbugs/5762336][fix] Support parsing the keyword modules_to_not_convert of the HF model config (#10527)
...
Signed-off-by: xxi <xxi@nvidia.com>
2026-01-12 20:21:01 -05:00
Iman Tabrizian
48b09e5a25
[https://nvbugs/5689235][fix] Fix cancellation+chunked prefill+disagg (#10111)
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2026-01-12 18:23:26 -05:00
Gal Hubara-Agam
18a33764b5
[None][chore] Print correct backend name in benchmark report (#10597)
...
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
2026-01-12 14:46:00 -05:00
Anish Shanbhag
dacc881993
[https://nvbugs/5761391][fix] Use correct model names for config database regression tests (#10192)
...
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2026-01-12 10:55:07 -08:00
Suyog Gupta
a1385243e1
[#10580][fix] Re-enable NemotronH MOE MMLU test (#10594)
...
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2026-01-12 09:26:07 -08:00
Emma Qiao
9f044b9dd9
[None][infra] Waive failed tests for main 01/12 (#10604)
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2026-01-12 10:24:54 -05:00
mpikulski
bf7998f1b8
[TRTLLM-9522][test] Cover LLM API multi_modal_embeddings (#9963)
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-01-12 11:38:22 +01:00
Wanli Jiang
11da7e3605
[None][fix] Solve pillow version conflict (#10537)
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2026-01-12 04:05:54 -05:00
Zhenhuan Chen
3bd319dc8e
[https://nvbugs/5794796][chore] Waive test blocking premerge (#10593)
...
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
2026-01-12 15:39:07 +08:00
yufeiwu-nv
8e806abac3
[None][test] Remove most TRT-backend test cases in llm_perf_nim.yml (#10572)
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2026-01-12 15:34:55 +08:00
yingguo-trt
c5914f9085
[None][chore] Update deepseekv3.2 test parameter (#10595)
...
Signed-off-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com>
2026-01-12 01:43:22 -05:00
chenfeiz0326
54459377d2
[TRTLLM-10248][feat] Support bot to send perf regression messages to Slack channel (#10489)
...
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2026-01-12 14:23:23 +08:00
Xianjie Qiao
3a9a00b544
[None][feat] Add ExpertStatistic and DUMMY_ALLREDUCE for configurable_moe (#10401)
...
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
2026-01-12 14:10:31 +08:00
Jie Li
5e0dbba0c9
[None][chore] Update waive list (#10577)
...
Signed-off-by: Jie Li <lijie@nvidia.com>
2026-01-11 22:18:04 -05:00
TensorRT LLM
2de22f1a70
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-01-12 03:09:53 +00:00
Pengbo Wang
c0e25e5418
[TRTLLM-10022][feat] Add Hopper XQA decode support for skip-softmax attention (#10264)
...
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2026-01-11 19:26:10 -05:00
Eran Geva
c5d5af9e7f
[#8391][chore] Removed llama and added deepseek to AutoDeploy's L0 perf test (#10585)
...
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2026-01-11 16:31:24 -05:00
Ivy Zhang
7f018c89e9
[None][test] Update core test list (#10538)
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2026-01-11 14:08:20 -05:00
Yechan Kim
8e0d20d901
[TRTLLM-10195][feat] K-EXAONE support (#10355)
...
Signed-off-by: Jaedeok Kim <jaedeokk@nvidia.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: Jaedeok Kim <jaedeokk@nvidia.com>
2026-01-12 00:29:51 +09:00
Yanchao Lu
80649a8b78
[None][ci] Work around OCI-NRT slowdown issue (#10587)
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2026-01-11 22:08:19 +08:00
Guoming Zhang
0371cbfd88
[None][doc] Update Qwen3-Next doc by adding a known issues section (#10582)
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2026-01-11 14:47:47 +08:00
TensorRT LLM
b2e2538fcd
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-01-11 03:07:48 +00:00
HuiGao-NV
3c65ec3c55
[None][chore] Waive test case (#10581)
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2026-01-10 18:53:36 -05:00
fredricz-20070104
f6045fac09
[None][chore] Fix GitLab CI termination issues (#10576)
...
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
Co-authored-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
2026-01-10 07:51:18 -05:00
tcherckez-nvidia
f6c4dd885f
[None][chore] Update AutoDeploy model list (#10505)
...
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
2026-01-10 08:47:37 +02:00
TensorRT LLM
6ab996d635
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-01-10 03:09:09 +00:00
William Zhang
ff7eb93f31
[https://nvbugs/5669097][tests] Add MMMU test for Mistral Small (#10530)
...
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2026-01-09 16:09:28 -08:00
Chenghao Zhang
38f249b479
[https://nvbugs/5548861][fix] AutoDeploy: Fix the test (#10521)
...
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2026-01-09 13:30:24 -08:00
Linda
82dfef2e56
[https://nvbugs/5628848][fix] Fix nanobind stub generation (#10516)
...
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2026-01-09 11:32:21 -08:00
Faraz
fdbdbba540
[https://nvbugs/5752687][fix] Choose register model config over root config for VLM (#10553)
...
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
2026-01-09 12:10:52 -05:00
yingguo-trt
d80f01d205
[None][feat] Add support for DeepSeek v3.2 tests (#10561)
...
Signed-off-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com>
2026-01-09 10:20:29 -05:00