3868 Commits

Author SHA1 Message Date
Neta Zmora
3952a61681
[#9388][fix] AutoDeploy: Fix cutlass BF16 MoE kernel invocation (#9339)
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
2025-11-21 17:05:03 -08:00
Chenghao Zhang
564989865c
[TRTLLM-9082][feat] AutoDeploy: Move the moe Align kernel to AOT (#9106)
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-11-21 16:05:48 -08:00
Izzy Putterman
eb7792e875
[None][feat] Eagle: PostNorm and multilayer options (#9233)
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
2025-11-21 17:39:00 -05:00
Enwei Zhu
13fbd4366a
[TRTLLM-9370][feat] Integration of CuteDSL NVFP4 grouped GEMM (Part 2: SwiGLU Fusion and Finalize Fusion) (#9288)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-11-21 14:03:38 -08:00
cheshirekow
9b2abb8d28
[TRTLLM-9208][infra] Document the process for C++ deps (#9016)
Signed-off-by: Josh Bialkowski <1309820+cheshirekow@users.noreply.github.com>
Co-authored-by: Josh Bialkowski <1309820+cheshirekow@users.noreply.github.com>
2025-11-21 09:22:11 -08:00
Ziyi Xiong
5df907b388
[https://nvbugs/5590408][fix] Fallback to greedy sampling in two-model overlap scheduler (#9321)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-11-21 10:19:59 -05:00
Nikita Korobov
f2ebaf288a
[None][feat] TRT-LLM Gen MoE optimize DeepSeek Fp8 activation kernel (#9175)
Signed-off-by: Nikita Korobov <14355239+nekorobov@users.noreply.github.com>
2025-11-21 15:35:00 +01:00
HuiGao-NV
6dd2fcd7b3
[https://nvbugs/5629833][fix] Don't fill tensors with 0 (#9296)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-11-21 20:50:05 +08:00
mpikulski
cddc7549d1
[TRTLLM-9191][feat] support out-of-tree models in trtllm-serve (#9269)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-21 04:23:47 -08:00
mpikulski
095b6864a8
[TRTLLM-8650][fix] beam search request validation (#8433) (#9228)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-21 04:08:45 -08:00
Yiqing Yan
8cd3b496e9
[None][chore] Bump version to 1.2.0rc4 (#9363)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-11-21 18:28:12 +08:00
Emma Qiao
041564188c
[None][infra] Waive failed cases in main post-merge on 11/21 (#9360)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-21 18:01:53 +08:00
QI JUN
b6483ef3e7
[None][ci] waive a test case of test_ad_build_small_multi.py (#9355)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-21 16:25:04 +08:00
Ivy Zhang
28e9bf6167
[None][chore] add periodic junit xml path in conftest (#9337)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-11-20 22:46:25 -08:00
xxi
cc0dc7c124
[TRTLLM-8957][feat] create communication related classes (#8968)
2025-11-20 22:32:42 -08:00
Yiqing Yan
2a27166b59
[TRTLLM-9183][infra] Add --waives-file in rerun pytest command (#8971)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-11-21 13:40:45 +08:00
Zhanrui Sun
5138ef3227
[None][infra] Add fallback when get wheel from build stage is fail (#9290)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-11-21 13:26:20 +08:00
QI JUN
e2a372a3b1
[None][ci] waive test_llm_context_only_timed_out_kv_cache_exhausted (#9351)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-20 20:20:57 -08:00
TensorRT LLM
39e641872c
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-11-21 03:19:55 +00:00
Yingge He
b5863ed1e2
[TRI-332] [fix] Fix L0_backend_trtllm (#9282)
Signed-off-by: Yingge He <yinggeh@nvidia.com>
2025-11-20 18:55:37 -08:00
cheshirekow
1379cfac3a
[TRTLLM-9197][infra] Move thirdparty stuff to it's own listfile (#8986)
Signed-off-by: Josh Bialkowski <1309820+cheshirekow@users.noreply.github.com>
Co-authored-by: Josh Bialkowski <1309820+cheshirekow@users.noreply.github.com>
2025-11-20 16:44:23 -08:00
Kanghwan
b1c9936c36
[None][infra] Update goggles_action repository (#9240)
Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>
2025-11-20 13:32:32 -08:00
tburt-nv
f8dd52621d
[None][chore] Upgrade starlette and FastAPI (#9319)
Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>
2025-11-20 11:21:14 -08:00
Mike Iovine
69b4e52757
[None][chore] Update linter rules for mass integration
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Barry Kang
a3433dd54e
[https://nvbugs/5325296][fix] Enable relaxed acceptance test on Blackwell (#8709)
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Zhanrui Sun
62e20a5441
[None][infra] Remove invaild waived tests which not in release branch (#8841)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Jin Li
6185225501
[https://nvbugs/5488118][fix] Unwaive passed tests (#8758)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Dom Brown
0c8de1f45d
[https://nvbugs/5575841] [test] Move test_moe.py to serial tests to improve stability + unwaive FP4 MoE torch unit tests (#8422)
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
xiweny
05aabfbc1e
[https://nvbugs/5601203] [fix]Restrict fp8 blockscale moe case (#8583)
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Chuang Zhu
8846dac9b4
[https://nvbugs/5578175][fix] Fix block range index (#8470)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Pengyun Lin
eca68e4465
[https://nvbugs/5564465][fix] Overwrite only if default_max_tokens is legal (#8538)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Eran Geva
3d66e56adb
[https://nvbugs/5572320][fix] Ported test_ad_trtllm_bench.py from main (#8671)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Yukun He
9a79f32f7a
[https://nvbugs/5608489][fix] Fix output unpack issues for Llama3/4 NVFP4 models. (#8679)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Ivy Zhang
25c0624750
[None][test] Clean cache for certain easily hang cases (#8619)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Larry Xu <197874197+LarryXFly@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Jie Li
36e244f35e
[https://nvbugs/5587456][fix] Remove multimodal test cases using TRT backend (#8611)
Signed-off-by: Jie Li <lijie@nvidia.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Lizhi Zhou
348668e3ae
[https://nvbugs/5575902][fix] set max_batch_size=1 to stabilize accuracy test result (#8609)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Lizhi Zhou
33b0b945c7
[https://nvbugs/5582277][fix] rework DisaggPPTerminationHandler to fix hang issue (#8519)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Yan Chunwei
b5f9fff1c1
[https://nvbugs/5569754][fix] trtllm-llmapi-launch port conflict (#8582)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Pengyun Lin
81fd9be87d
[https://nvbugs/5575829][fix] Unwaive gpt-oss test (#8576)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Bo Deng
4ca6fe83d8
[https://nvbugs/5565549][fix] unwaive test_disaggregated_spec_dec_bat… (#8500)
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Jin Li
3454eacd74
[https://nvbugs/5546510][fix] Move torch.cuda.Stream out of torch com… (#8494)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Guoming Zhang
af3900a195
[https://nvbugs/5504095][fix] Unwaive test_user_specify_workspace case. (#8316)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Simeng Liu
9286223288
[https://nvbugs/5515753][ci] Add NCCL_DEBUG=INFO flag to collect more info with CI failure. (#8440)
Signed-off-by: Simeng Liu <simengl@nvidia.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
JunyiXu-nv
ee6944bfa2
[https://nvbugs/5569713][fix] Disable fp8 deep gemm for EXAONE-4.0-32B-FP8 (#8429)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
yufeiwu-nv
0e746fad45
[https://nvbugs/5667454][test] Fix Test Case as Chunked Attention not Supported on sm_120 (#9260)
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
2025-11-20 00:58:42 -08:00
Liao Lanyu
04ad9f96fa
[https://nvbugs/5667687][fix] Set correct lm_head_tp_size_upper_bound (#9300)
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>
2025-11-20 00:41:00 -08:00
Neta Zmora
1d6fbbf45d
[#9236][feature] Make sharing of activation_type across SW layers more robust (#9238)
C++, Python and Python MoE layer all share the definition of ActivationType.
Currently this is done thru redefinition which is fragile and can break when adding new activation function types.

tensorrt_llm/_torch/utils.py
cpp/tensorrt_llm/kernels/cutlass_kernels/include/common.h
=>
tensorrt_llm/layers/moe.py
cpp/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_kernels.cu

Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-11-20 16:06:58 +08:00
Emma Qiao
b018b2698d
[TRTLLM-9164][infra] Enable checking duplicate items in waives.txt in pre-commit (#9265)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-20 15:47:23 +08:00
mpikulski
a39e8c5567
[TRTLLM-9295][fix] use greedy decoding in test_openai_compatible_json_schema (#9305)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-20 08:32:23 +01:00
Yukun He
5d118e0326
[None][chore] Revise the description of enable_autotuner. (#9320)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-11-19 22:59:37 -08:00