2713 Commits

Author SHA1 Message Date
Xiwen Yu
5e7aa76bb4 Merge branch 'user/sm103_trtllmgen' into feat/b300_cu13
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-06 00:49:23 +08:00
Xiwen Yu
cca347e6b4 [TRTLLM-4629] [feat] Step1: trtllm-gen kernels support sm103
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-06 00:32:51 +08:00
peaceh-nv
25389c9fe2 [https://nvbugs/5453806][unwaive] Unwaive fp8 kvcache attention test (#7243)
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
2025-09-05 12:13:57 -04:00
Xiwen Yu
f8864b9061 update trtllm gemm
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-05 23:56:24 +08:00
Emma Qiao
d8ec546b73 [None][infra] Waive failed tests on main branch 0905 (#7564)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-05 22:46:46 +08:00
Leslie Fang
9eb3911470 [None][chore] Remove executor_config in create_py_executor_instance (#7463)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-09-05 20:56:03 +08:00
Robin Kobus
a95d9616ba [#6186][feat] Introduce QKNormRoPEAttention module (#6830)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-09-05 14:04:41 +02:00
Xiwen Yu
2c3f4cbeee Merge remote-tracking branch 'origin/main' into feat/b300_cu13
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-05 15:53:43 +08:00
Ziyi Xiong
79e0296ca0 [https://nvbugs/5461761][fix] Remove the waiver (#7476)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-09-05 15:29:54 +08:00
Xiwen Yu
22219bc37e Add B300 & GB300 CI
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-05 15:29:50 +08:00
Xiwen Yu
5d4f7f4e8d update flashinfer and waive bug
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-05 15:09:25 +08:00
Xiwen Yu
fcf413e247 Merge branch 'user/xiweny/3xfp4_gemm' into 'feat/b300_cu13'
add 3xfp4 cutlass gemm

See merge request ftp/tekit!9699

Signed-off-by: Xiwen Yu <xiweny@nvidia.com>
2025-09-05 00:06:42 -07:00
Xiwen Yu
973fd37457 add 3xfp4 cutlass gemm
Signed-off-by: Xiwen Yu <xiweny@nvidia.com>
2025-09-05 00:06:41 -07:00
Zhanrui Sun
9ae01a8edb Merge branch 'user/zhanruis/0828_support_cuda_13_for_sanity_check' into 'feat/b300_cu13'
Support DLFW sanity check use CU13 image

See merge request ftp/tekit!9689

Signed-off-by: Zhanrui Sun <zhanruis@nvidia.com>
2025-09-05 00:04:23 -07:00
Zhanrui Sun
5ca3376d6f Support DLFW sanity check use CU13 image
Signed-off-by: Zhanrui Sun <zhanruis@nvidia.com>
2025-09-05 00:04:22 -07:00
Yiteng Niu
163b1fc84f [None][infra] update nspect version (#7552)
Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>
2025-09-05 14:59:22 +08:00
Yanchao Lu
4195010e13 [None][ci] Increase the number of retries in docker image generation (#7557)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-05 14:47:14 +08:00
xinhe-nv
8e3962d278 [TRTLLM-6642][feat] add gptoss 20g tests (#7361)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-05 02:20:28 -04:00
xinhe-nv
b3ba3d98d2 [None][chore] Remove closed bugs (#7408)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-09-05 02:11:16 -04:00
QI JUN
ff3704897b [None][ci] remove unnecessary test_modeling_deepseek.py (#7542)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-04 20:05:27 -07:00
Jin Li
2189a2f3ff [https://nvbugs/5483615][fix] Remove unnecessary assertion to let mai… (#7441)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-09-05 10:56:21 +08:00
Naveenraj Kamalakannan
58d1036bb1 [#3325][feat] Add MCTS and TOT tree-based inference controllers to Scaffolding (#7490)
Signed-off-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com>
2025-09-04 19:46:49 -07:00
Shunkangz
bddf183e15 [None][feat] Add Request specific exception (#6931)
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-09-04 18:43:42 -04:00
Rashid Kaleem
89889fb526 [https://nvbugs/5369366] [fix] Report failing requests (#7060)
Signed-off-by: Rashid Kaleem <4079439+arekay@users.noreply.github.com>
2025-09-04 12:56:23 -07:00
Chang Liu
08a0e06621 [TRTLLM-7410][feat] Support hashing and KV cache reuse for videos (#7360)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
2025-09-04 14:39:23 -04:00
Yuxian Qiu
48a5270868 [https://nvbugs/5492485][fix] Use offline dataset from llm-models instead. (#7435)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-09-04 09:58:16 -07:00
sychen52
98a1bffb7c [OMNIML-2336][feat] Add NVFP4 x FP8 (#6809)
Signed-off-by: Shiyang Chen <shiychen@nvidia.com>
2025-09-04 09:03:38 -07:00
Enwei Zhu
1745102e72 [TRTLLM-7027][feat] Fuse d2t to logitsBitmaskKernel and fix a race condition in one-model spec (#7481)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-09-04 23:30:14 +08:00
Izzy Putterman
26b133f3a7 [None][feat] MultiLayer Eagle (#7234)
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
2025-09-04 10:49:13 -04:00
Ivy Zhang
b46e0ae5d4 [None][test] update nim and full test list (#7468)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-09-04 09:06:01 -04:00
QI JUN
d38b8e3dd9 [None][ci] set TORCHINDUCTOR_COMPILE_THREADS for thop/parallel tests (#7489)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-04 06:04:51 -07:00
Wanli Jiang
4e3dded64d [TRTLLM-6308][feat] Support Aggregate mode for phi4-mm (#7521)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-09-04 20:16:10 +08:00
WeiHaocheng
5bcda7520b [https://nvbugs/5477730][fix] Fix the alltoall case when tp_size larger than ep_size (#7331)
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
2025-09-04 08:10:03 -04:00
Zhanrui Sun
0de3f83805 [TRTLLM-6893][infra] Disable the x86 / SBSA build stage when run BuildDockerImage (#6729)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-04 07:20:15 -04:00
kris1025
cce9556858 [https://nvbugs/5485886][fix] Fix resource free of Eagle3ResourceManager (#7437)
Signed-off-by: linquanh <linquanh@nvidia.com>
2025-09-04 17:38:13 +08:00
Yiqing Yan
ced5512ae4 [None][chore] Bump version to 1.1.0rc4 (#7525)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-09-04 16:30:47 +08:00
jianweiwu
7090b286b2 [None][fix] fix hunyuan_moe init bug (#7502)
Signed-off-by: sorenwu <sorenwu@tencent.com>
2025-09-04 03:06:00 -04:00
Grzegorz Kwasniewski
3755f8ab7d [TRTLLM-6342][fix] Fixed triggering BMM sharding (#7389)
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2025-09-04 02:01:27 -04:00
Yanchao Lu
c622f61609 [None][fix] Fix a typo in the Slurm CI codes (#7485)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-04 01:56:27 -04:00
Emma Qiao
931816fee1 [TRTLLM-6199][infra] Update for using open driver from BSL (#7430)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-04 11:47:40 +08:00
William Zhang
a117e7a57e [TRTLLM-7442][model] Remove unnecessary D2H copies (#7273)
* Why?

Initial profiling showed there were multiple D2H / H2D copies being
scheduled in the mistral 3.1 small model.

* What?

This commit removes those unnecessary copies by returning `image_sizes`
as a simple list instead of a tensor.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-09-03 23:14:20 -04:00
Jin Li
2a2dfe273b [https://nvbugs/5485102][fix] Correctly set stride for piecewise outp… (#7442)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-09-04 10:48:15 +08:00
Stanley Sun
db8eb0a447 [TRTLLM-7876][test] Test trtllm-serve with --extra_llm_api_options (#7492)
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-09-04 10:34:38 +08:00
Lizhi Zhou
d97c1e6bd9 [https://nvbugs/5470769][fix] fix disagg-serving accuracy test case (#7338)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-09-04 09:11:01 +08:00
Yao Yao
c1aa7f31d9 [None][fix] Fix a numerical stability issue for XQA with spec dec (#7114)
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
2025-09-03 20:40:05 -04:00
Frida Hou
51a2b8729e [#7222][autodeploy] Separate run_shape_prop as another graph utility (#7313)
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
2025-09-03 19:32:50 -04:00
Leslie Fang
bd9ba97d89 [None][chore] Remove two unused parameters in create_py_executor (#7458)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-09-04 07:31:31 +08:00
Enwei Zhu
5ff3a65b23 [TRTLLM-7028][feat] Enable guided decoding with speculative decoding (part 2: one-model engine) (#6948)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-09-03 15:16:11 -07:00
Mike Iovine
64e3bfa054 [None][fix] Fix KV cache recompute in draft_target spec decode (#7348)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-09-03 15:04:14 -04:00
Izzy Putterman
f156221c27 [None][doc] add GPT OSS Eagle3 blog (#7140)
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
2025-09-03 12:28:01 -04:00