Xiwen Yu
2c3f4cbeee
Merge remote-tracking branch 'origin/main' into feat/b300_cu13
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-05 15:53:43 +08:00
Xiwen Yu
22219bc37e
Add B300 & GB300 CI
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-05 15:29:50 +08:00
Xiwen Yu
5d4f7f4e8d
update flashinfer and waive bug
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-05 15:09:25 +08:00
Xiwen Yu
fcf413e247
Merge branch 'user/xiweny/3xfp4_gemm' into 'feat/b300_cu13'
...
add 3xfp4 cutlass gemm
See merge request ftp/tekit!9699
Signed-off-by: Xiwen Yu <xiweny@nvidia.com>
2025-09-05 00:06:42 -07:00
Xiwen Yu
973fd37457
add 3xfp4 cutlass gemm
...
Signed-off-by: Xiwen Yu <xiweny@nvidia.com>
2025-09-05 00:06:41 -07:00
Zhanrui Sun
9ae01a8edb
Merge branch 'user/zhanruis/0828_support_cuda_13_for_sanity_check' into 'feat/b300_cu13'
...
Support DLFW sanity check use CU13 image
See merge request ftp/tekit!9689
Signed-off-by: Zhanrui Sun <zhanruis@nvidia.com>
2025-09-05 00:04:23 -07:00
Zhanrui Sun
5ca3376d6f
Support DLFW sanity check use CU13 image
...
Signed-off-by: Zhanrui Sun <zhanruis@nvidia.com>
2025-09-05 00:04:22 -07:00
Yiteng Niu
163b1fc84f
[None][infra] update nspect version (#7552)
...
Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>
2025-09-05 14:59:22 +08:00
Yanchao Lu
4195010e13
[None][ci] Increase the number of retries in docker image generation (#7557)
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-05 14:47:14 +08:00
xinhe-nv
8e3962d278
[TRTLLM-6642][feat] add gptoss 20g tests (#7361)
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-05 02:20:28 -04:00
xinhe-nv
b3ba3d98d2
[None][chore] Remove closed bugs (#7408)
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-09-05 02:11:16 -04:00
QI JUN
ff3704897b
[None][ci] remove unnecessary test_modeling_deepseek.py (#7542)
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-04 20:05:27 -07:00
Jin Li
2189a2f3ff
[https://nvbugs/5483615][fix] Remove unnecessary assertion to let mai… (#7441)
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-09-05 10:56:21 +08:00
Naveenraj Kamalakannan
58d1036bb1
[#3325][feat] Add MCTS and TOT tree-based inference controllers to Scaffolding (#7490)
...
Signed-off-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com>
2025-09-04 19:46:49 -07:00
Shunkangz
bddf183e15
[None][feat] Add Request specific exception (#6931)
...
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-09-04 18:43:42 -04:00
Rashid Kaleem
89889fb526
[https://nvbugs/5369366] [fix] Report failing requests (#7060)
...
Signed-off-by: Rashid Kaleem <4079439+arekay@users.noreply.github.com>
2025-09-04 12:56:23 -07:00
Chang Liu
08a0e06621
[TRTLLM-7410][feat] Support hashing and KV cache reuse for videos (#7360)
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
2025-09-04 14:39:23 -04:00
Yuxian Qiu
48a5270868
[https://nvbugs/5492485][fix] Use offline dataset from llm-models instead. (#7435)
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-09-04 09:58:16 -07:00
sychen52
98a1bffb7c
[OMNIML-2336][feat] Add NVFP4 x FP8 (#6809)
...
Signed-off-by: Shiyang Chen <shiychen@nvidia.com>
2025-09-04 09:03:38 -07:00
Enwei Zhu
1745102e72
[TRTLLM-7027][feat] Fuse d2t to logitsBitmaskKernel and fix a race condition in one-model spec (#7481)
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-09-04 23:30:14 +08:00
Izzy Putterman
26b133f3a7
[None][feat] MultiLayer Eagle (#7234)
...
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
2025-09-04 10:49:13 -04:00
Ivy Zhang
b46e0ae5d4
[None][test] update nim and full test list (#7468)
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-09-04 09:06:01 -04:00
QI JUN
d38b8e3dd9
[None][ci] set TORCHINDUCTOR_COMPILE_THREADS for thop/parallel tests (#7489)
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-04 06:04:51 -07:00
Wanli Jiang
4e3dded64d
[TRTLLM-6308][feat] Support Aggregate mode for phi4-mm (#7521)
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-09-04 20:16:10 +08:00
WeiHaocheng
5bcda7520b
[https://nvbugs/5477730][fix] Fix the alltoall case when tp_size larger than ep_size (#7331)
...
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
2025-09-04 08:10:03 -04:00
Zhanrui Sun
0de3f83805
[TRTLLM-6893][infra] Disable the x86 / SBSA build stage when run BuildDockerImage (#6729)
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-04 07:20:15 -04:00
kris1025
cce9556858
[https://nvbugs/5485886][fix] Fix resource free of Eagle3ResourceManager (#7437)
...
Signed-off-by: linquanh <linquanh@nvidia.com>
2025-09-04 17:38:13 +08:00
Yiqing Yan
ced5512ae4
[None][chore] Bump version to 1.1.0rc4 (#7525)
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-09-04 16:30:47 +08:00
jianweiwu
7090b286b2
[None][fix] fix hunyuan_moe init bug (#7502)
...
Signed-off-by: sorenwu <sorenwu@tencent.com>
2025-09-04 03:06:00 -04:00
Grzegorz Kwasniewski
3755f8ab7d
[TRTLLM-6342][fix] Fixed triggering BMM sharding (#7389)
...
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2025-09-04 02:01:27 -04:00
Yanchao Lu
c622f61609
[None][fix] Fix a typo in the Slurm CI codes (#7485)
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-04 01:56:27 -04:00
Emma Qiao
931816fee1
[TRTLLM-6199][infra] Update for using open driver from BSL (#7430)
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-04 11:47:40 +08:00
William Zhang
a117e7a57e
[TRTLLM-7442][model] Remove unnecessary D2H copies (#7273)
...
* Why?
Initial profiling showed there were multiple D2H / H2D copies being
scheduled in the mistral 3.1 small model.
* What?
This commit removes those unnecessary copies by returning `image_sizes`
as a simple list instead of a tensor.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-09-03 23:14:20 -04:00
Jin Li
2a2dfe273b
[https://nvbugs/5485102][fix] Correctly set stride for piecewise outp… (#7442)
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-09-04 10:48:15 +08:00
Stanley Sun
db8eb0a447
[TRTLLM-7876][test] Test trtllm-serve with --extra_llm_api_options (#7492)
...
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-09-04 10:34:38 +08:00
Lizhi Zhou
d97c1e6bd9
[https://nvbugs/5470769][fix] fix disagg-serving accuracy test case (#7338)
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-09-04 09:11:01 +08:00
Yao Yao
c1aa7f31d9
[None][fix] Fix a numerical stability issue for XQA with spec dec (#7114)
...
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
2025-09-03 20:40:05 -04:00
Frida Hou
51a2b8729e
[#7222][autodeploy] Separate run_shape_prop as another graph utility (#7313)
...
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
2025-09-03 19:32:50 -04:00
Leslie Fang
bd9ba97d89
[None][chore] Remove two unused parameters in create_py_executor (#7458)
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-09-04 07:31:31 +08:00
Enwei Zhu
5ff3a65b23
[TRTLLM-7028][feat] Enable guided decoding with speculative decoding (part 2: one-model engine) (#6948)
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-09-03 15:16:11 -07:00
Mike Iovine
64e3bfa054
[None][fix] Fix KV cache recompute in draft_target spec decode (#7348)
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-09-03 15:04:14 -04:00
Izzy Putterman
f156221c27
[None][doc] add GPT OSS Eagle3 blog (#7140)
...
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
2025-09-03 12:28:01 -04:00
Lizhi Zhou
7c73c2ff4b
[https://nvbugs/5485593][fix] improve accuracy/test_disaggregated_serving.py (#7366)
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-09-03 09:38:53 -04:00
Stanley Sun
cebbf48b74
[TRTLLM-7363][test] Add 8-GPU test cases for RTX6000 (#7083)
...
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-09-03 08:36:52 -04:00
Anurag Mukkara
ae5136831f
[https://nvbugs/5472947][fix] wait on isend handles before reusing buffers (#7462)
...
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
2025-09-03 13:20:02 +05:30
Mike Iovine
79d93f9419
[https://nvbugs/5488141][fix] Unwaive llama3 test_eagle3 (#7486)
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-09-03 14:10:40 +08:00
YueWeng
9a4f60687f
[https://nvbugs/5480289][fix] release slot manager in mtp MTPHiddenStatesManager (#7340)
...
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
2025-09-02 19:37:51 -07:00
Wanli Jiang
4223a9aada
[TRTLLM-7261][feat] Support phi-4 model in pytorch backend (#7371)
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-09-03 10:27:42 +08:00
Xiwen Yu
1978227bb7
Merge branch 'user/xiweny/mha_103' into 'feat/b300_cu13'
...
update mha cubins and support 103a
See merge request ftp/tekit!9690
Signed-off-by: Xiwen Yu <xiweny@nvidia.com>
2025-09-02 19:26:25 -07:00
Xiwen Yu
5bd50d477e
update mha cubins and support 103a
...
Signed-off-by: Xiwen Yu <xiweny@nvidia.com>
2025-09-02 19:26:24 -07:00