Yanchao Lu
045d2cf761
[None][ci] Block some nodes to avoid unstable network access (#7593)
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-08 00:25:38 +08:00
Netanel Haber
0fee8cd028
[TRTLLM-7153] [feat] Move stop_criteria to sample_async (#7041)
...
Signed-off-by: Netanel Haber <nhaber@nvidia.com>
2025-09-07 17:36:49 +03:00
Emma Qiao
5c4711fb2b
[None][infra] Skip RTX Pro 6000 test stages due to HW are offline (#7592)
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-07 09:49:06 -04:00
Raayan Dhar
bae9560e62
[https://nvbugs/5448767][fix] sync termination of requests across PP ranks (#7455)
...
Signed-off-by: raayandhar <rdhar@nvidia.com>
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Co-authored-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-09-07 08:45:49 -04:00
Emma Qiao
aea8ac1649
[TRTLLM-5950][infra] Removing remaining turtle keywords from the code base (#7086)
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-07 14:26:18 +08:00
Mike Iovine
45390402fc
[https://nvbugs/5502352][fix] Fix 2-model CDL path (#7543)
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-09-06 23:53:27 -04:00
Chang Liu
99b98f1374
[TRTLLM-7440][fix] Split fused_input_embed to separate out host sync (#7280)
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-09-06 23:11:39 -04:00
xiweny
0fdc6c7278
[TRTLLM-4629] [feat] trtllm-gen kernels support sm103 (#7570)
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-07 10:04:10 +08:00
Chang Liu
23500b55c3
[TRTLLM-7398][feat] Support KV cache salting for secure KV cache reuse (#7106)
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
2025-09-06 17:58:32 -04:00
QI JUN
12ecb864c2
[None][chore] share input_ids buffers among different cuda graphs (#7236)
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-06 17:49:42 -04:00
Anthony Chang
12c66f7610
[None][fix] DeepSeek-R1 W4A8 weight loading issue; fixes regression from #6200 (#7123)
...
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-09-07 00:04:56 +08:00
dominicshanshan
9a97f0a3b7
[None][ci] Waive qwen3 test for accuracy bug in https://nvbugs/5505402 (#7585)
...
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-06 21:29:16 +08:00
Yanchao Lu
caf9b9cd42
[None][ci] Improve SSH connection stability (#7567)
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-06 17:08:19 +08:00
QI JUN
525bb806a9
[None][ci] move some test cases of DGX H100 to post merge (#7569)
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-06 01:03:38 -04:00
QI JUN
b8183cac2b
[None][ci] Revert "[https://nvbugs/5461761][fix] Remove the waiver (#7476)" (#7584)
2025-09-05 22:02:09 -07:00
Lucas Liebenwein
74105a45d9
[#6120][feat] AutoDeploy: flexible args for sequence interface + AD multi-modal input processor + llama4 VLM example (#7221)
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-09-05 22:10:48 -04:00
peaceh-nv
25389c9fe2
[https://nvbugs/5453806][unwaive] Unwaive fp8 kvcache attention test (#7243)
...
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
2025-09-05 12:13:57 -04:00
Emma Qiao
d8ec546b73
[None][infra] Waive failed tests on main branch 0905 (#7564)
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-05 22:46:46 +08:00
Leslie Fang
9eb3911470
[None][chore] Remove executor_config in create_py_executor_instance (#7463)
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-09-05 20:56:03 +08:00
Robin Kobus
a95d9616ba
[#6186][feat] Introduce QKNormRoPEAttention module (#6830)
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-09-05 14:04:41 +02:00
Ziyi Xiong
79e0296ca0
[https://nvbugs/5461761][fix] Remove the waiver (#7476)
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-09-05 15:29:54 +08:00
Yiteng Niu
163b1fc84f
[None][infra] update nspect version (#7552)
...
Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>
2025-09-05 14:59:22 +08:00
Yanchao Lu
4195010e13
[None][ci] Increase the number of retries in docker image generation (#7557)
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-05 14:47:14 +08:00
xinhe-nv
8e3962d278
[TRTLLM-6642][feat] add gptoss 20g tests (#7361)
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-05 02:20:28 -04:00
xinhe-nv
b3ba3d98d2
[None][chore] Remove closed bugs (#7408)
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-09-05 02:11:16 -04:00
QI JUN
ff3704897b
[None][ci] remove unnecessary test_modeling_deepseek.py (#7542)
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-04 20:05:27 -07:00
Jin Li
2189a2f3ff
[https://nvbugs/5483615][fix] Remove unnecessary assertion to let mai… (#7441)
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-09-05 10:56:21 +08:00
Naveenraj Kamalakannan
58d1036bb1
[#3325][feat] Add MCTS and TOT tree-based inference controllers to Scaffolding (#7490)
...
Signed-off-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com>
2025-09-04 19:46:49 -07:00
Shunkangz
bddf183e15
[None][feat] Add Request specific exception (#6931)
...
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-09-04 18:43:42 -04:00
Rashid Kaleem
89889fb526
[https://nvbugs/5369366] [fix] Report failing requests (#7060)
...
Signed-off-by: Rashid Kaleem <4079439+arekay@users.noreply.github.com>
2025-09-04 12:56:23 -07:00
Chang Liu
08a0e06621
[TRTLLM-7410][feat] Support hashing and KV cache reuse for videos (#7360)
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
2025-09-04 14:39:23 -04:00
Yuxian Qiu
48a5270868
[https://nvbugs/5492485][fix] Use offline dataset from llm-models instead. (#7435)
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-09-04 09:58:16 -07:00
sychen52
98a1bffb7c
[OMNIML-2336][feat] Add NVFP4 x FP8 (#6809)
...
Signed-off-by: Shiyang Chen <shiychen@nvidia.com>
2025-09-04 09:03:38 -07:00
Enwei Zhu
1745102e72
[TRTLLM-7027][feat] Fuse d2t to logitsBitmaskKernel and fix a race condition in one-model spec (#7481)
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-09-04 23:30:14 +08:00
Izzy Putterman
26b133f3a7
[None][feat] MultiLayer Eagle (#7234)
...
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
2025-09-04 10:49:13 -04:00
Ivy Zhang
b46e0ae5d4
[None][test] update nim and full test list (#7468)
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-09-04 09:06:01 -04:00
QI JUN
d38b8e3dd9
[None][ci] set TORCHINDUCTOR_COMPILE_THREADS for thop/parallel tests (#7489)
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-04 06:04:51 -07:00
Wanli Jiang
4e3dded64d
[TRTLLM-6308][feat] Support Aggregate mode for phi4-mm (#7521)
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-09-04 20:16:10 +08:00
WeiHaocheng
5bcda7520b
[https://nvbugs/5477730][fix] Fix the alltoall case when tp_size larger than ep_size (#7331)
...
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
2025-09-04 08:10:03 -04:00
Zhanrui Sun
0de3f83805
[TRTLLM-6893][infra] Disable the x86 / SBSA build stage when run BuildDockerImage (#6729)
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-04 07:20:15 -04:00
kris1025
cce9556858
[https://nvbugs/5485886][fix] Fix resource free of Eagle3ResourceManager (#7437)
...
Signed-off-by: linquanh <linquanh@nvidia.com>
2025-09-04 17:38:13 +08:00
Yiqing Yan
ced5512ae4
[None][chore] Bump version to 1.1.0rc4 (#7525)
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-09-04 16:30:47 +08:00
jianweiwu
7090b286b2
[None][fix] fix hunyuan_moe init bug (#7502)
...
Signed-off-by: sorenwu <sorenwu@tencent.com>
2025-09-04 03:06:00 -04:00
Grzegorz Kwasniewski
3755f8ab7d
[TRTLLM-6342][fix] Fixed triggering BMM sharding (#7389)
...
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2025-09-04 02:01:27 -04:00
Yanchao Lu
c622f61609
[None][fix] Fix a typo in the Slurm CI codes (#7485)
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-04 01:56:27 -04:00
Emma Qiao
931816fee1
[TRTLLM-6199][infra] Update for using open driver from BSL (#7430)
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-04 11:47:40 +08:00
William Zhang
a117e7a57e
[TRTLLM-7442][model] Remove unnecessary D2H copies (#7273)
...
* Why?
Initial profiling showed there were multiple D2H / H2D copies being
scheduled in the mistral 3.1 small model.
* What?
This commit removes those unnecessary copies by returning `image_sizes`
as a simple list instead of a tensor.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-09-03 23:14:20 -04:00
Jin Li
2a2dfe273b
[https://nvbugs/5485102][fix] Correctly set stride for piecewise outp… (#7442)
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-09-04 10:48:15 +08:00
Stanley Sun
db8eb0a447
[TRTLLM-7876][test] Test trtllm-serve with --extra_llm_api_options (#7492)
...
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-09-04 10:34:38 +08:00
Lizhi Zhou
d97c1e6bd9
[https://nvbugs/5470769][fix] fix disagg-serving accuracy test case (#7338)
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-09-04 09:11:01 +08:00