Commit Graph

355 Commits

Author SHA1 Message Date
Linda
ce556290c9
[None][chore] Removing pybind11 bindings and references (#10550)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2026-01-26 08:19:12 -05:00
Emma Qiao
9d65b8bf24
[None][infra] Fix TRT-LLM data scratch mount point for gb10x (#10880)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2026-01-24 14:00:17 +08:00
zackyoray
51c7a06da6
[None][feat] Upgrade NIXL to v0.9.0 (#10896)
Signed-off-by: Yoray Zack <62789610+zackyoray@users.noreply.github.com>
2026-01-23 15:58:53 +08:00
yuanjingx87
5450485bec
[None][infra] Fix sonarQube job hang by create jenkins homd folder if not exist (#10830)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2026-01-21 11:45:19 -08:00
Bo Deng
338b29d5ae
[None][infra] trigger multi-gpu tests when install_nixl/ucx.sh is mod… (#10624)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2026-01-20 17:55:32 +08:00
Yiqing Yan
99e8cb0999
[None][fix] Fix vulnerability urllib3 and nbconvert (#10551)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2026-01-20 14:51:36 +08:00
Zhanrui Sun
df845a028b
[TRTLLM-9581][infra] Use /home/scratch.trt_llm_data_ci in computelab (#10616)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2026-01-19 00:40:40 -05:00
Yiqing Yan
68ab1a47c4
[None][chore] Add release/1.2 branch into lockfile generation schedule (#10790)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2026-01-19 11:32:08 +08:00
chenfeiz0326
e97af45556
[TRTLLM-10300][feat] Upload regression info to artifactory (#10599)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2026-01-19 10:16:31 +08:00
yuanjingx87
e1cc8d2337
[None][infra] Add sonarqube scanning in lockfile generation pipeline (#10700)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2026-01-18 01:11:28 -08:00
Yanchao Lu
0096b50ba0
[None][infra] Update upgrade related docs for release 1.2 (#10760) (#10773)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Emma Qiao <qqiao@nvidia.com>
2026-01-18 00:14:27 +08:00
chenfeiz0326
56073f501a
[TRTLLM-8263][feat] Add Aggregated Perf Tests (#10598)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2026-01-17 13:16:36 +08:00
Lucas Liebenwein
62050b2381
[None][infra] separate AutoDeploy tests into own stages (#10634)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-14 23:05:26 -05:00
Emma Qiao
01083b56bf
[TRTLLM-9849][infra] Update dependencies to 25.12 (#9818)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Signed-off-by: xxi <xxi@nvidia.com>
Signed-off-by: xxi <95731198+xxi-nv@users.noreply.github.com>
Co-authored-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: xxi <xxi@nvidia.com>
Co-authored-by: xxi <95731198+xxi-nv@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2026-01-14 21:54:04 +08:00
Yiteng Niu
7a47e29dcb
[None][infra] support overriding nspect version (#10402)
Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>
2026-01-13 23:39:45 +08:00
tburt-nv
7d41475954
[None][infra] try removing shared cache dir mount (#10609)
Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>
2026-01-13 15:07:12 +08:00
chenfeiz0326
54459377d2
[TRTLLM-10248][feat] Support Bot to Send Perf Regression Msg to Slack Channel (#10489)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2026-01-12 14:23:23 +08:00
Yanchao Lu
80649a8b78
[None][ci] Workaround OCI-NRT slowdown issue (#10587)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2026-01-11 22:08:19 +08:00
Emma Qiao
43839c7d9b
[TRTLLM-9642][infra] Increase pytest verbosity for failed tests (#9657)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
2026-01-08 02:33:48 -05:00
Yiqing Yan
5108a69fc0
[TRTLLM-9622][infra] Enable DGX_B300 multi-gpu testing in pre-merge pipeline (#9699)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2026-01-06 14:39:55 +08:00
chenfeiz0326
a65b0d4efa
[None][fix] Decrease Pre Merge Perf Tests (#10390)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2026-01-04 12:21:34 -05:00
Yanchao Lu
c4f27fa4c0
[None][ci] Some tweaks for the CI pipeline (#10359)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2026-01-04 11:10:47 -05:00
yuanjingx87
5bd37ce41e
[None][infra] add retry logic to get slurm sbatch job log when ssh dropped (#9167)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2026-01-04 10:11:37 +08:00
chenfeiz0326
5e0e48144f
[None][fix] Minor updates on Perf Test System (#10375)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2026-01-02 17:17:42 +08:00
chenfeiz0326
a23c6f1092
[TRTLLM-9834][feat] Transfer to TRTLLM-INFRA Database and Fail post-merge tests if regression (#10282)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-12-31 21:44:59 +08:00
Yiqing Yan
fdc03684cc
[TRTLLM-10016][infra] Use SlurmPatition attribute time as timeout threshold (#10254)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-12-31 15:02:24 +08:00
Emma Qiao
fb05cd769a
[None][infra] Enable single-gpu CI on spark (#9304)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Signed-off-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-12-30 17:22:14 +08:00
Yanchao Lu
965578ca21
[None][infra] Some improvements for Slurm execution path in the CI (#10316)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-12-29 06:49:44 -05:00
Yanchao Lu
270be801aa
[None][ci] Move remaining DGX-B200 tests to LBD (#9876)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-12-28 13:55:39 +08:00
chenfeiz0326
d70aeddc7f
[TRTLLM-8952][feat] Support Multi-Node Disagg Perf Test in CI (#9138)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-12-26 22:50:53 +08:00
Iman Tabrizian
cd5cd60ee4
[None][infra] Move install_boost from install_triton.sh to install_base.sh (#10055)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-12-25 08:09:55 -05:00
Emma Qiao
16fd781e42
[TRTLLM-9862][infra] Move single-gpu tests on rtxpro6000d to pre-merge (#9897)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-24 21:45:33 -05:00
Yiqing Yan
69152c4e7c
[None][infra] Check GB200 coherent GPU mapping (#10253)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-12-24 17:12:36 +08:00
shuyixiong
f4f0fe85e9
[TRTLLM-9737][chore] Add rl perf reproduce script and enhance the robustness of Ray tests (#9939)
Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>
2025-12-24 15:27:01 +08:00
chenfeiz0326
48c875f8ea
[None][fix] Add OpenSearch URL in slurm_launch.sh for Multinode Perf Sanity Test (#9990)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-12-23 16:02:38 +08:00
JunyiXu-nv
356ad4fe3a
[https://nvbugs/5722653][fix] Address port conflict by assigning different port section in the same node. (#10035)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-19 15:34:04 +08:00
Wangjue Yao
9f283f330b
[None][feat] Support Mooncake transfer engine as a cache transceiver backend (#8309)
Signed-off-by: wjueyao <wyao123@terpmail.umd.edu>
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-12-19 10:09:51 +08:00
yuanjingx87
df15be3fad
[None][infra] Fix slurm job does not catch cancelled jobs (#9722)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
Signed-off-by: yuanjingx87 <197832395+yuanjingx87@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-12-18 00:32:43 -08:00
yuanjingx87
0a4c59136a
[None][infra] Fixing credential loading in lockfile generation pipeline (#10020)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-12-16 15:38:29 +08:00
QI JUN
4ce35eacf1
[TRTLLM-9794][ci] move more test cases to gb200 (#9994)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-12-15 19:50:41 -08:00
zackyoray
63e7a2fa70
[None][infra] Update ucx to 1.20.x (#9977)
Signed-off-by: Yoray Zack <yorayz@nvidia.com>
Signed-off-by: Yoray Zack <62789610+zackyoray@users.noreply.github.com>
2025-12-16 00:31:48 +08:00
dominicshanshan
825025b137
[None][infra] Add multi gpu Ray tests into L0 merge change request list. (#9996)
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-12-15 15:55:54 +08:00
Matt Lefebvre
1375910f1b
[None][infra] Delete container before attempting import (#9967)
Signed-off-by: Matt Lefebvre <mlefebvre@nvidia.com>
2025-12-14 00:09:33 -08:00
Yuxian Qiu
fcda1a1442
[None][fix] disable async pp send for ray cases. (#9959)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-12-13 20:22:36 -08:00
yuanjingx87
246a877571
[None][infra] Remove generate lockfile schedule for 1.2.0rc4.post1 branch (#9945)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-12-12 09:10:32 -08:00
zackyoray
d5b9ad91c9
[None][feat] Upgrade NIXL to v0.8.0 (#9707)
Signed-off-by: Yoray Zack <62789610+zackyoray@users.noreply.github.com>
Signed-off-by: zackyoray
Signed-off-by: Bo Deng
Co-authored-by: Bo Deng
2025-12-12 20:21:10 +08:00
yuanjingx87
eeb03f314a
[None][infra] Replace the deprecated github token (#9915)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-12-11 22:46:14 -08:00
Chuang Zhu
bd441e9822
[None][infra] revert ucx to 1.19 (#9936)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-12-12 11:37:19 +08:00
Yiteng Niu
3e39afea9a
[None][infra] update nspect version for api change (#9899)
Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>
2025-12-12 11:27:42 +08:00
Yiqing Yan
5065b60cd1
[None][infra] Fix mergeWaiveList stage (#9892)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-12-12 11:19:42 +08:00