chenfeiz0326
|
5e0e48144f
|
[None][fix] Minor updates on Perf Test System (#10375)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
|
2026-01-02 17:17:42 +08:00 |
|
chenfeiz0326
|
a23c6f1092
|
[TRTLLM-9834][feat] Transfer to TRTLLM-INFRA Database and Fail post-merge tests if regression (#10282)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
|
2025-12-31 21:44:59 +08:00 |
|
Yanchao Lu
|
965578ca21
|
[None][infra] Some improvements for Slurm execution path in the CI (#10316)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-12-29 06:49:44 -05:00 |
|
Yanchao Lu
|
270be801aa
|
[None][ci] Move remaining DGX-B200 tests to LBD (#9876)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-12-28 13:55:39 +08:00 |
|
chenfeiz0326
|
d70aeddc7f
|
[TRTLLM-8952][feat] Support Multi-Node Disagg Perf Test in CI (#9138)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
|
2025-12-26 22:50:53 +08:00 |
|
Yiqing Yan
|
69152c4e7c
|
[None][infra] Check GB200 coherent GPU mapping (#10253)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
|
2025-12-24 17:12:36 +08:00 |
|
chenfeiz0326
|
383178c00a
|
[TRTLLM-9000][feat] Add multi-node Perf Tests into CI (#8800)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
|
2025-12-08 09:00:44 +08:00 |
|
Yanchao Lu
|
f59d64e6c7
|
[None][fix] Several minor fixes to CI setting (#9765)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-12-07 23:07:59 +08:00 |
|
Yiqing Yan
|
8c88454fa5
|
[TRTLLM-7101][infra] Reuse passed tests (#6894)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-12-03 10:07:23 +08:00 |
|
Yanchao Lu
|
f03641808b
|
[None][infra] - Request idle time exemption for OCI jobs (#9528)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-11-30 13:34:09 +08:00 |
|
Emma Qiao
|
658d9fc0c5
|
[TRTLLM-8970][infra] Fix generate report when has isolation test result (#8861)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
|
2025-11-28 11:26:06 +08:00 |
|
Yiqing Yan
|
1c9158fde3
|
[TRTLLM-7288][infra] Download merged waive list in slurm script (#8999)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-11-27 21:48:40 +08:00 |
|
Yanchao Lu
|
ff02e0f05c
|
[None][ci] Move more test stages to use OCI machines (#9395)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Matt Lefebvre <matthewelefebvre@gmail.com>
|
2025-11-25 15:59:13 +08:00 |
|
chenfeiz0326
|
cc4ab8d9d1
|
[TRTLLM-8825][feat] Support Pytest Perf Results uploading to Database (#8653)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
|
2025-11-03 16:23:13 +08:00 |
|
Zhanrui Sun
|
a6a3de8e35
|
[TRTLLM-9003][infra] Add python OpenSearchDB query / push. (#8506)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
|
2025-10-30 19:43:51 -07:00 |
|
yuanjingx87
|
e689a73c83
|
[None][infra] fix slurm results path (#8751)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
|
2025-10-30 13:09:46 +08:00 |
|
Emma Qiao
|
7c1bca4563
|
[None][infra] Fix slurm exitcode (#8585)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
|
2025-10-23 09:46:00 -04:00 |
|
yuanjingx87
|
1e3e1474c6
|
[TRTLLM-6055][infra] Slurm Test refactor (#7176)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-10-20 09:46:44 -07:00 |
|
Yanchao Lu
|
e5cead1eb9
|
[TRTLLM-6295][test] Exit as early as possible and propagate exit status correctly for multi-node testing (#7739)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-09-16 09:59:18 +08:00 |
|
xiweny
|
c076a02b38
|
[TRTLLM-4629] [feat] Add support of CUDA13 and sm103 devices (#7568)
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
Signed-off-by: Daniel Stokes <dastokes@nvidia.com>
Signed-off-by: Zhanrui Sun <zhanruis@nvidia.com>
Signed-off-by: Xiwen Yu <xiweny@nvidia.com>
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: xiweny <13230610+VALLIS-NERIA@users.noreply.github.com>
Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
Co-authored-by: Daniel Stokes <dastokes@nvidia.com>
Co-authored-by: Zhanrui Sun <zhanruis@nvidia.com>
Co-authored-by: Jiagan Cheng <jiaganc@nvidia.com>
Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Bo Deng <deemod@nvidia.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
|
2025-09-16 09:56:18 +08:00 |
|
Yanchao Lu
|
70aa4e28c1
|
[None][ci] Test waives for the main branch 09/14 (#7698)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-09-14 23:48:04 +08:00 |
|
Yanchao Lu
|
89fc136972
|
[None][ci] Some improvements for Slurm CI (#7689)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-09-14 16:56:32 +08:00 |
|
Yanchao Lu
|
a07bb163f7
|
[None][ci] Correct docker args for GPU devices and remove some stale CI codes (#7417)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-09-02 04:06:51 -04:00 |
|
Yiqing Yan
|
486bc763c3
|
[None][infra] Split DGX_B200 stage into multiple parts and pre-/post-merge (#7074)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-08-24 21:09:04 -04:00 |
|
Yanchao Lu
|
ec35481b0a
|
[None][infra] Prepare for single GPU GB200 test pipeline (#7073)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-08-24 21:46:39 +08:00 |
|
Yiqing Yan
|
4763e94156
|
[TRTLLM-5563][infra] Move test_rerun.py to script folder (#6571)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
|
2025-08-04 13:26:04 +08:00 |
|
Yiqing Yan
|
3f7abf87bc
|
[TRTLLM-6224][infra] Upgrade dependencies to DLFW 25.06 and CUDA 12.9.1 (#5678)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
|
2025-08-03 11:18:59 +08:00 |
|
Yiqing Yan
|
d38c26bb78
|
[Infra][TRTLLM-5633] - Fix merge waive list (#6504)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
|
2025-07-31 14:57:51 +08:00 |
|
Yiqing Yan
|
0cf2f6f154
|
[TRTLLM-5633] - Merge current waive list with the TOT waive list (#5198)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
|
2025-07-30 17:50:05 +08:00 |
|
Emma Qiao
|
1cc49494fe
|
[Infra] - Add wiave list for pytest when using slurm (#6130)
Signed-off-by: qqiao <qqiao@nvidia.com>
|
2025-07-17 16:53:15 +08:00 |
|
yuanjingx87
|
a1c5704055
|
[feat] Multi-node CI testing support via Slurm (#4771)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
Signed-off-by: yuanjingx87 <197832395+yuanjingx87@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-06-19 01:11:12 +08:00 |
|