yuanjingx87
|
e689a73c83
|
[None][infra] fix slurm results path (#8751)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
|
2025-10-30 13:09:46 +08:00 |
|
Emma Qiao
|
7c1bca4563
|
[None][infra] Fix slurm exitcode (#8585)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
|
2025-10-23 09:46:00 -04:00 |
|
yuanjingx87
|
1e3e1474c6
|
[TRTLLM-6055][infra] Slurm Test refactor (#7176)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-10-20 09:46:44 -07:00 |
|
Yanchao Lu
|
e5cead1eb9
|
[TRTLLM-6295][test] Exit as early as possible and propagate exit status correctly for multi-node testing (#7739)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-09-16 09:59:18 +08:00 |
|
xiweny
|
c076a02b38
|
[TRTLLM-4629] [feat] Add support of CUDA13 and sm103 devices (#7568)
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
Signed-off-by: Daniel Stokes <dastokes@nvidia.com>
Signed-off-by: Zhanrui Sun <zhanruis@nvidia.com>
Signed-off-by: Xiwen Yu <xiweny@nvidia.com>
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: xiweny <13230610+VALLIS-NERIA@users.noreply.github.com>
Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
Co-authored-by: Daniel Stokes <dastokes@nvidia.com>
Co-authored-by: Zhanrui Sun <zhanruis@nvidia.com>
Co-authored-by: Jiagan Cheng <jiaganc@nvidia.com>
Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Bo Deng <deemod@nvidia.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
|
2025-09-16 09:56:18 +08:00 |
|
Yanchao Lu
|
70aa4e28c1
|
[None][ci] Test waives for the main branch 09/14 (#7698)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-09-14 23:48:04 +08:00 |
|
Yanchao Lu
|
89fc136972
|
[None][ci] Some improvements for Slurm CI (#7689)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-09-14 16:56:32 +08:00 |
|
Yanchao Lu
|
a07bb163f7
|
[None][ci] Correct docker args for GPU devices and remove some stale CI codes (#7417)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-09-02 04:06:51 -04:00 |
|
Yiqing Yan
|
486bc763c3
|
[None][infra] Split DGX_B200 stage into multiple parts and pre-/post-merge (#7074)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-08-24 21:09:04 -04:00 |
|
Yanchao Lu
|
ec35481b0a
|
[None][infra] Prepare for single GPU GB200 test pipeline (#7073)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-08-24 21:46:39 +08:00 |
|
Yiqing Yan
|
4763e94156
|
[TRTLLM-5563][infra] Move test_rerun.py to script folder (#6571)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
|
2025-08-04 13:26:04 +08:00 |
|
Yiqing Yan
|
3f7abf87bc
|
[TRTLLM-6224][infra] Upgrade dependencies to DLFW 25.06 and CUDA 12.9.1 (#5678)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
|
2025-08-03 11:18:59 +08:00 |
|
Yiqing Yan
|
d38c26bb78
|
[Infra][TRTLLM-5633] - Fix merge waive list (#6504)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
|
2025-07-31 14:57:51 +08:00 |
|
Yiqing Yan
|
0cf2f6f154
|
[TRTLLM-5633] - Merge current waive list with the TOT waive list (#5198)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
|
2025-07-30 17:50:05 +08:00 |
|
Emma Qiao
|
1cc49494fe
|
[Infra] - Add wiave list for pytest when using slurm (#6130)
Signed-off-by: qqiao <qqiao@nvidia.com>
|
2025-07-17 16:53:15 +08:00 |
|
yuanjingx87
|
a1c5704055
|
[feat] Multi-node CI testing support via Slurm (#4771)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
Signed-off-by: yuanjingx87 <197832395+yuanjingx87@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-06-19 01:11:12 +08:00 |
|