Yanchao Lu
c4f27fa4c0
[None][ci] Some tweaks for the CI pipeline ( #10359 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2026-01-04 11:10:47 -05:00
Yanchao Lu
270be801aa
[None][ci] Move remaining DGX-B200 tests to LBD ( #9876 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-12-28 13:55:39 +08:00
Yiteng Niu
3e39afea9a
[None][infra] update nspect version for api change ( #9899 )
...
Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>
2025-12-12 11:27:42 +08:00
Zhanrui Sun
5138ef3227
[None][infra] Add fallback when get wheel from build stage is fail ( #9290 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-11-21 13:26:20 +08:00
Yanchao Lu
da73410d3b
[None][fix] WAR for tensorrt depending on the archived nvidia-cuda-runtime-cu13 package ( #8857 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-02 09:57:37 +08:00
Yanchao Lu
89fc136972
[None][ci] Some improvements for Slurm CI ( #7689 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-14 16:56:32 +08:00
Yanchao Lu
045d2cf761
[None][ci] Block some nodes to avoid unstable network access ( #7593 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-08 00:25:38 +08:00
Yiteng Niu
163b1fc84f
[None][infra] update nspect version ( #7552 )
...
Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>
2025-09-05 14:59:22 +08:00
Yanchao Lu
4195010e13
[None][ci] Increase the number of retries in docker image generation ( #7557 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-05 14:47:14 +08:00
Zhanrui Sun
0de3f83805
[TRTLLM-6893][infra] Disable the x86 / SBSA build stage when run BuildDockerImage ( #6729 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-04 07:20:15 -04:00
Yanchao Lu
c622f61609
[None][fix] Fix a typo in the Slurm CI codes ( #7485 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-04 01:56:27 -04:00
Yanchao Lu
3a987891d8
[TRTLLM-7141][infra] Use repo mirrors to avoid intermittent network failures ( #6836 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-08-15 11:16:07 +08:00
Zhanrui Sun
6a9b4b11be
[https://nvbugs/5433581][infra] Temporarily disable Docker Image use wheel from build stage ( #6630 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-08-05 09:33:11 -04:00
Zhanrui Sun
7cbe30e17d
[TRTLLM-6893][infra] fix Build Docker Image tag issue ( #6555 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-08-05 04:33:36 -04:00
Zhanrui Sun
c3729dbd7d
infra: [TRTLLM-5873] Use build stage wheels to speed up docker release image build ( #4939 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-07-29 12:54:38 -04:00
Zhanrui Sun
3cbc23f783
infra: [TRTLLM-5250] Add sanity check stage for ngc-release images (Build wheels for devel image) ( #4656 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-07-21 16:06:43 +08:00
Yiteng Niu
3079e8cf0c
[TRTLLM-5878] update nspect version ( #5832 )
...
Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>
2025-07-08 22:00:09 +08:00
Yanchao Lu
092e0eb86a
[Infra] - Fix a syntax issue in the image check ( #5775 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-07-07 11:19:59 +09:00
Yiteng Niu
66f299a205
[TRTLLM-5878] add stage for image registration to nspect ( #5699 )
...
Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-07-06 23:52:54 +08:00
ixlmar
48eee338bf
fix: constrain grepping in docker/Makefile ( #5493 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-07-01 20:12:55 +08:00
Yanchao Lu
f4cdbfcdf0
None - Some clean-ups for the automation pipeline ( #5245 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-06-17 21:08:24 +08:00
Zhanrui Sun
a97f4581d2
infra: upload imageTag info to artifactory and add ngc_staging to save ngc image ( #4764 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-06-12 15:38:47 +08:00
Yanchao Lu
9e05613679
[Infra] - Update JNLP container config ( #5008 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-06-08 16:44:09 +08:00
Yiteng Niu
d2c311c9d3
infra: update jnlp version in container image ( #4944 )
2025-06-05 22:36:10 +08:00
Zhanrui Sun
7b2b657198
infra: [TRTLLM-5247][TRTLLM-5248][TRTLLM-5249] Refactor docker build image groovy and support NGC images ( #4294 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-29 11:23:29 +08:00
Yanchao Lu
a28cf3240c
[Infra] - Always push the release images in the post-merge job ( #4426 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-19 11:05:42 +08:00
Zhanrui Sun
17d48e0009
infra: [TRTLLM-5072] Add SBSA release images ( #4231 )
...
* infra: [TRTLLM-5072] Add SBSA release images and move SBSA to blossom
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
* Fix review
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
* Easy to review
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
* Fix BUILD_JOBS
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
* Use gitlab mirror for nixl and ucx
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
* Update BuildDockerImage.groovy
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
---------
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-18 00:00:06 +08:00
Yanchao Lu
504f4bf779
[Infra] - Update the upstream PyTorch dependency to 2.7.0 ( #4235 )
...
[Infra][TRTLLM-4941] - Update the upstream PyTorch dependency to 2.7.0
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-14 22:28:13 +08:00
Martin Marciniszyn Mehringer
33977dbd42
infra: [TRTLLM-325] Prepare for NGC release - multiplatform build ( #4191 )
...
* infra: [TRTLLM-325] Prepare for NGC release - prepare multiplatform build
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
2025-05-12 00:38:45 -07:00
Martin Marciniszyn Mehringer
d0e672f96d
chore: [TRTLLM-325][infra] Prepare for NGC release - reduce size of the docker images ( #3990 )
...
* chore: reduce size of the docker images
Signed-off-by: Martin Marciniszyn Mehringer <11665257+martinmarciniszyn@users.noreply.github.com>
* Finish the renaming script and run with new images.
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
* Fix installation of GCC toolset for Rocky Linux
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
* Upgrade to new docker images
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
---------
Signed-off-by: Martin Marciniszyn Mehringer <11665257+martinmarciniszyn@users.noreply.github.com>
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
2025-05-09 19:31:29 +08:00
Iman Tabrizian
74cc9e26ff
infra: install Triton in the base image ( #3759 )
...
* infra: install Triton in the base image
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
* install Triton from the base image
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
* update base image
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
* Address review comments
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
* update base image
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
* waive test
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
---------
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-04-28 07:36:30 +08:00
Zhanrui Sun
587a36db96
infra: [TRTLLM-4370] Fix the build error when build GH200 image ( #3229 )
...
* infra: Fix the build error when build GH200 image
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
* remove and update checkoutSource function
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
---------
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-04-03 17:33:50 +08:00
Zhanrui Sun
1e1116ccfc
infra: Switch to urm.nvidia.com as a WAR for urm-rn.nvidia.com connection issue
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-03-31 13:05:29 +08:00
Kaiyu Xie
2631f21089
Update ( #2978 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-03-23 16:39:35 +08:00