Yiqing Yan
d97419805b
[TRTLLM-5312] - Add bot run rules for triton tests ( #4988 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-07-25 10:31:12 +08:00
yuanjingx87
ef4878db05
set NVIDIA_IMEX_CHANNELS for dlcluster slurm job only ( #6234 )
...
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-07-22 11:27:54 -07:00
Lizhi Zhou
3e1a0fbac4
[TRTLLM-6537][infra] extend multi-gpu tests related file list ( #6139 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-07-22 16:57:06 +08:00
Yi Zhang
f9b0a911fb
test: Enable GB200 torch compile multi gpu tests ( #6145 )
...
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-07-21 22:17:13 +08:00
Zhanrui Sun
3cbc23f783
infra: [TRTLLM-5250] Add sanity check stage for ngc-release images (Build wheels for devel image) ( #4656 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-07-21 16:06:43 +08:00
Linda
3efad2e58c
feat: nanobind bindings ( #6185 )
...
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-07-21 08:56:57 +01:00
Venky
22d4a8c48a
enh: Add script to map tests <-> jenkins stages & vice-versa ( #5177 )
...
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-07-19 00:50:40 +08:00
Zhanrui Sun
8454640ee1
infra: fix single-GPU stage failed will not raise error ( #6165 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-07-18 22:39:32 +08:00
Iman Tabrizian
b75e53ab69
Revert "feat: nanobind bindings ( #5961 )" ( #6160 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-07-18 10:12:54 +08:00
ixlmar
d71c6fe526
[fix] Update jenkins container images ( #6094 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-07-17 16:22:25 +01:00
Linda
5bff317abf
feat: nanobind bindings ( #5961 )
...
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-07-17 22:42:52 +08:00
Emma Qiao
1cc49494fe
[Infra] - Add waive list for pytest when using slurm ( #6130 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-17 16:53:15 +08:00
QI JUN
e821c68611
CI: update multi gpu test trigger file list ( #6131 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-07-17 14:48:23 +08:00
Zhanrui Sun
4c364b9a73
infra: fix SBSA test stage ( #6113 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-07-17 11:56:03 +08:00
Zhanrui Sun
e42f5a9581
infra: [TRTLLM-5879] Split single GPU test and multi GPU test into 2 pipelines ( #5199 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-07-16 18:04:04 +08:00
Bo Deng
ec3ebae43e
[TRTLLM-6471] Infra: Upgrade NIXL to 0.3.1 ( #5991 )
...
Signed-off-by: Rabia Loulou <174243936+rabial-nv@users.noreply.github.com>
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
Signed-off-by: Bo Deng <deemod@nvidia.com>
Co-authored-by: Rabia Loulou <174243936+rabial-nv@users.noreply.github.com>
Co-authored-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-07-16 13:54:42 +08:00
Iman Tabrizian
665b4469b3
[fix] Fix Triton build ( #6076 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-07-16 11:17:22 +08:00
Yiteng Niu
9e871ca582
[infra] add more log on reuse-uploading ( #6036 )
...
Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-07-15 17:18:38 +08:00
Zhanrui Sun
d811843a08
infra: [TRTLLM-6313] Fix the package sanity stage 'Host Node Name' in… ( #5945 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-07-15 15:39:31 +09:00
Yiqing Yan
6b35afaf1b
[Infra][TRTLLM-6013] - Fix stage name in single stage test rerun report ( #5672 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-07-15 12:27:21 +09:00
Zhanrui Sun
01b2def5ef
infra: [TRTLLM-6331] Support show all stage name list when stage name check failed ( #5946 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-07-15 12:06:03 +09:00
Alex Zhang
6c30d78b78
[TRTLLM-5653][infra] Run docs build only if PR contains only doc changes ( #5184 )
...
Signed-off-by: Alex Zhang <13271672+zhanga5@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Alex Zhang <13271672+zhanga5@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-07-14 21:40:33 +08:00
Zhanrui Sun
3a0ef73414
infra: [TRTLLM-6242] install cuda-toolkit to fix sanity check ( #5709 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-07-14 18:52:13 +09:00
Yi Zhang
e5e87ecf34
test: Move some of the test from post merge to pre-merge, update dgx b200 test case ( #5640 )
...
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-07-14 17:17:30 +08:00
Zhanrui Sun
67a39dbd63
infra: [TRTLLM-6054][TRTLLM-5804] Fix two known NSPECT high vulnerability issues and reduce image size ( #5434 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-07-10 23:24:46 +09:00
ixlmar
10e686466e
fix: use current_image_tags.properties in rename_docker_images.py ( #5846 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-07-09 17:07:52 +09:00
xavier-nvidia
b6013da198
Fix GEMM+AR fusion on blackwell ( #5563 )
...
Signed-off-by: xsimmons <xsimmons@nvidia.com>
2025-07-09 08:48:47 +08:00
Yiteng Niu
3079e8cf0c
[TRTLLM-5878] update nspect version ( #5832 )
...
Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>
2025-07-08 22:00:09 +08:00
Tailing Yuan
035155df7c
Fix: ignore nvshmem_src_*.txz from confidentiality-scan ( #5831 )
...
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-07-08 17:17:29 +09:00
Tailing Yuan
85b4a6808d
Refactor: move DeepEP from Docker images to wheel building ( #5534 )
...
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-07-07 22:57:03 +09:00
Yanchao Lu
092e0eb86a
[Infra] - Fix a syntax issue in the image check ( #5775 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-07-07 11:19:59 +09:00
Yiteng Niu
66f299a205
[TRTLLM-5878] add stage for image registration to nspect ( #5699 )
...
Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-07-06 23:52:54 +08:00
Yanchao Lu
2013034948
[Test] - Waive or fix few known test failures ( #5769 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-07-06 21:14:16 +08:00
Yanchao Lu
d95ae1378b
[Infra] - Always use x86 image for the Jenkins agent and few clean-ups ( #5753 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-07-06 10:25:57 +08:00
Yuan Tong
32b244af38
feat: reduce unnecessary kernel generation ( #5476 )
...
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-07-04 14:37:49 +08:00
Yi Zhang
73d30a23c7
test: add more tests for GB200 with 8 GPUs/2 nodes in L0 tests ( #5397 )
...
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-07-04 13:14:13 +08:00
Yiqing Yan
de0b522dfd
[Infra] - Fix test stage check for the package sanity check stage ( #5694 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-07-03 16:39:46 +08:00
ixlmar
04fa6c0cfc
[TRTLLM-6143] feat: Improve dev container tagging ( #5551 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-07-02 14:56:34 +02:00
Emma Qiao
31699cbeb1
[Infra] - Set default timeout to 1hr and remove some specific settings ( #5667 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-02 08:37:54 -04:00
Void
7992869798
perf: better heuristic for allreduce ( #5432 )
...
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
2025-07-01 22:56:06 -04:00
ixlmar
48eee338bf
fix: constrain grepping in docker/Makefile ( #5493 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-07-01 20:12:55 +08:00
Omer Ullman Argov
3b19634a5c
[fix][ci] missing class names in post-merge test reports ( #5603 )
...
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-06-30 22:13:29 +08:00
Emma Qiao
b8a568d3c6
[Infra][main] Cherry-pick from release/0.21: Update nccl to 2.27.5 ( #5539 ) ( #5587 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-06-30 18:12:08 +08:00
amirkl94
a985c0b7e6
tests: Move stress tests to be Post-Merge only ( #5166 )
...
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
2025-06-29 09:44:47 +03:00
Iman Tabrizian
49af791f66
Add testing for trtllm-llmapi-launch with tritonserver ( #5528 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-06-27 11:19:52 +08:00
Omer Ullman Argov
fa0ea92dfd
[fix][ci] trigger multigpu tests for deepseek changes ( #5423 )
...
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-06-26 14:30:00 +08:00
Emma Qiao
32d1573c43
[Infra] - Add timeout setting for long tests found in post-merge ( #5501 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-06-26 11:31:39 +08:00
QI JUN
478f668dcc
CI: update multi gpu test triggering file list ( #5466 )
...
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-25 15:51:02 +08:00
Emma Qiao
7f68de3e3f
Refactor test timeout for individual long case ( #4757 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-06-19 13:52:11 +08:00
yunruis
b3e886074e
Fix CI build time increase ( #5337 )
...
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
2025-06-19 13:49:42 +08:00