[CI] Make Model Executor test hangs fail fast with a traceback (#43971)

Signed-off-by: khluu <khluu000@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
2026-06-06 00:16:14 +00:00 · 2026-05-29 11:58:25 -07:00
parent 739096a028
commit 6aabe221a5
1 changed file with 9 additions and 2 deletions
@@ -14,5 +14,12 @@ steps:
  commands:
    - apt-get update && apt-get install -y curl libsodium23
    - export VLLM_WORKER_MULTIPROC_METHOD=spawn
-    - pytest -v -s model_executor -m '(not slow_test)'
-    - pytest -v -s entrypoints/openai/completion/test_tensorizer_entrypoint.py
+    # Enable faulthandler so a fatal signal (e.g. SIGSEGV or SIGABRT during
+    # GPU/CUDA init) dumps tracebacks of all threads instead of dying silently.
+    - export PYTHONFAULTHANDLER=1
+    # Per-test watchdog: a single hung test (e.g. stuck during engine/CUDA
+    # init) fails fast with a traceback instead of running until the global
+    # build timeout. The `thread` method also handles hangs inside C/CUDA
+    # calls that the signal method cannot interrupt.
+    - pytest -v -s model_executor -m '(not slow_test)' --timeout=900 --timeout-method=thread
+    - pytest -v -s entrypoints/openai/completion/test_tensorizer_entrypoint.py --timeout=900 --timeout-method=thread
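Illustrative note (not part of this commit): the idea behind `--timeout-method=thread` can be sketched with the standard library alone. The helper name `run_with_watchdog` is hypothetical and this is not pytest-timeout's actual implementation; it only shows why a timer on a separate thread can still report stacks while the main thread is blocked. As far as I understand, the real plugin also terminates the process after dumping, which this sketch deliberately leaves out.

```python
import faulthandler
import tempfile
import threading
import time


def run_with_watchdog(func, timeout_s):
    """Run func(); if it is still running after timeout_s, dump all thread stacks.

    Hypothetical helper for illustration only.
    Returns (result, dump_text); dump_text is "" if the watchdog never fired.
    """
    with tempfile.TemporaryFile(mode="w+") as dump_file:
        # The timer runs on its own thread, so it fires even while the main
        # thread is blocked (time.sleep here; a C/CUDA call in a real test).
        timer = threading.Timer(
            timeout_s,
            faulthandler.dump_traceback,
            kwargs={"file": dump_file, "all_threads": True},
        )
        timer.daemon = True
        timer.start()
        try:
            result = func()
        finally:
            timer.cancel()  # no-op if the timer has already fired
            timer.join()    # wait for any in-progress dump to finish
        # faulthandler writes straight to the file descriptor, so rewind
        # before reading the dump back.
        dump_file.seek(0)
        return result, dump_file.read()


def slow_init():
    # Stand-in for a test that stalls during engine or CUDA initialization.
    time.sleep(0.5)
    return "done"


if __name__ == "__main__":
    result, dump = run_with_watchdog(slow_init, timeout_s=0.1)
    print(result)
    print(dump)
```

A signal-based watchdog relies on a Python-level handler that only runs once the interpreter regains control, which is why the commit prefers the thread method for hangs inside native calls.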