# ask-marvin
@Marvin Crash detected! Execution was interrupted by an unexpected exception: ConnectionAbortedError: Initialization failure from server: Call to `ray.init()` on the server failed with: Can't find a `node_ip_address.json` file from /tmp/ray/session_2025-08-30_02-00-12_518770_6921. for 60 seconds. A ray instance hasn't started. Did you do `ray start` or `ray.init` on this host?

10:39:56 AM prefect.flow_runs Process for flow run 'vigilant-rhino' exited with status code: 1
thought for 151.1 seconds
Thanks for the error details; that Ray message is the key. It indicates Ray never finished starting on the host where your flow ran, so Prefect's Ray task runner couldn't initialize.

What the error means
- Ray expected to create a session under `/tmp/ray/session_*` and write `node_ip_address.json` there.
- After 60 seconds that file still wasn't present, usually because Ray could not start on that host (permissions, temp-dir issues, a missing Ray install, networking, or no running head node when connecting remotely).
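A quick way to see what Ray actually left behind is to list the session directories the error points at. A minimal diagnostic sketch, assuming the default `/tmp/ray` location (or whatever you set via `RAY_TMPDIR`):

```
# Sketch: inspect Ray's session directories for the missing node_ip_address.json.
import glob
import os

tmpdir = os.environ.get("RAY_TMPDIR", "/tmp/ray")
for session in sorted(glob.glob(os.path.join(tmpdir, "session_*"))):
    marker = os.path.join(session, "node_ip_address.json")
    status = "found" if os.path.exists(marker) else "missing node_ip_address.json"
    print(f"{session}: {status}")
```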
A few targeted checks and fixes

1) Confirm how you intend to use Ray
- Local Ray (single machine, Prefect starts it): don't pass an `address` to `RayTaskRunner`.
- Remote Ray cluster (existing head/workers): pass `address="ray://<host>:10001"` and ensure the Ray head is already running and reachable.

2) If you're using local Ray on the worker/flow-run host
- Ensure Ray is installed in the same environment as the flow run. For Prefect 3.x, install the packages in your image/env:
  ```
  pip install "prefect-ray" "ray[default]"
  ```
- Make sure Ray has a writable temp directory; this is the most common cause of the missing `node_ip_address.json`. Set a writable location and create it:
  ```
  export RAY_TMPDIR=/tmp/ray
  mkdir -p "$RAY_TMPDIR" && chmod 1777 "$RAY_TMPDIR"
  ```
  You can also set this in your deployment/worker env vars (see the sketch after this list).
- Run a quick sanity test inside the same runtime your flow uses (same container/venv):
  ```
  python -c "import os, ray; os.environ['RAY_TMPDIR']='/tmp/ray'; print(ray.init()); ray.shutdown()"
  ```
  If this hangs or errors, Ray can't start on that host: check filesystem permissions, available disk space, and container security settings.
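If your flow runs under a Prefect worker, one way to set `RAY_TMPDIR` for the flow-run environment is through the deployment's job variables. A minimal sketch, assuming Prefect 3.x; the source, entrypoint, and work pool name below are placeholders for your own values:

```
# Sketch: pass RAY_TMPDIR to flow runs via deployment job variables.
# The source, entrypoint, and work pool name are placeholder assumptions.
from prefect import flow

if __name__ == "__main__":
    flow.from_source(
        source="https://github.com/<org>/<repo>",   # or a local path
        entrypoint="flows/ray_flow.py:my_flow",
    ).deploy(
        name="ray-flow",
        work_pool_name="<your-work-pool>",
        # Most worker types accept an "env" job variable, which becomes
        # environment variables in the flow-run process/container.
        job_variables={"env": {"RAY_TMPDIR": "/tmp/ray"}},
    )
```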
3) If you're connecting to a remote Ray cluster
- Ensure a Ray head node is started and the Ray Client server port is open (default 10001). For example:
  ```
  ray start --head --port=6379 --ray-client-server-port=10001
  ```
- From the flow's environment, test connectivity:
  ```
  python -c "import ray; print(ray.init(address='ray://<host>:10001')); ray.shutdown()"
  ```
- In Kubernetes with KubeRay, target the head service, e.g.:
  ```
  task_runner = RayTaskRunner(address="ray://ray-head.default.svc.cluster.local:10001")
  ```
4) Wire up RayTaskRunner correctly in Prefect 3.x
- Local Ray:
  ```
  from prefect import flow, task
  from prefect_ray.task_runners import RayTaskRunner

  @task
  def do(x):
      return x * 2

  @flow(task_runner=RayTaskRunner())
  def my_flow():
      futs = [do.submit(i) for i in range(10)]
      return [f.result() for f in futs]
  ```
- Remote Ray:
  ```
  @flow(task_runner=RayTaskRunner(address="ray://<host>:10001"))
  def my_flow():
      ...
  ```
- Resource hints (CPUs/GPUs) can be set via `prefect_ray.context.remote_options(...)` if needed; see the sketch below.
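A minimal sketch of per-task resource hints with `remote_options` (the task and the CPU count here are illustrative):

```
# Sketch: request Ray resources per task via prefect_ray's remote_options.
from prefect import flow, task
from prefect_ray.context import remote_options
from prefect_ray.task_runners import RayTaskRunner

@task
def heavy(x):
    return x * 2

@flow(task_runner=RayTaskRunner())
def my_flow():
    # Tasks submitted inside this block ask Ray for 2 CPUs each.
    with remote_options(num_cpus=2):
        futures = [heavy.submit(i) for i in range(4)]
    return [f.result() for f in futures]
```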
5) Common environment gotchas that cause this error
- /tmp not writable or too small; fix with `RAY_TMPDIR`.
- Missing Ray / prefect-ray in the worker image/environment.
- Networking/ports blocked when using remote Ray (port 10001 for Ray Client, plus others).
- Container limits too restrictive (very low open-file limits, cgroup issues). If you see other Ray startup errors, consider raising ulimits or container resources; a quick check is sketched below.
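If you suspect the open-file limit, you can print what the flow's runtime actually sees (a diagnostic sketch only; the threshold you need depends on your workload):

```
# Sketch: print the file-descriptor limit Ray will see in this environment.
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"RLIMIT_NOFILE: soft={soft}, hard={hard}")
```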
If you share a bit more about your setup, I can give more precise guidance:
- Are you running Ray locally or connecting to a remote cluster?
- How are you running the flow (Prefect worker type, Docker/Kubernetes, base image)?
- Your Prefect version and how `RayTaskRunner` is configured in the flow.

References
- Prefect 3.x Ray integration guide: Prefect Ray integration docs