< Marvin> Crash detected Execution was interrupted by an une Prefect Community #ask-marvin

<@ULVA73B9P> Crash detected! Execution was interru...

Saad80

09/17/2025, 5:43 PM

@Marvin Crash detected! Execution was interrupted by an unexpected exception: ConnectionAbortedError: Initialization failure from server: Call to `ray.init()` on the server failed with: Can't find a `node_ip_address.json` file from /tmp/ray/session_2025-08-30_02-00-12_518770_6921. for 60 seconds. A ray instance hasn't started. Did you do `ray start` or `ray.init` on this host?

10:39:56 AM prefect.flow_runs Process for flow run 'vigilant-rhino' exited with status code: 1

Marvin

09/17/2025, 5:43 PM

✅ thought for 151.1 seconds

Marvin

09/17/2025, 5:46 PM

Thanks for the error details. That Ray message is the key: it indicates Ray never finished starting on the host where your flow ran, so Prefect’s Ray task runner couldn’t initialize.

What the error means
- Ray expected to create a session directory under `/tmp/ray/session_*` and write `node_ip_address.json` into it.
- After 60s, that file wasn’t present, usually because Ray could not start on that host (permissions, temp dir issues, missing Ray install, networking, or no running head node when connecting remotely).

A few targeted checks and fixes

1) Confirm how you intend to use Ray
- Local Ray (single machine, Prefect starts it): don’t pass an `address` to `RayTaskRunner`.
- Remote Ray cluster (existing head/worker): pass `address="ray://<host>:10001"` and ensure the Ray head is already running and reachable.

2) If you’re using Local Ray on the worker/flow-run host
- Ensure Ray is installed in the same environment as the flow run. For Prefect 3.x, install the integration packages in your image/env:

pip install "prefect-ray" "ray[default]"

- Make sure Ray has a writable temp directory (a common cause of the missing `node_ip_address.json`). Set a writable location and create it:

```bash
export RAY_TMPDIR=/tmp/ray
mkdir -p "$RAY_TMPDIR" && chmod 1777 "$RAY_TMPDIR"
```

- You can also set this in your deployment/worker env vars.
- Quick sanity test inside the same runtime your flow uses (same container/venv):

```bash
python -c "import os, ray; os.environ['RAY_TMPDIR']='/tmp/ray'; print(ray.init()); ray.shutdown()"
```

If this hangs or errors, Ray can’t start on that host: check filesystem permissions, available disk space, and container security settings.

3) If you’re connecting to a Remote Ray cluster
- Ensure a Ray head node is started and the Ray Client server port is open (default 10001). Example:

ray start --head --port=6379 --ray-client-server-port=10001

- From the flow’s environment, test connectivity:

```bash
python -c "import ray; print(ray.init(address='ray://<host>:10001')); ray.shutdown()"
```

- In Kubernetes with KubeRay, target the head service, e.g.:

```python
from prefect_ray.task_runners import RayTaskRunner
task_runner = RayTaskRunner(address="ray://ray-head.default.svc.cluster.local:10001")
```
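Before pointing `RayTaskRunner` at a remote cluster, it can help to confirm that the Ray Client port accepts TCP connections from the flow’s environment at all. The following is a minimal sketch using only the Python standard library. The helper names `parse_ray_address` and `can_connect` are hypothetical, not part of Ray or Prefect, and a successful TCP connect only shows that the port is open, not that a healthy Ray Client server is behind it:

```python
import socket
from typing import Tuple
from urllib.parse import urlparse


def parse_ray_address(address: str) -> Tuple[str, int]:
    """Split a Ray Client address such as 'ray://host:10001' into (host, port)."""
    parsed = urlparse(address)
    if parsed.scheme != "ray" or not parsed.hostname:
        raise ValueError(f"expected ray://<host>:<port>, got {address!r}")
    # Fall back to 10001, the default Ray Client server port mentioned above.
    return parsed.hostname, parsed.port or 10001


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

For example, if `can_connect(*parse_ray_address("ray://ray-head.default.svc.cluster.local:10001"))` returns `False` from inside the worker pod, the problem is more likely networking or a stopped head node than anything in the Prefect flow itself.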

4) Wire up RayTaskRunner correctly in Prefect 3.x
- Local Ray:

```python
from prefect import flow, task
from prefect_ray.task_runners import RayTaskRunner

@task
def do(x): return x * 2

@flow(task_runner=RayTaskRunner())
def my_flow():
    futs = [do.submit(i) for i in range(10)]
    return [f.result() for f in futs]
```

- Remote Ray:

```python
from prefect import flow
from prefect_ray.task_runners import RayTaskRunner

@flow(task_runner=RayTaskRunner(address="ray://<host>:10001"))
def my_flow():
    ...
```

- Resource hints (CPUs/GPUs) can be set via `prefect_ray.context.remote_options(...)` if needed.

5) Common environment gotchas that cause this error
- /tmp not writable or too small; fix with `RAY_TMPDIR`.
- Missing Ray/Prefect-Ray in the worker image/environment.
- Networking/ports blocked when using remote Ray (10001 for Ray Client, plus others).
- Container limits too restrictive (very low open files, cgroups issues). If you see other Ray startup errors, consider raising ulimits or container resources.

If you share a bit more about your setup, I can give more precise guidance:
- Are you running Ray locally or connecting to a remote cluster?
- How are you running the flow (Prefect worker type, Docker/Kubernetes, base image)?
- Your Prefect version and how `RayTaskRunner` is configured in the flow.

References
- Prefect 3.x Ray integration guide: Prefect Ray integration docs
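Some of the environment gotchas listed above (temp directory not writable or too small, low open-file limits) can be checked from Python before Ray is involved at all. This is a rough sketch using only the standard library. The function names and the 1 GiB free-space threshold are assumptions chosen for illustration, not values that Ray or Prefect document:

```python
import os
import shutil
import tempfile
from typing import List, Optional


def check_ray_tmpdir(path: str, min_free_bytes: int = 1 << 30) -> List[str]:
    """Return a list of problems found with a candidate Ray temp directory."""
    if not os.path.isdir(path):
        return [f"{path} does not exist or is not a directory"]
    problems = []
    try:
        # Creating and deleting a file tests writability more directly
        # than inspecting permission bits.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError as exc:
        problems.append(f"{path} is not writable: {exc}")
    free = shutil.disk_usage(path).free
    if free < min_free_bytes:  # threshold is an illustrative assumption
        problems.append(f"{path} has only {free} bytes free")
    return problems


def open_file_limit() -> Optional[int]:
    """Return the soft open-files limit, or None where it cannot be read."""
    try:
        import resource  # Unix-only module
    except ImportError:
        return None
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft
```

Running `check_ray_tmpdir(os.environ.get("RAY_TMPDIR", "/tmp/ray"))` inside the same container or venv as the flow run returns an empty list when the directory looks usable. A non-empty list, or a very small value from `open_file_limit()`, suggests fixing the environment before debugging Ray or Prefect further.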
