Saad80
09/17/2025, 5:43 PM
ray.init() on the server failed with: Can't find a node_ip_address.json file from /tmp/ray/session_2025-08-30_02-00-12_518770_6921. for 60 seconds. A ray instance hasn't started. Did you do ray start or ray.init on this host?
10:39:56 AM prefect.flow_runs Process for flow run 'vigilant-rhino' exited with status code: 1
Marvin
09/17/2025, 5:43 PM
Marvin
09/17/2025, 5:46 PM
/tmp/ray/session_* and write node_ip_address.json.
- After 60s, that file wasn’t present — usually because Ray could not start on that host (permissions, temp dir issues, missing Ray install, networking, or no running head node when connecting remotely).
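To see whether any Ray session on the host ever got far enough to write that file, a small standard-library check like the one below may help. It is only a sketch: find_node_ip_files is a hypothetical helper name, and it assumes the default /tmp/ray root shown in the error message.

```python
from pathlib import Path


def find_node_ip_files(tmp_root="/tmp/ray"):
    """Return every node_ip_address.json found under session_* directories.

    tmp_root is assumed to be Ray's temp root; adjust it if RAY_TMPDIR
    points somewhere else.
    """
    root = Path(tmp_root)
    if not root.is_dir():
        # No temp root at all: Ray never created a session here.
        return []
    return sorted(root.glob("session_*/node_ip_address.json"))


# Report what exists on this host.
for path in find_node_ip_files():
    print(f"found: {path}")
```

An empty result is consistent with Ray never having started on that host, which is what the error message suggests.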
A few targeted checks and fixes
1) Confirm how you intend to use Ray
- Local Ray (single machine, Prefect starts it): don’t pass an address to RayTaskRunner.
- Remote Ray cluster (existing head/worker): pass address="ray://<host>:10001" and ensure the Ray head is already running and reachable.
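One way to keep the local/remote choice explicit is to build the RayTaskRunner arguments from a single optional address. This is a minimal sketch using only the standard library; ray_runner_kwargs is a hypothetical helper, not part of prefect-ray.

```python
from urllib.parse import urlparse


def ray_runner_kwargs(address=None):
    """Build keyword arguments for RayTaskRunner (hypothetical helper).

    No address means local Ray, so no arguments are passed at all.
    A remote address must look like ray://<host>:<port>.
    """
    if not address:
        return {}
    parsed = urlparse(address)
    if parsed.scheme != "ray" or not parsed.hostname or parsed.port is None:
        raise ValueError(f"expected ray://<host>:<port>, got {address!r}")
    return {"address": address}


print(ray_runner_kwargs())                    # local Ray
print(ray_runner_kwargs("ray://head:10001"))  # remote Ray
```

The result could then be passed as RayTaskRunner(**ray_runner_kwargs(addr)), where addr comes from wherever your deployment stores it.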
2) If you’re using Local Ray on the worker/flow-run host
- Ensure Ray is installed in the same environment as the flow run:
  - For Prefect 3.x, install the integration and Ray in your image/env: pip install "prefect-ray" "ray[default]"
- Make sure Ray has a writable temp directory (this is the most common cause of the missing node_ip_address.json):
  - Set a writable location and create it:
    export RAY_TMPDIR=/tmp/ray
    mkdir -p "$RAY_TMPDIR" && chmod 1777 "$RAY_TMPDIR"
  - You can also set this in your deployment/worker env vars.
- Quick sanity test inside the same runtime your flow uses (same container/venv):
python -c "import os, ray; os.environ['RAY_TMPDIR']='/tmp/ray'; print(ray.init()); ray.shutdown()"
If this hangs or errors, Ray can’t start on that host — check filesystem permissions, available disk space, and container security settings.
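The filesystem checks can also be run without Ray installed. The sketch below uses only the standard library; check_ray_tmpdir is a hypothetical helper, and the 1 GiB free-space threshold is an arbitrary assumption, not a documented Ray requirement.

```python
import os
import shutil
import tempfile


def check_ray_tmpdir(path, min_free_bytes=1 << 30):
    """Return a list of problems found with a candidate Ray temp directory."""
    if not os.path.isdir(path):
        return [f"{path} does not exist or is not a directory"]
    problems = []
    try:
        # Creating (and auto-deleting) a file proves the directory is writable.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError as exc:
        problems.append(f"cannot write to {path}: {exc}")
    free = shutil.disk_usage(path).free
    if free < min_free_bytes:
        problems.append(f"only {free} bytes free in {path}")
    return problems


# Check the location RAY_TMPDIR points at, falling back to /tmp/ray.
print(check_ray_tmpdir(os.environ.get("RAY_TMPDIR", "/tmp/ray")))
```

An empty list means the directory exists, is writable, and has at least the assumed amount of free space. It does not rule out container security settings blocking Ray in other ways.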
3) If you’re connecting to a Remote Ray cluster
- Ensure a Ray head node is started and the Ray Client server port is open (default 10001):
- Example: ray start --head --port=6379 --ray-client-server-port=10001
- From the flow’s environment, test connectivity:
python -c "import ray; print(ray.init(address='ray://<host>:10001')); ray.shutdown()"
- In Kubernetes with KubeRay, target the head service, e.g.:
task_runner = RayTaskRunner(address="ray://ray-head.default.svc.cluster.local:10001")
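Before involving Ray at all, it can be worth confirming that the Ray Client port is reachable from the flow's environment. A minimal sketch with the standard library follows; can_connect is a hypothetical helper, and the host in the usage line is the example service name from above.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


# Example usage (host name is the illustrative KubeRay service from above):
# print(can_connect("ray-head.default.svc.cluster.local", 10001))
```

A True result only shows that something accepts TCP connections on that port. It does not prove the Ray Client server is healthy, so the ray.init(address=...) test is still the real check.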
4) Wire up RayTaskRunner correctly in Prefect 3.x
- Local Ray:
from prefect import flow, task
from prefect_ray.task_runners import RayTaskRunner

@task
def do(x):
    return x * 2

@flow(task_runner=RayTaskRunner())
def my_flow():
    futs = [do.submit(i) for i in range(10)]
    return [f.result() for f in futs]
- Remote Ray:
@flow(task_runner=RayTaskRunner(address="ray://<host>:10001"))
def my_flow():
    ...
- Resource hints (CPUs/GPUs) can be set via prefect_ray.context.remote_options(...) if needed.
5) Common environment gotchas that cause this error
- /tmp not writable or too small; fix with RAY_TMPDIR.
- Missing Ray/Prefect-Ray in the worker image/environment.
- Networking/ports blocked when using remote Ray (10001 for Ray Client, plus others).
- Container limits too restrictive (very low open files, cgroups issues). If you see other Ray startup errors, consider raising ulimits or container resources.
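To check the open-files limit from inside the same container or venv, something like the sketch below may be enough. It relies on the Unix-only resource module; nofile_limit_is_low is a hypothetical helper, and the 4096 threshold is an arbitrary assumption rather than a value Ray documents for this error.

```python
import resource


def nofile_limit_is_low(threshold=4096):
    """Return True if the soft open-files limit is below threshold."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        # Unlimited can never be "too low".
        return False
    return soft < threshold


soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open files: soft={soft} hard={hard} low={nofile_limit_is_low()}")
```

If the soft limit looks low, raising it is usually done in the container or worker configuration rather than in the flow code.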
If you share a bit more about your setup, I can give more precise guidance:
- Are you running Ray locally or connecting to a remote cluster?
- How are you running the flow (Prefect worker type, Docker/Kubernetes, base image)?
- Your Prefect version and how RayTaskRunner is configured in the flow.
References
- Prefect 3.x Ray integration guide: Prefect Ray integration docs