Aric Huang 05/03/2023, 8:15 PM
. Ray is initialized from within a
is used to run some logic, and then another
runs, which may take a few minutes. About a minute after the task using Ray completes, the entire flow crashes with the following error:
gcs_rpc_client.h:533: Failed to connect to GCS within 60 seconds. GCS may have been killed. It's either GCS is terminated by `ray stop` or is killed unexpectedly. If it is killed unexpectedly, see the log file gcs_server.out. <https://docs.ray.io/en/master/ray-observability/ray-logging.html#logging-directory-structure>. The program will terminate.
I have a minimal example here: https://gist.github.com/concreted/6d9f4a1165fc79c510f9a63ac28363b4 Saving that to a file
should reproduce the issue. I see the issue on Ubuntu 18.04, but not on Mac.
which should be the latest version
and it doesn't crash anymore, but it doesn't seem like it should be required to use
to use Ray within a task.
process starts, which is expected:
$ ps -aux | grep gcs_server
aric 7796 2.8 0.0 806636 27740 pts/6 Sl+ 19:55 0:00 /opt/pyenv/versions/3.8.12/lib/python3.8/site-packages/ray/core/src/ray/gcs/gcs_server --redis_address=10.128.0.80 --redis_port=6379 ... -gcs_server_port=0 --metrics-agent-port=65472 --node-ip-address=10.128.0.80 ...
process is listed as
$ ps -aux | grep gcs_server
aric 7796 2.5 0.0 0 0 pts/6 Z+ 19:55 0:00 [gcs_server] <defunct>
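The `Z+` state and `<defunct>` marker in the second listing mean the gcs_server process has exited but has not yet been reaped by its parent. Below is a minimal sketch for checking this from Python instead of by eye, assuming the usual `ps aux` column layout (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND). The helper names `parse_ps_line`, `is_zombie`, and `find_zombie_gcs_servers` are hypothetical and not part of Ray or Prefect.

```python
import subprocess


def parse_ps_line(line):
    """Split one `ps aux` row into (pid, stat, command).

    Assumes the standard 11-column layout:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    """
    fields = line.split(None, 10)  # keep the full command as one field
    if len(fields) < 11:
        raise ValueError("unexpected ps output: %r" % line)
    return int(fields[1]), fields[7], fields[10]


def is_zombie(line):
    """Return True if the STAT column marks the process as a zombie (Z)."""
    _pid, stat, _command = parse_ps_line(line)
    return stat.startswith("Z")


def find_zombie_gcs_servers():
    """Return the PIDs of gcs_server processes currently in zombie state."""
    out = subprocess.run(
        ["ps", "aux"], capture_output=True, text=True, check=True
    ).stdout
    return [
        parse_ps_line(row)[0]
        for row in out.splitlines()
        if "gcs_server" in row and is_zombie(row)
    ]
```

Calling `find_zombie_gcs_servers()` while the flow is still running, after the task that uses Ray has returned, should return the gcs_server PID if the process has been terminated early as described here.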
on the same Ubuntu 18.04 hosts, so something seems to be different in V2.
process stays active after the function using Ray returns
is started automatically when Ray is initialized, and should be automatically shut down when the Python process exits (https://docs.ray.io/en/latest/ray-core/api/doc/ray.shutdown.html):
"This will automatically run at the end when a Python process that uses Ray exits"
However, when running with Prefect it seems to be getting terminated prematurely.
Jacob Danovitch 05/12/2023, 1:50 PM
is because it calls
before running your tasks, which you don't. If you add it to your script like this:
if __name__ == "__main__":
    ray.init()
    ray_test()
Or like this:
@flow(log_prints=True)
def ray_test():
    ray.init()
    test()
    sleep()
It doesn't crash. Tested on Ubuntu 20.04.5 LTS with
; the crash is reproducible without
and disappears with either of the two fixes above.