Slackbot
09/25/2025, 3:41 PM
Marvin
09/25/2025, 3:41 PM
Marvin
09/25/2025, 3:45 PM
(`get_destination_invoices`) completes and the second (`get_source_invoices`) hangs with no logs, and
- Things worked until you restarted Ray on the VM,
the most likely culprit is the Ray worker environment after the restart. In practice, Oracle client connections (OCI/Autonomous DB) usually depend on local env vars and files (wallet, `TNS_ADMIN`, `LD_LIBRARY_PATH`, Instant Client location, etc.). When Ray is restarted (often via systemd or a different login context), the Ray worker processes inherit the environment of whatever started Ray, so those env vars and file paths can be missing even if they exist for your interactive venv. That can leave tasks stalled inside the DB connect without ever emitting logs.
What to check first
- Verify Ray connectivity and resources:
  - Run `ray status` and check the Ray dashboard; ensure there’s free CPU and your task isn’t stuck pending resources.
  - Look at the logs on the VM under `/tmp/ray/session_latest/logs/` (Ray's default location, unless you started Ray with a different temp dir) for errors in the worker processes.
- Confirm the environment Ray workers see:
  - Workers inherit the environment of the process that started Ray, not your current shell’s. If you rely on `TNS_ADMIN`, `LD_LIBRARY_PATH`, or `PATH` pointing to Instant Client, wallet files, or CA paths, those must be present in the Ray worker environment.
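To make the log check above less manual, here is a minimal sketch that scans a Ray log directory for lines that look like errors. The function name `scan_ray_logs`, the marker list, and the default path are assumptions for illustration; adjust the path if your Ray session logs live elsewhere.

```python
from pathlib import Path

# Illustrative markers only: "ORA-" and "DPI-" are common Oracle client error prefixes.
MARKERS = ("Traceback", "ERROR", "ORA-", "DPI-")


def scan_ray_logs(log_dir="/tmp/ray/session_latest/logs", markers=MARKERS, limit=50):
    """Return up to `limit` (file name, line) pairs whose line contains a marker."""
    hits = []
    for path in sorted(Path(log_dir).glob("*")):
        if not path.is_file():
            continue  # skip subdirectories
        with path.open(errors="replace") as fh:
            for line in fh:
                if any(marker in line for marker in markers):
                    hits.append((path.name, line.rstrip()))
                    if len(hits) >= limit:
                        return hits
    return hits


if __name__ == "__main__":
    for name, line in scan_ray_logs():
        print(f"{name}: {line}")
```

An empty result does not prove the workers are healthy; a task blocked inside a DB connect may log nothing at all, which is why the environment check matters.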
A quick diagnostic task
Run this under your existing `RayTaskRunner` to confirm what the Ray workers see:
```python
from prefect import task


@task
def debug_env():
    import os

    # Check whether the Oracle driver is importable inside the Ray worker
    try:
        import oracledb
        oracle_ok = True
        oracle_ver = getattr(oracledb, "__version__", "unknown")
    except Exception as e:
        oracle_ok = False
        oracle_ver = str(e)

    # Report the env vars the Oracle client depends on, as seen by the worker
    return {
        "TNS_ADMIN": os.getenv("TNS_ADMIN"),
        "LD_LIBRARY_PATH": os.getenv("LD_LIBRARY_PATH"),
        "PATH": os.getenv("PATH"),
        "ORACLE_IMPORT": oracle_ok,
        "ORACLE_VER_OR_ERROR": oracle_ver,
    }
```
If `TNS_ADMIN` is empty or the `oracledb` import fails in this task, that’s your smoking gun.
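To compare the `debug_env` result against the shell where things work, a small diff helper can help. This is a sketch only: `env_deltas` and its parameters are hypothetical names, and it assumes you pass in the dict returned by the task.

```python
import os

KEYS = ("TNS_ADMIN", "LD_LIBRARY_PATH", "PATH")


def env_deltas(worker_env, local_env=None, keys=KEYS):
    """Return {key: (local_value, worker_value)} for keys whose values differ."""
    local_env = os.environ if local_env is None else local_env
    deltas = {}
    for key in keys:
        local_val = local_env.get(key)
        worker_val = worker_env.get(key)
        if local_val != worker_val:
            deltas[key] = (local_val, worker_val)
    return deltas
```

Call it in the interactive shell as `env_deltas(result)`, where `result` is the dict that `debug_env` returned; any key that appears in the output is a value the worker sees differently from your shell.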
How to fix it
Option A — pass env to Ray via Prefect’s RayTaskRunner
The `RayTaskRunner` lets you pass Ray `runtime_env` via `init_kwargs`. Use this to propagate the required env vars and, if needed, install deps.
```python
from prefect import flow
from prefect_ray.task_runners import RayTaskRunner

ray_runner = RayTaskRunner(
    # If connecting to a remote Ray head, include its address; otherwise a local Ray will be started
    # address="ray://<head-node-ip>:10001",
    init_kwargs={
        "runtime_env": {
            "env_vars": {
                "TNS_ADMIN": "/opt/oracle/wallet",
                "LD_LIBRARY_PATH": "/opt/oracle/instantclient_21_11",
                # Ray expands ${VAR} references, so this prepends to the worker's existing PATH
                "PATH": "/opt/oracle/instantclient_21_11:${PATH}",
                # any other env vars your Oracle client needs
            },
            # If workers may not have the packages, you can also do:
            # "pip": ["oracledb==<version>", "prefect==2.*", "prefect-ray"],
            # "working_dir": "/path/to/your/code"  # if needed
        }
    },
)


@flow(task_runner=ray_runner)
def your_flow():
    ...
```
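If the shell that launches the flow already has working values, one option is to copy them into `runtime_env` instead of hard-coding paths. The helper below is a hypothetical sketch (the name `runtime_env_from_shell` is not part of Prefect or Ray), and it assumes the same paths exist on the machine where the Ray workers run.

```python
import os


def runtime_env_from_shell(keys=("TNS_ADMIN", "LD_LIBRARY_PATH"), environ=None):
    """Build a Ray runtime_env dict from variables that are set (and non-empty) locally."""
    environ = os.environ if environ is None else environ
    env_vars = {key: environ[key] for key in keys if environ.get(key)}
    return {"env_vars": env_vars}
```

The result could then be passed as `init_kwargs={"runtime_env": runtime_env_from_shell()}`. `PATH` is left out of the defaults on purpose, since copying it wholesale would override the worker's own `PATH`.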
Option B — start Ray with the correct environment
If you manage Ray with a systemd service or a startup script, export the variables there so Ray’s processes inherit them after restart, e.g. in the systemd unit:
- `Environment=` lines for `TNS_ADMIN`, `LD_LIBRARY_PATH`, `PATH`
- Ensure the wallet directory and Instant Client are readable by the user running Ray
Restart Ray after setting these and verify with the `debug_env` task above.
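The readability check can be scripted. This is a minimal sketch, assuming you run it as the same user that runs Ray (for example via `sudo -u <ray-user>`); `check_readable` is a hypothetical helper, and the paths in the usage block are the example paths from Option A, not necessarily yours.

```python
import os


def check_readable(paths):
    """Map each path to 'ok', 'missing', or 'not readable' for the current user."""
    report = {}
    for path in paths:
        if not os.path.exists(path):
            report[path] = "missing"
            continue
        # Directories need execute permission as well as read to be traversed
        mode = (os.R_OK | os.X_OK) if os.path.isdir(path) else os.R_OK
        report[path] = "ok" if os.access(path, mode) else "not readable"
    return report


if __name__ == "__main__":
    for path, status in check_readable(
        ["/opt/oracle/wallet", "/opt/oracle/instantclient_21_11"]
    ).items():
        print(f"{path}: {status}")
```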
Extra safeguards
- Add timeouts to your DB code so a broken network/client doesn’t hang forever:
  - Pass `tcp_connect_timeout=30` (seconds) to `oracledb.connect(...)` and/or set `conn.call_timeout = 30000` (milliseconds) before executing queries.
- Try running the flow with the default task runner (no Ray) to confirm the DB code itself is healthy.
- Make sure you’re passing `address="ray://<host>:10001"` if you intend to use a remote Ray head node; after a restart, the address/port might differ if the startup command changed.
Marvin
09/25/2025, 3:45 PM
`ray start`, Docker)? Can you share that command or unit file?
- Are you using Oracle Instant Client thick mode or thin mode with a wallet? Where is the wallet located on disk?
- Do you set `TNS_ADMIN`, `LD_LIBRARY_PATH`, etc. in your shell profile only, or are they guaranteed for the Ray service user too?
- What versions are you on for Prefect 2, `prefect-ray`, `ray`, and `oracledb`?
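For the version question, a quick way to collect the numbers is the standard library's `importlib.metadata`. This is a sketch; `installed_versions` is a hypothetical helper name, and the package list simply mirrors the ones asked about above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("prefect", "prefect-ray", "ray", "oracledb")):
    """Return {distribution name: version string, or 'not installed'}."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}=={ver}")
```

Running it both in your interactive venv and inside a Ray task would show whether the workers use the same package set.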
Docs for reference
- Prefect 2 Ray task runner: RayTaskRunner (Prefect 2 docs)
- Ray runtime environments: Ray runtime_env docs
If you can share the Ray start method and a `debug_env` task result, I can help you pin down the exact env deltas and give a concrete patch.