Saad80
09/05/2025, 4:57 AM
Marvin
09/05/2025, 5:01 AM
- Point your environment at Docker's socket:
export DOCKER_HOST=unix:///var/run/docker.sock
- Then start your worker/flow in the same shell:
prefect worker start -p <your-pool>
- Or when running a script:
DOCKER_HOST=unix:///var/run/docker.sock python your_flow.py
- Unset/override any env vars that point to Podman (e.g., DOCKER_HOST=unix:///run/user/<uid>/podman/podman.sock).
- If your machine has podman-docker installed or a Podman socket taking over /var/run/docker.sock, either remove that shim or explicitly set DOCKER_HOST as above so Ray uses Docker.
If you’re using Ray containers via runtime_env in Prefect:
```
from prefect import flow, task
from prefect_ray.task_runners import RayTaskRunner
from prefect_ray.context import remote_options

@task
def say_hi():
    return "hi"

@flow(task_runner=RayTaskRunner())
def my_flow():
    with remote_options(runtime_env={"container": {"image": "python:3.11"}}):
        return say_hi()

if __name__ == "__main__":
    # Ensure this process sees DOCKER_HOST pointing at Docker (not Podman)
    # export DOCKER_HOST=unix:///var/run/docker.sock
    my_flow()
```
Ray will use whichever “Docker” socket your environment points to; the code itself does not choose Podman vs Docker.
Troubleshooting quick checks:
- See where you’re pointed:
echo $DOCKER_HOST
- If still hitting Podman, stop the user-level Podman socket or remove the podman-docker shim so /var/run/docker.sock is truly Docker, or always set DOCKER_HOST explicitly when starting Prefect/Ray.
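The checks above can be sketched in Python. This is a hedged sketch: `effective_docker_host` is a hypothetical helper (not part of Prefect, Ray, or the Docker SDK) that mirrors the Docker client's default-socket rule.

```python
import os

# Docker clients fall back to the local Unix socket when DOCKER_HOST is unset.
DEFAULT_SOCKET = "unix:///var/run/docker.sock"

def effective_docker_host(env=None):
    """Return (socket URL, looks_like_podman) for the given environment mapping."""
    env = os.environ if env is None else env
    host = env.get("DOCKER_HOST", DEFAULT_SOCKET)
    # A "podman" substring in the socket path is the telltale sign of the shim.
    return host, "podman" in host
```

Calling it with `{"DOCKER_HOST": "unix:///run/user/1000/podman/podman.sock"}` flags the Podman socket, matching the `echo $DOCKER_HOST` check above.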
Docs:
- Prefect + Ray integration: Prefect Ray integration
If you can share how you’re launching Ray (local vs cluster), your OS, and where you set runtime_env, I can tailor the exact steps.

Saad80
09/05/2025, 5:24 AM
Marvin
09/05/2025, 5:24 AM
Saad80
09/05/2025, 5:27 AM
Marvin
09/05/2025, 5:28 AM
- Point your environment at Docker's socket:
export DOCKER_HOST=unix:///var/run/docker.sock
- Start Ray (or your Ray service) in an environment where DOCKER_HOST points to Docker’s socket. For example:
DOCKER_HOST=unix:///var/run/docker.sock ray start --head --port=10001 ...
and the same for any worker nodes.
- If you have a podman-docker shim or a Podman socket bound at /var/run/docker.sock, either remove/disable that shim or always export DOCKER_HOST to Docker’s actual socket path so Ray doesn’t resolve to Podman.
3) Authenticate Docker to ECR (so Ray nodes can pull your image)
- On each Ray node:
aws ecr get-login-password --region <your-region> \
| docker login --username AWS --password-stdin <your-account-id>.dkr.ecr.<your-region>.amazonaws.com
- Optional: pre-pull the image with docker pull <your-ecr-uri>
to validate access.
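As a sketch, the login pipeline above can be assembled programmatically. `ecr_registry` and `ecr_login_command` are hypothetical helpers; the hostname follows AWS's standard `<account>.dkr.ecr.<region>.amazonaws.com` pattern.

```python
def ecr_registry(account_id: str, region: str) -> str:
    """Build the ECR registry hostname for an account/region pair."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"

def ecr_login_command(account_id: str, region: str) -> str:
    """Render the docker-login pipeline shown above for one Ray node."""
    return (
        f"aws ecr get-login-password --region {region} "
        f"| docker login --username AWS --password-stdin {ecr_registry(account_id, region)}"
    )
```

Running the rendered command on each node (or baking it into provisioning) keeps the registry string consistent across the cluster.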
4) Match the runtime_env syntax to your Ray version
- Ray 2.9+ (image_uri replaced the experimental container field):
with remote_options(
    num_cpus=1,
    runtime_env={"image_uri": "<your ECR URI>"}
):
    test.map(args)
- Older Ray releases (experimental container field):
with remote_options(
    num_cpus=1,
    runtime_env={
        "container": {
            "image": "<your ECR URI>",
            "pull_before_run": True
        }
    }
):
    test.map(args)
Your flow code is already set up this way; just be sure the version on your EC2 Ray nodes matches the runtime_env shape you use.
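One way to keep driver code and nodes in sync is to pick the runtime_env shape from the installed Ray version. This is a hedged sketch: `container_runtime_env` is a hypothetical helper, and it assumes (per Ray's runtime_env docs) that `image_uri` replaced the experimental `container` field in Ray 2.9.

```python
def container_runtime_env(image: str, ray_version: str) -> dict:
    """Pick the per-task container runtime_env shape for a Ray version string."""
    major, minor = (int(part) for part in ray_version.split(".")[:2])
    if (major, minor) >= (2, 9):
        return {"image_uri": image}  # assumed Ray 2.9+ form (see lead-in)
    # Older releases used the experimental "container" field instead.
    return {"container": {"image": image, "pull_before_run": True}}
```

In your flow you could then pass e.g. `remote_options(runtime_env=container_runtime_env("<your ECR URI>", ray.__version__))`.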
5) Sanity checks on a Ray node
- Confirm what your environment is pointing to:
echo $DOCKER_HOST
which docker
which podman
docker info
- If podman is installed or a podman-docker shim is present, consider removing it or explicitly setting DOCKER_HOST before starting Ray to avoid accidental Podman resolution.
Why the logs show up in ECS
- The driver (in ECS) connects to the Ray head and streams logs, so you’ll see worker-side errors (“podman not found”) in the ECS task logs even though the fix is on the Ray nodes.
Reference
- Prefect + Ray integration overview: Prefect Ray integration
If you can share:
- Ray version on your EC2 nodes
- OS/distro on the Ray nodes
- Output of echo $DOCKER_HOST, which docker, and which podman from a Ray node
- A small log snippet around the “podman not found” line
…I can tailor exact commands (systemd drop-ins, Ray start lines, etc.) for your setup.

Saad80
09/05/2025, 5:35 AM
Marvin
09/05/2025, 5:35 AM
Saad80
09/05/2025, 5:39 AM
Marvin
09/05/2025, 5:41 AM
When Docker isn’t usable, Ray falls back to podman, which yields “podman: command not found”.
What to adjust in your AMI/provisioning
1) Ensure Docker is usable by the Ray process user at Ray start time
- You already install and start Docker and add ec2-user to the docker group — good.
- Critical: make sure the user that runs Ray can use Docker in that same session. Group membership changes don’t apply to already-running processes. If you start Ray before the group change is effective, Docker commands will fail with permission denied and Ray may try Podman next.
Suggestions:
- Start Ray only after verifying Docker access as `ec2-user`:
sudo -u ec2-user -g docker docker info
sudo -u ec2-user -g docker docker pull public.ecr.aws/hello-world:latest || true
- If you run Ray via systemd, include:
- After=docker.service and Requires=docker.service
- User=ec2-user
- SupplementaryGroups=docker
- Environment=PATH=/usr/local/bin:/usr/bin:/bin
- Optionally Environment=DOCKER_HOST=unix:///var/run/docker.sock
Example Ray head unit:
sudo tee /etc/systemd/system/ray-head.service >/dev/null <<'EOF'
[Unit]
Description=Ray Head
After=docker.service
Requires=docker.service
[Service]
User=ec2-user
Group=ec2-user
SupplementaryGroups=docker
Environment=PATH=/usr/local/bin:/usr/bin:/bin
Environment=DOCKER_HOST=unix:///var/run/docker.sock
ExecStart=/home/ec2-user/.pyenv/versions/3.11.10/bin/ray start --head --port=10001 --dashboard-host=0.0.0.0
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now ray-head.service
Do the same for worker nodes.
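For reference, a worker unit might look like the sketch below. It assumes the same pyenv path as the head unit above; <HEAD IP> is a placeholder for your head node's address, and --block keeps ray start in the foreground so Restart=always doesn't restart-loop a process that has already daemonized.

```
sudo tee /etc/systemd/system/ray-worker.service >/dev/null <<'EOF'
[Unit]
Description=Ray Worker
After=docker.service
Requires=docker.service
[Service]
User=ec2-user
Group=ec2-user
SupplementaryGroups=docker
Environment=PATH=/usr/local/bin:/usr/bin:/bin
Environment=DOCKER_HOST=unix:///var/run/docker.sock
ExecStart=/home/ec2-user/.pyenv/versions/3.11.10/bin/ray start --address=<HEAD IP>:10001 --block
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now ray-worker.service
```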
2) Point Ray to Docker’s socket explicitly (avoid Podman shims)
- Set DOCKER_HOST for Ray’s process:
Environment=DOCKER_HOST=unix:///var/run/docker.sock
- If a podman-docker shim binds /var/run/docker.sock to Podman on your AMI, remove it or ensure DOCKER_HOST points to the real Docker socket.
- Confirm on a Ray node:
echo $DOCKER_HOST
which docker
docker info
which podman || true
3) ECR auth for the user running Ray
- You configured the ECR credential helper under /home/ec2-user/.docker/config.json — great if Ray runs as ec2-user.
- If Ray runs as root (not recommended here), root won’t see ec2-user’s Docker config. Either:
- Run Ray as ec2-user (preferred), or
- Also configure /root/.docker/config.json with:
{
"credsStore": "ecr-login"
}
- You can also validate from the Ray user:
sudo -u ec2-user -g docker docker pull <your-ecr-uri>
4) Match your runtime_env shape to your Ray version
- Ray 2.9+ (image_uri replaced the experimental container field):
with remote_options(
    num_cpus=1,
    runtime_env={"image_uri": "<your ECR URI>"}
):
    test.map(args)
- Older Ray releases (experimental container field):
with remote_options(
    num_cpus=1,
    runtime_env={
        "container": {
            "image": "<your ECR URI>",
            "pull_before_run": True
        }
    }
):
    test.map(args)
5) Validate on a Ray node before running via Prefect
- From one Ray node:
```
python - <<'PY'
import ray

ray.init(address="ray://<HEAD IP>:10001")

@ray.remote(num_cpus=0.1, runtime_env={"container": {"image": "<your ECR URI>", "pull_before_run": True}})
def f():
    return "ok from container"

print(ray.get(f.remote()))
PY
```

Marvin
09/05/2025, 5:41 AM
If you can share:
- The output of sudo -u ec2-user -g docker docker info on a node
…I can suggest exact AMI changes (including a drop-in for your usermod timing) to make this robust.
Reference:
- Prefect + Ray integration: Prefect Ray integration