Eric
07/07/2025, 4:55 PM
• project-data: it is the project that stores the python scripts to extract and load data. It has a docker image.
• project-orchestrator: it is the project containing the tasks and flows. It runs commands on docker images (like the docker image of project-data).
Does anyone have this kind of setup? Can this kind of setup work?

Nate
07/07/2025, 5:14 PM
@task
def manage_command(command: str, **kwargs): ...  # babysit some docker run command from a parent process
or instead you can make each ETL script its own prefect "deployment" so that you could write code like this
run_deployment("foo-script/deployment", parameters=dict(...)) # instead of `docker run -- whatever command`
where a docker worker would listen to the server for scheduled runs of this deployment (e.g. calling run_deployment) and then spin up a container to run that specific script
Nate
07/07/2025, 5:14 PM

Eric
07/07/2025, 5:54 PM
In my python scripts in project-data I am not using any Prefect decorator (@flow or @task). I wanted to separate the ELT-related scripts logic and the orchestration logic (in project-orchestration). Should I do things that way when I use Prefect?
I have a @task with a python function where I use subprocess; is that the way to go? As the docker image is in a private repo, I don't understand where I need to fill in the credentials of the private docker repo.
In the block Docker Registry Credentials, I see this code:
from prefect_docker import DockerHost, DockerRegistryCredentials

docker_host = DockerHost()
docker_registry_credentials = DockerRegistryCredentials(
    username="my_username",
    password="my_password",
    registry_url="registry.hub.docker.com",
)
with docker_host.get_client() as client:
    docker_registry_credentials.login(client)
Where am I supposed to use that?
How do I deal with a private docker repository?
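(For reference, a minimal sketch of one way to use that snippet: persist the credentials as a block once, so flows can load them later; the block name "docker-registry-creds" is an arbitrary example.)
from prefect_docker import DockerRegistryCredentials

# run once (or create the block in the Prefect UI); the block name is arbitrary
DockerRegistryCredentials(
    username="my_username",
    password="my_password",
    registry_url="registry.hub.docker.com",
).save("docker-registry-creds", overwrite=True)

# later, e.g. inside a flow or task:
# creds = DockerRegistryCredentials.load("docker-registry-creds")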
I also don't get how you start a worker for a docker work pool. I am currently using Coolify and I added this service in the docker compose of Coolify:
agent:
  image: "prefecthq/prefect:3-python3.13"
  depends_on:
    prefect:
      condition: service_healthy
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock
  entrypoint:
    - /opt/prefect/entrypoint.sh
    - prefect
    - worker
    - start
    - "--pool=${DEFAULT_POOL_NAME}"
    - "--type=docker"
    - "--with-healthcheck"
    - "--name=${DEFAULT_WORKER_NAME}"
    - "--limit=${DEFAULT_POOL_LIMIT}"
  environment:
    - "PREFECT_API_URL=http://prefect:4200/api"
    - "PREFECT_API_KEY=${SERVICE_PASSWORD_APIKEY}"
    - "DEFAULT_POOL_NAME=${DEFAULT_POOL_NAME:-default}"
    - "DEFAULT_POOL_LIMIT=${DEFAULT_POOL_LIMIT:-1}"
    - "DEFAULT_WORKER_NAME=${DEFAULT_WORKER_NAME:-worker1}"
  healthcheck:
    test:
      - CMD-SHELL
      - pwd
    interval: 5s
    timeout: 5s
    retries: 3
Then this service was not running. Do we need to have our laptop turned on 24/7 to run flows?

Eric
07/07/2025, 6:13 PM
import os
import subprocess
from typing import Optional

from prefect import task
from prefect.logging import get_run_logger


@task(
    retries=2,
    retry_delay_seconds=30,
)
def extract_data(
    kwargs: Optional[str] = None,
) -> dict:
    logger = get_run_logger()
    # Create secure env file (no secrets in command line)
    env_file = get_docker_env_file()
    try:
        cmd = [
            "docker",
            "run",
            "--rm",
            "--pull=always",
            "--env-file",
            env_file,
            get_docker_image(),
            "python",
            "-m",
            "scripts.fetch_data",
        ]
        # Add optional kwargs
        if kwargs:
            cmd.extend(["--kwargs", kwargs])
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        return {
            "status": "success",
            "stdout": result.stdout,
            "stderr": result.stderr,
        }
    except subprocess.CalledProcessError as e:
        logger.error(f"Failed to extract data: {e}")
        raise
    finally:
        # Clean up temporary env file
        try:
            os.unlink(env_file)
        except OSError:
            pass
Is this the way you were seeing things when talking about babysit?
To use the project-data docker image in this script, how do I log in to the private repo here? Do I need to log in on my VM?

Nate
07/07/2025, 6:27 PM
run_deployment, i.e. define a deployment called something like run_script that's associated with a docker work pool (you can put your registry creds on this work pool) so that your code that happens to be in project-orchestrator can trigger a run of your script on that isolated container specified by your deployment's docker work pool, and then the orchestrator flow can babysit all the script containers and maybe do some logic to report on the results or do downstream things.
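(For reference, a rough sketch of what registering such a run_script deployment against a docker work pool could look like in Prefect 3; the flow, pool, and image names are placeholders, and it assumes the ETL code is already baked into the private image.)
from prefect import flow

@flow
def run_script(command: str):
    ...  # wraps one ETL entrypoint from project-data

if __name__ == "__main__":
    run_script.deploy(
        name="run_script",
        work_pool_name="my-docker-pool",  # docker work pool the worker listens to
        image="registry.example.com/project-data:latest",  # private image; registry creds live on the work pool
        build=False,  # assume the image is built and pushed elsewhere
        push=False,
    )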
however if you don't need the full customization a work pool offers, the easiest approach sounds like
# orchestrator project
from prefect import flow, task, unmapped

@task
def babysit_container(command: str, image: str):
    ...  # use subprocess or docker sdk to send command to container like you're doing now

@flow
def babysit_containers(commands: list[str]):
    babysit_container.map(commands, unmapped('foo/bar:baz')).result()  # spins up all containers concurrently
so this is fine, but i'd say you're leaving the convenience of the docker work pool on the table with this approach, so if you wanted the run_deployment approach you'd just have to change the babysit_container task to call run_deployment instead of subprocess.run directly
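(A hedged sketch of that swap, assuming a deployment named "run-script/run_script" exists and its flow accepts a command parameter; both names are placeholders.)
from prefect import flow, task
from prefect.deployments import run_deployment

@task
def babysit_container(command: str):
    # instead of shelling out to `docker run`, ask the server for a run of the
    # deployment; the docker worker on its work pool creates the container
    return run_deployment(
        "run-script/run_script",
        parameters={"command": command},
    )

@flow
def babysit_containers(commands: list[str]):
    babysit_container.map(commands).result()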
---
> I also don't get how you start a worker for a docker work pool
prefect worker start --type docker --pool my-new-docker-pool
> Do we need to have our laptop turned on 24/7 to run flows?
if you want to schedule deployments, you need a worker listening for scheduled runs that will dispatch your deployment container according to your configured work pool. if you want the approach where you call subprocess.run to use the docker API/cli yourself, then you don't need a worker process alive 24/7
---
> In my python scripts in project-data I am not using any Prefect decorator (@flow or @task). I wanted to separate the ELT-related scripts logic and the orchestration logic (in project-orchestration). Should I do things that way when I use Prefect?
typically people use prefect to give their ETL scripts features like retries, caching, and concurrent execution, which require the decorators, but there's no requirement that you do this
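(A minimal sketch of what those features look like once the decorators are added, just for reference; the function bodies, names, and durations are placeholders.)
from datetime import timedelta

from prefect import flow, task
from prefect.tasks import task_input_hash

@task(retries=3, retry_delay_seconds=10,
      cache_key_fn=task_input_hash, cache_expiration=timedelta(hours=1))
def extract(source: str) -> list[dict]:
    ...  # existing extract logic, unchanged

@task(retries=3, retry_delay_seconds=10)
def load(records: list[dict]) -> None:
    ...  # existing load logic, unchanged

@flow
def etl(sources: list[str]):
    # map fans the extracts out concurrently; loads consume their futures
    for future in extract.map(sources):
        load(future)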
---
so that's a lot of info, but based on what you've said I'd recommend the following
• make a docker registry credentials block like you mentioned above, use that to get an authed docker client in your orchestrator flow
• use that client in your babysit_container task to trigger your script on a container as it sounds like you have been (client.containers.run instead of using subprocess.run to talk to the CLI; see the sketch after this list)
• for later: consider making deployments for your ETL scripts individually so that you can use run_deployment and very simply configure your docker infra for the ETL scripts, independently of the parent orchestrator script
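(A rough sketch of the first two bullets, assuming a DockerRegistryCredentials block saved as "docker-registry-creds" and a placeholder private image name.)
from prefect import flow, task, unmapped, get_run_logger
from prefect_docker import DockerHost, DockerRegistryCredentials

@task(retries=2, retry_delay_seconds=30)
def babysit_container(command: str, image: str) -> str:
    logger = get_run_logger()
    creds = DockerRegistryCredentials.load("docker-registry-creds")
    with DockerHost().get_client() as client:
        creds.login(client)  # authenticate against the private registry
        logs = client.containers.run(  # replaces the subprocess `docker run`
            image,
            command,
            remove=True,  # same effect as --rm
        )
    logger.info(logs.decode())
    return logs.decode()

@flow
def orchestrate(commands: list[str]):
    babysit_container.map(
        commands, unmapped("registry.example.com/project-data:latest")
    ).result()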
Eric
07/07/2025, 6:35 PM

Eric
07/07/2025, 6:49 PM
For the docker worker listening to the docker work pool, I didn't get what is actually running. Is it this service in the docker compose, or something else?
agent:
  image: "prefecthq/prefect:3-python3.13"
  depends_on:
    prefect:
      condition: service_healthy
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock
  entrypoint:
    - /opt/prefect/entrypoint.sh
    - prefect
    - worker
    - start
    - "--pool=${DEFAULT_POOL_NAME}"
    - "--type=docker"
    - "--with-healthcheck"
    - "--name=${DEFAULT_WORKER_NAME}"
    - "--limit=${DEFAULT_POOL_LIMIT}"
  environment:
    - "PREFECT_API_URL=http://prefect:4200/api"
    - "PREFECT_API_KEY=${SERVICE_PASSWORD_APIKEY}"
    - "DEFAULT_POOL_NAME=${DEFAULT_POOL_NAME:-default}"
    - "DEFAULT_POOL_LIMIT=${DEFAULT_POOL_LIMIT:-1}"
    - "DEFAULT_WORKER_NAME=${DEFAULT_WORKER_NAME:-worker1}"
  healthcheck:
    test:
      - CMD-SHELL
      - pwd
    interval: 5s
    timeout: 5s
    retries: 3
I don't get what a worker is physically. What differentiates a `docker` worker from a `process` worker?
In Coolify, we just have a process work pool by default.

Nate
07/07/2025, 6:52 PM
it comes down to the work pool type: docker workers submit flow runs as docker containers, process workers submit flow runs as subprocesses
tldr: workers poll for work and create the runtime infra for the flow run according to the work pool it listens to

Eric
07/11/2025, 4:36 PM
• `Volumes`: to have access to docker by providing the socket mapping
• `Networks`: add at least the network where Prefect UI/API is (or worker, I am not sure, but they are in the same network in my docker compose)
Also I needed to have a custom docker compose in order to run a docker worker, compared to the docker compose for the process worker.