# ask-community
Andreas Tsangarides:
Hi all, how can one use a single Docker image to add multiple flows as scripts? I want the flows to be dynamic, i.e. able to load env variables at task/flow execution time, not at registration time. I followed the Prefect tutorial for registering multiple flows with Docker storage, but that registers a pickled flow... If I define the storage at the flow level, I can use the `stored_as_script` argument in `Docker()` along with `path`, but I can't see how to avoid building and pushing an image for each flow that way...
Anna:
Hi @Andreas Tsangarides. If I understood correctly, you use Docker storage and you want to avoid pushing the image to the registry. If you don't specify the `registry_url`, the image will be built but not pushed; it will only live on the local machine where you ran the registration step. You can also explicitly specify `push=False`:
```python
built_storage = flow.storage.build(push=False)

# this gives you a dictionary of flows and their paths within the image
built_storage.flows
# e.g. {"your-flow": "/root/.prefect/your-flow.prefect"}
```
Andreas Tsangarides:
Hi Anna! Sorry, I didn't describe it correctly. I want to have all my flows in one repo, as they share common helper functions. So I want to create a single Docker image for the whole thing that contains all the flows.
Anna:
Gotcha. I'll have a look at the best way to implement it. Which platform do you use to store your code (GitHub, GitLab, ...)? Perhaps it would be easier to use one of the Git storage classes to point to your flows and configure the Python dependencies within a single Docker image (depends on which agent you use).
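(For illustration, a minimal sketch of that Git-storage pattern, assuming GitHub storage; the repo, path, and image names are hypothetical placeholders:)
```python
from prefect import Flow
from prefect.storage import GitHub
from prefect.run_configs import DockerRun

# Only a reference to the flow file is registered; the code itself is
# pulled from the repo at run time.
storage = GitHub(
    repo="your-org/your-flows-repo",   # hypothetical repo
    path="src/flows/my_flow/flow.py",  # path to this file within the repo
)

# One shared, pre-built image that carries just the Python dependencies.
run_config = DockerRun(image="your-registry/flows-deps:latest")  # hypothetical image

with Flow("github-stored-flow", storage=storage, run_config=run_config) as flow:
    pass  # tasks go here
```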
Btw, did you hear about Orion? In the future, what you describe will be even easier to accomplish in Prefect.
@Andreas Tsangarides I briefly looked into the Docker storage configuration, and if you want to avoid creating separate images for each flow, you can specify your `image_name` and `image_tag` explicitly. Since the image layers are cached, the build process is fast. To test, you can create several flows with a configuration similar to this:
```python
from prefect import Flow, task
from prefect.storage import Docker
from prefect.run_configs import DockerRun

@task
def hello_world():
    return "hello from Docker Flow #1"


with Flow(
    "docker-flow-1",
    storage=Docker(image_name="andreas-image", image_tag="latest"),
    run_config=DockerRun()
) as flow:
    hello_world()


if __name__ == '__main__':
    flow.register(project_name="Docker_Flows")
```
Let me know if this works for you.
Andreas Tsangarides:
Thanks for looking into this 🙂 So my project would look like the screenshot below. Each flow would be its own module, defined in its respective `flow.py` file. So, if I were to follow your suggestion:
1. I either build that image using `docker build ...`, or I just register the 1st flow.
2. When I register the second, it will use the cached image from (1) and add the second flow to the image?
Instead of the storage you defined for each flow, can I use something like this in each one of them?
```python
storage = Docker(
    # registry_url='455197153980.dkr.ecr.eu-west-2.amazonaws.com/',
    image_name="uk-prefect-flows-dev",
    image_tag="latest",
    dockerfile='Dockerfile',
    path="src/flows/elexon_detsysprices/flow.py",
    stored_as_script=True
)
```
Kevin:
Hey @Andreas Tsangarides, I just chatted with Anna: don't use the same name and tag for the image, because it will override the previous images. This does seem undoable right now. I might open a ticket for this.
But I will throw out that a common setup is to use something like S3/Git storage plus DockerRun with a pre-built image containing the dependencies. You can then register the flows independently.
Andreas Tsangarides:
Oh my oh my... Yeah, Anna and Kevin are right of course... OK, just in case anyone reads through this mess of a thread I created:

Goal:
• Single repo for multiple flows
• Single Docker image for all flows (can build it manually with `docker build ...`, or define a Prefect `Docker` storage and call `.build()` on it)
• Each flow is registered using `S3` storage attached to the flow
• The flows are registered as scripts, so everything is executed at run time, not at registration time!!!! That was critical for me; otherwise the Prefect tutorial for registering multiple flows using Docker storage works.

So, in each flow:
```python
# src/flows/elexon_detsysprices/flow.py
import os

from prefect import Flow
from prefect.storage import S3

ENV = os.getenv("ENV", "local")

storage = S3(
    bucket="<bucket-name>",
    key=f"{ENV}/elexon-detsysprices",
    stored_as_script=True,
    local_script_path=os.path.abspath(__file__)
)

with Flow("flow-name", storage=storage) as flow:
    # .......
    pass

# do attach your run_config before registering! I use DockerRun here:
# DockerRun(image="uk-prefect-flows-dev:latest", env={"ENV": "local"}, labels=["local"])
# register your flow (I do it elsewhere, using click and the Prefect client)
```
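(For completeness, a minimal sketch of what that registration step could look like; the project name is a hypothetical placeholder:)
```python
# continues from the flow defined above
from prefect.run_configs import DockerRun

# attach the run_config before registering
flow.run_config = DockerRun(
    image="uk-prefect-flows-dev:latest",
    env={"ENV": "local"},
    labels=["local"],
)
flow.register(project_name="<project-name>")  # hypothetical placeholder
```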
To build your image once...
```python
# I do this in cli.py
from prefect.storage import Docker

storage = Docker(
    # registry_url=registry_url,  # keep as None for local dev, otherwise point to your ECR/Docker Hub repo
    image_name=image_name,
    image_tag=image_tag,
    dockerfile='Dockerfile'
)
storage.build(push=push)  # push=False keeps the image on the local machine
```
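(A minimal sketch of what that `cli.py` wrapper could look like with click; the option names and defaults are assumptions:)
```python
# cli.py -- hypothetical click wrapper around the build step
import click
from prefect.storage import Docker

@click.command()
@click.option("--image-name", default="uk-prefect-flows-dev")
@click.option("--image-tag", default="latest")
@click.option("--push/--no-push", default=False)
def build(image_name, image_tag, push):
    """Build the single shared flow image, optionally pushing it."""
    storage = Docker(
        image_name=image_name,
        image_tag=image_tag,
        dockerfile="Dockerfile",
    )
    storage.build(push=push)

if __name__ == "__main__":
    build()
```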