Hello all, today I had a conceptual question: Curr...
# ask-community
m
Hello all, today I have a conceptual question. Currently, I have flows that create and start a Docker container from an already built image. This lets us develop the image without rebuilding it every time we make minor changes to the code, and it quickly gives visibility into task failures. However, we are not taking advantage of all that Prefect offers. So my question is: how can I have Prefect register tasks to Prefect Server from within a Docker container without having to rebuild the Docker image on every flow run? I want to leverage more of Prefect's features, but based on the docs I am unsure how to execute tasks within a Docker image without rebuilding it every time the flow is scheduled to run. Any help or direction would be appreciated. Thank you in advance!
z
Hey @Matthew Blau -- by "rebuild" do you mean running the container? The terminology in docker land is:
• build (image from Dockerfile)
• push (image to repository)
• pull (image from repository)
• run (create a container from image)
• do some work in the container!
I'm a bit confused by what you're trying to get at here 🙂
m
Hi @Zanie basically I had a flow set up like this:
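from prefect import Flow
from prefect.storage import Docker

with Flow(
    name="integration",
    # schedule=schedule,
    # state_handlers=[slack_notifier],
    storage=Docker(dockerfile="/home/lookup/integration/Dockerfile"),
) as flow:
    ...  # task calls omitted in the original message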
and upon each flow run it would use that Dockerfile to build the container and then run the tasks within. That is a pretty expensive process, so we switched to having docker-compose build an image and then created tasks to run the built image. I would like a way of building the container once and having the tasks contained within it orchestrated by Prefect.
I intend to subclass the Task class and use its run function to run the various functions that make up my program. We use Docker for managing dependencies, so I am unsure how best to utilize Prefect in this context.
z
So the Dockerfile should only build to an image on flow registration, not flow run. That said, generally I'd recommend changing to a file-based storage for your serialized flow (e.g. GitHub, S3) and then using a DockerRun run config for your flow that uses the Docker image with all your shared tasks installed.
m
@Zanie ah yes, in the code we were originally calling flow.register, which would be why it rebuilt every time we made small changes. Would your suggestion of file-based storage for the serialized flow accomplish what I am looking to do, that is, have a flow in a Docker image built from subclasses of the Task class, each with a run function that executes the relevant bits of the program? I am trying to conceptualize how this would work and look. Am I to understand that I can write my code like normal as a subclass of the Task class, and then have the last bit of the flow use a DockerRun run config to pick up and register with Prefect Server?
z
The pattern I linked to should explain this, but basically:
• Build a docker image yourself with your shared tasks that subclass Task
• Write your flow in a separate file and use S3 storage (for example)
• flow.run_config = DockerRun( -> point to the docker image you build with shared tasks)
• Register your flow (build time just has to push to S3)
When you run your flow it will run in your docker container and pull the required flow information from S3