# prefect-community
m
Hello all, I am looking for some guidance/examples on running flows on docker containers. Right now my company has software that runs on schedules in their own docker containers. My understanding currently of how to best utilize Prefect is to rebuild those containers to have
```dockerfile
FROM prefecthq/prefect:0.7.1-python3.6
```
or similar in the Dockerfile, which I have done. From there I need to create a flow that takes this Dockerfile and builds the container, and that is where my understanding is weaker and not so clear. I have @task decorations added to various bits of the integration that I am currently attempting to convert over to having Prefect handle the execution of. I am stuck on how to write the flow so that it works with this docker container. Do I need a separate flow.py that takes the Dockerfile of the container, builds it, and runs the tasks denoted by "@task" within the integration in order for this to be orchestrated by Prefect? If so, how would I write the flow as an example? I feel like my understanding is flawed and would appreciate some help with this. For reference, I am running 0.14.1. Thank you all in advance!
s
are you just starting to work with prefect for the first time? If that's the case I think it would be a good idea if you started off with a newer version of prefect (i.e. 0.14.0+) which emphasizes the concept of `RunConfig` images for configuring the environment in which you want to run your flows. Those images/containers that you're already working with that house all of your dependencies are going to become your `RunConfig` images – you need to install prefect in those containers (if you want to re-author the Dockerfiles to use one of the prefecthq images as a base, that's fine, but you don't need to; you just need to make sure that `prefect` is installed in the image somehow). Once you have a fully-baked image that has prefect + all of your other flow dependencies, you will use that image when you go to define your flow by specifying that the flow's `run_config` should be of `DockerRun` type. This is basically saying to prefect, "hey, when an agent is ready to pick up my flow as work and execute it, it first needs to spin up a container based on this image, and then execute the flow code inside of that container"
when you're starting out I do think it's a little confusing as to how and where you need to incorporate Docker into your workflow – I can't speak for how things worked on anything earlier than 0.13.6, as I'm still relatively new to prefect, but I think it's helpful to kind of mentally separate the processes by which you build and configure your docker images/environments from the prefect flow/tasks which will execute inside of those containers/environments
m
I am currently using 0.14.1; I copy/pasted that version from the Prefect docs onto a custom base image
s
if you want to use prefect to manage the work of building docker images, which will then be used to serve up containers that other flows will run in, that's a slightly different story – that's where you would start using the docker tasks to build and publish your images – but if you have some other process by which you're configuring your images (e.g. we're planning on using GitHub Actions at my company), you shouldn't need to worry about `prefect` doing anything around dockerfiles or issuing docker build/run commands; you'll have your fully-baked image ready to go, and you'll feed the name of that image to the `DockerRun` config so that prefect knows that the flow's body (i.e. all the tasks it's going to end up executing) is to be run inside a container spun up off that image
this is just some almost-real-pseudo-code (not tested) but you might have a flow that looks like this:
```python
from prefect import Flow, task
from prefect.run_configs import DockerRun

import numpy as np

@task
def task_1():
    # runs inside a container spun up from the image named below
    arr = np.array([1, 2, 3, 4, 5])
    print(arr)

with Flow(
    name="my-example-flow",
    run_config=DockerRun(
        image="numpy-image",
        env={"ENV_VAR1": "value1"},
    ),
) as flow:
    task_1()
```
where `numpy-image` is a docker image that you've built that has numpy installed in it (along with prefect); a prefect agent will know that when it's time to pick up the flow run, it needs to spin up a container using the `numpy-image` base
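and just to close the loop: once the flow is defined you register it with the backend so an agent can pick it up. the project name below is just a placeholder, and this assumes you already have a docker agent running (in 0.14.x that should be `prefect agent docker start`, if I'm remembering the CLI right):
```python
# the project name is a placeholder -- it just needs to exist in your backend;
# a running docker agent will pull "numpy-image" and run the flow inside it
flow.register(project_name="examples")
```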
m
@Sean Talia okay, so I have a docker image that basically does API call -> create csv -> load into database. I'm wanting to expose each step in turn to prefect so that if one fails I can restart that task, as well as having the logging in the Prefect UI. What about my flow has to change to accommodate that use case?
s
yeah so, depending on how you have your whole deployment process set up, things can vary considerably (also keep in mind I'm only giving you tips based on my understanding of how you might accomplish this stuff with prefect – I'm by no means a certified expert). You would want to have any of the dependencies you need for interacting with the API / database (any api or database connector libraries, env variables, etc.) installed/configured in the image already. As for what goes inside the flow, you'll break up each of those processes into individual tasks (so you have a task for the API call, a task for the CSV creation, a task for the CSV upload to your db), set up some logging within the tasks, and then string those tasks together by passing the output of one as the input of another – rough sketch below
and note that anything sensitive that your flow's tasks will need (like API tokens or db credentials) should be mounted into the container or retrieved from somewhere else at runtime
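here's a very rough sketch of what that shape could look like – again, untested pseudo-ish code, and the API url, table name, image name, env var, and helper libraries (requests, pandas, sqlalchemy) are all just stand-ins for whatever your image actually has installed:
```python
import os
from datetime import timedelta

import pandas as pd
import requests
from prefect import Flow, task
from prefect.run_configs import DockerRun
from sqlalchemy import create_engine

# placeholder endpoint -- swap in whatever API you're actually hitting
API_URL = "https://example.com/api/records"

@task(max_retries=3, retry_delay=timedelta(minutes=1))
def call_api():
    # failures here can be retried on their own without re-running the rest
    response = requests.get(API_URL)
    response.raise_for_status()
    return response.json()

@task
def create_csv(records):
    # write the payload to a csv and pass the path downstream
    df = pd.DataFrame(records)
    path = "/tmp/records.csv"
    df.to_csv(path, index=False)
    return path

@task
def load_into_db(csv_path):
    # DB_CONNECTION_STRING is assumed to be set in the container's environment
    engine = create_engine(os.environ["DB_CONNECTION_STRING"])
    pd.read_csv(csv_path).to_sql("records", engine, if_exists="append", index=False)

with Flow(
    name="api-to-db",
    run_config=DockerRun(image="my-company-image"),  # placeholder image name
) as flow:
    records = call_api()
    csv_path = create_csv(records)
    load_into_db(csv_path)
```
because each step is its own task, a failed db load can be restarted from the UI without repeating the API call, and each task gets its own logs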
m
@Sean Talia this is where I am unsure about things; it sounds like I will need to adjust the program that we have written (which is currently in a docker container) to accommodate this. What I am picturing is that there are @task decorations on the various functions and then the tasks get called at the end of the file.
s
yeah, if you want to get the most out of what prefect offers, you will have to modify the program itself by picking it apart and integrating the various prefect utilities – you could conceivably just leave it all as is and have your prefect flow do nothing but execute a single function that you decorate with `@task`, which in turn executes the program, but in that case you're just sort of turning prefect into a scheduling tool and not getting all the niceties around retry logic, dependency management, parallelization, etc. that you otherwise would
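for contrast, that bare-minimum version would be something like this, where `my_integration.main` is just a stand-in for whatever your program's existing entrypoint is:
```python
from prefect import Flow, task
from prefect.run_configs import DockerRun

from my_integration import main  # placeholder for your existing entrypoint

@task
def run_whole_program():
    # one opaque task: prefect only sees whether the entire program passed or failed
    main()

with Flow(
    name="wrapper-flow",
    run_config=DockerRun(image="my-company-image"),  # placeholder image name
) as flow:
    run_whole_program()
```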
m
That is the process I am trying to understand. I've had success with prefect simply running a docker container, but that does not provide much insight into success/failure. I'm looking for the ability to have it provide those insights, and that is where my understanding is muddy. I have successfully rebuilt the docker container with prefect inside of it. I'm looking for guidance and examples of how to accomplish this because I feel I'm not understanding the documentation very well
Does anyone have any insight on this? I am feeling a bit lost on how to properly configure this
s
do you have any code you can share?