Hammad A

10/02/2020, 4:55 PM
Why does it appear to be super difficult to execute each task of a flow in a separate docker container?

Dylan

10/02/2020, 5:01 PM
Hi @Hammad A, That’s because it is pretty difficult 😄 We call this feature per-task environments and it’s something we’ve been working towards for a long time. I believe the team is working on laying the groundwork for this. Stay tuned!

Hammad A

10/02/2020, 5:07 PM
Maybe it's hindsight, but looking at CI/CD tools, it seems like it would have been a good design choice from the get-go. I guess it was a go-simple choice; perhaps I am underestimating the complexity of holding context and moving data between dockerized tasks.

Dylan

10/02/2020, 5:11 PM
So, in an interesting way, this is sort of possible today
Since a flow can kick off flow runs of other flows, you could have a parent flow that kicks off runs of several child flows
and each of those child flows could have its own environment / Docker container
(we call this the Orchestrator Flow pattern)
But we are working towards this in a first-class way
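A rough illustration of the Orchestrator Flow pattern with the flow-run machinery stripped out: a parent runs child steps in dependency order, each child names its own Docker image, and a failure stops the run. This is a stdlib-only sketch, not Prefect code; `ChildFlow`, `topo_order`, `run_orchestrator`, and the image and script names are all hypothetical. In Prefect itself the parent would start registered child flows (for example via `FlowRunTask`) rather than building `docker run` commands itself.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ChildFlow:
    """Hypothetical stand-in for a child flow that runs in its own image."""
    name: str
    image: str              # each child can use a different image (Node, Python, ...)
    command: List[str]      # what to run inside that image
    upstream: List[str] = field(default_factory=list)  # names this child waits on


def docker_command(child: ChildFlow) -> List[str]:
    """Build (but do not execute) a `docker run` invocation for one child."""
    return ["docker", "run", "--rm", child.image, *child.command]


def topo_order(children: List[ChildFlow]) -> List[ChildFlow]:
    """Order children so each one comes after all of its upstream dependencies."""
    by_name = {c.name: c for c in children}
    ordered: List[ChildFlow] = []
    visiting, done = set(), set()

    def visit(name: str) -> None:
        if name in done:
            return
        if name in visiting:
            raise ValueError(f"dependency cycle at {name!r}")
        visiting.add(name)
        for dep in by_name[name].upstream:
            visit(dep)
        visiting.remove(name)
        done.add(name)
        ordered.append(by_name[name])

    for child in children:
        visit(child.name)
    return ordered


def run_orchestrator(
    children: List[ChildFlow], runner: Callable[[List[str]], int]
) -> Dict[str, int]:
    """Run children in order via `runner`; stop at the first non-zero exit code."""
    results: Dict[str, int] = {}
    for child in topo_order(children):
        code = runner(docker_command(child))
        results[child.name] = code
        if code != 0:
            break  # stop here: no later child starts, even an independent one
    return results


if __name__ == "__main__":
    etl = [
        ChildFlow("load", "python:3.8-slim", ["python", "load.py"], upstream=["transform"]),
        ChildFlow("extract", "node:14", ["node", "extract.js"]),
        ChildFlow("transform", "python:3.8-slim", ["python", "transform.py"], upstream=["extract"]),
    ]

    def dry_run(cmd: List[str]) -> int:
        print(" ".join(cmd))  # print instead of launching a container
        return 0

    print(run_orchestrator(etl, dry_run))
```

To actually launch containers, the runner could be something like `lambda cmd: subprocess.run(cmd).returncode` (after `import subprocess`, and with the Docker CLI installed); the dry run only prints the commands. Injecting the runner keeps the ordering logic testable without Docker.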

Hammad A

10/02/2020, 6:17 PM
I went through the docs again. I don't see an option for a Docker environment for a flow, only one for Kubernetes. I do see an agent that runs in Docker, but it seems agents just orchestrate. So where does the code run? In the orchestrator?
There are two different concepts with regard to flows and environments
Storage and Environment
You want to configure Docker storage for your flow
And then, if you’d like, you can use the DaskKubernetesEnvironment

Johnny

10/02/2020, 8:17 PM
@Dylan, flow-to-flow! I didn't know about this. Really cool!

Hammad A

10/02/2020, 8:55 PM
@Dylan, thanks, that helps. I think I need to read more of Core Concepts, as I still have to wrap my head around Environment vs. Agent

Dylan

10/02/2020, 9:01 PM
Generally, your agent will match your environment (e.g. Kubernetes and Kubernetes, or Fargate and Fargate)

Hammad A

10/02/2020, 9:06 PM
It gets confusing, as there is a Docker storage option and a Docker agent option, but no Docker execution environment. Let’s say I create a custom Docker image using storage: can that run in local, Kubernetes, and Fargate environments? And if I wanted to store my flow in S3 or a GCS bucket but use a Docker image with custom dependencies, it appears that wouldn’t fit into this model well.

Dylan

10/02/2020, 10:33 PM
Hey @Hammad A, I totally understand that it’s a bit confusing right now (we’re actually refactoring it to be much more straightforward)
May I ask what your high-level objective is?
I get the sense that your use case includes:
* Dependent actions that require different Docker containers to achieve
* A need for monitoring and observability
But I may be able to recommend a setup if I have a bit more info
> let's say I create a custom Docker image using storage, can that run in local, kubernetes, and fargate environments
Yes, for Docker storage. This isn’t true for all storage types, though.

Hammad A

10/04/2020, 8:33 PM
@Dylan, I just have different ETL steps that run on different code stacks (Node.js, Python, etc.). I simply want to orchestrate this ETL.

Dylan

10/05/2020, 8:16 PM
@Hammad A, for that use case, my suggestion above (an Orchestrator flow with child flows kicked off by FlowRunTasks) will actually work
since each child flow can have its own environment / storage
You may want to check out the ShellTask as well, since you’ll need to call out into the container to run your non-Python scripts
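For the non-Python steps, the work boils down to running a shell command that calls into a container and treating a non-zero exit code as a failure. A minimal stdlib sketch of that idea, assuming the Docker CLI is available wherever the command eventually runs; `containerized_command`, `run_shell_step`, the `/work` mount point, and the image and script names are hypothetical, and this is not Prefect's `ShellTask` API. Returning only the last line of stdout is a choice made for this sketch.

```python
import shlex
import subprocess
from typing import List, Optional


def containerized_command(image: str, script: List[str], workdir: Optional[str] = None) -> str:
    """Build a shell command string that runs `script` inside `image`.

    If `workdir` is given, it is bind-mounted at /work (a hypothetical
    convention) so the container can see local scripts; omit it when the
    script is already baked into the image.
    """
    parts = ["docker", "run", "--rm"]
    if workdir is not None:
        parts += ["-v", f"{workdir}:/work", "-w", "/work"]
    parts += [image, *script]
    # Quote each argument so paths with spaces survive the shell.
    return " ".join(shlex.quote(p) for p in parts)


def run_shell_step(command: str) -> str:
    """Run a shell command, raise on non-zero exit, return the last stdout line."""
    proc = subprocess.run(command, shell=True, capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError(f"command failed ({proc.returncode}): {proc.stderr.strip()}")
    lines = proc.stdout.strip().splitlines()
    return lines[-1] if lines else ""
```

Put together, a Node.js step might look like `run_shell_step(containerized_command("node:14", ["node", "extract.js"], workdir="/path/to/etl"))`; inside a flow, the command string is what you would hand to whatever task runs shell commands.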

Sean Talia

12/04/2020, 7:53 PM
Good thread. I just started playing with Prefect this week and have had many similar questions
(I know this thread is 2 months old, but it's still informative and somewhat reassuring to see that someone else was having the same difficulty)
My question here would be: what is the main purpose of the collection of DockerTasks? It seems like executing individual tasks that pull images/start containers is somewhat ill-advised, so what use case did the community have in mind when introducing this functionality?