Adam Kelleher

06/15/2020, 9:13 PM
Hello Prefect community. I'm a data engineer evaluating prefect coming from Mara data integration and I'm curious, is there a Prefect idiomatic way to represent a collection of flows as a single flow? For instance, I have a series of tasks related to replicating some data, these tasks cover all the ingress/egress and should be run as a flow every hour. However, I would like to bundle the replication task flow with an auditing task flow and run it daily, without redefining the replication task flow.

Kyle Moon-Wright

06/15/2020, 9:30 PM
Hello @Adam Kelleher, welcome! For this particular use case, I’d recommend using the FlowRunTask to connect your replication flow to your auditing flow. Currently, this task is the best way for one flow to kick off other flows independently; however, in the future the team will implement ways to expand these flow-to-flow dependencies so that a flow can potentially depend on multiple upstream flows.

Adam Kelleher

06/15/2020, 9:36 PM
Hi Kyle, thank you for pointing me to that particular function. While not quite the same as directly adding a flow to a flow, am I correct in my understanding that this is basically running a flow as a task, with the prerequisite that the flow to run is registered already with the Prefect server?

Kyle Moon-Wright

06/15/2020, 9:42 PM
That is correct: the second flow would be preregistered, and you’d kick off a run of that flow (with either Cloud or the open-source UI). I know it’s not quite what you were looking for, but I don’t want to fully endorse running a full flow inside a task, as this can have unexpected behavior. It definitely can be done, though; it’s just not recommended.

Adam Kelleher

06/15/2020, 9:57 PM
Unexpected behavior? In theory I would define three flows: replicate, audit, and nightly processing. The latter would just contain two FlowRunTask tasks pointing to the replicate and audit flows, with a dependency between them. What might be unexpected about that pattern if the flows are idempotent?
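A minimal plain-Python sketch of the three-flow pattern described above (replicate, audit, and a nightly parent that triggers both, with audit depending on replicate). It does not use Prefect: `REGISTRY`, `register`, and `run_registered_flow` are hypothetical stand-ins for flows already registered with a server and for the role a FlowRunTask plays. In Prefect itself, the parent would hold two FlowRunTask instances with an upstream dependency between them.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory stand-in for an orchestrator's registry of
# preregistered flows. This is NOT Prefect's API; it only models the idea
# that a "flow run" step triggers a flow that already exists by name.
REGISTRY: Dict[str, Callable[[], str]] = {}


def register(name: str) -> Callable[[Callable[[], str]], Callable[[], str]]:
    """Decorator that records a flow function under a name."""
    def decorator(fn: Callable[[], str]) -> Callable[[], str]:
        REGISTRY[name] = fn
        return fn
    return decorator


@register("replicate")
def replicate() -> str:
    # Ingress/egress steps would go here; assumed idempotent.
    return "replicate: done"


@register("audit")
def audit() -> str:
    # Audit checks would go here; assumed idempotent.
    return "audit: done"


def run_registered_flow(name: str) -> str:
    """Trigger a preregistered flow by name (the role FlowRunTask plays)."""
    if name not in REGISTRY:
        raise KeyError(f"flow {name!r} has not been registered")
    return REGISTRY[name]()


def nightly() -> List[str]:
    """Parent flow: two flow-run steps, with audit depending on replicate."""
    results = [run_registered_flow("replicate")]
    # Reached only if replicate did not raise, mirroring an upstream dependency.
    results.append(run_registered_flow("audit"))
    return results


if __name__ == "__main__":
    print(nightly())
```

Because the child flows live on their own, the hourly replicate schedule and the daily nightly parent can reuse the same replicate definition without redefining it.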

Kyle Moon-Wright

06/15/2020, 10:14 PM
This sounds good; I just wanted to clarify that this isn’t a widely used pattern, and people’s expectations can be wild sometimes…

Adam Kelleher

06/15/2020, 10:21 PM
Haha, understood! I'm a bit spoiled by my current framework and its idea that you can add pipelines to pipelines, but it also doesn't have a central orchestration component. Give and take, I suppose.

Kyle Moon-Wright

06/15/2020, 10:33 PM
Cool! I’m checking out Mara Pipelines and it looks pretty interesting. Was there something in particular that brought you over to explore Prefect (besides the centralized orchestration part)? Also, feel free to let us know if there’s anything else you’d like to see implemented with Prefect - we’re always looking for ways to improve.

Adam Kelleher

06/15/2020, 10:47 PM
I'm at the point where I need to make significant enough changes that I can consider changing frameworks, and I keep going back to Airflow as the "industry standard". But Prefect has a lot of good press, so to speak, and looks like a lot less of a hassle to set up and run. Centralized orchestration isn't even really the reason for the change; I just want to implement flows as code so I can reuse more pieces and parts across different build contexts. I'm also considering leaving all my DAGs in Mara (it uses cost-based DAG execution order) and just having the task I register be the collection of tasks I want run. Prefect appears to be flexible enough to handle that as an interim solution.
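A rough sketch of what cost-based execution order means, assuming each task has a known expected cost (for example, its duration on a previous run). Among tasks whose upstreams have all finished, the costliest one is started first; with parallel workers this tends to keep long tasks from starting late and stretching the total run. This is a generic priority-queue heuristic written for illustration, not Mara's actual implementation, and `cost_based_order`, `deps`, and `cost` are hypothetical names.

```python
import heapq
from typing import Dict, List, Set


def cost_based_order(deps: Dict[str, Set[str]], cost: Dict[str, float]) -> List[str]:
    """Return a start order: among ready tasks, the costliest goes first.

    deps maps each task to its set of upstream tasks; every upstream must
    also appear as a key. cost maps each task to its expected cost.
    """
    remaining = {task: set(ups) for task, ups in deps.items()}
    downstream: Dict[str, List[str]] = {task: [] for task in deps}
    for task, ups in deps.items():
        for up in ups:
            downstream[up].append(task)

    # Max-heap via negated cost; ties fall back to task name.
    ready = [(-cost[t], t) for t, ups in remaining.items() if not ups]
    heapq.heapify(ready)

    order: List[str] = []
    while ready:
        _, task = heapq.heappop(ready)
        order.append(task)
        for child in downstream[task]:
            remaining[child].discard(task)
            if not remaining[child]:
                heapq.heappush(ready, (-cost[child], child))

    if len(order) != len(deps):
        raise ValueError("dependency cycle detected")
    return order


if __name__ == "__main__":
    deps = {"a": set(), "b": set(), "c": {"a"}, "d": {"a", "b"}}
    cost = {"a": 5, "b": 10, "c": 1, "d": 3}
    print(cost_based_order(deps, cost))
```

With the example above, "b" (cost 10) starts before "a" (cost 5), and once both are done "d" (cost 3) is preferred over "c" (cost 1).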

Kyle Moon-Wright

06/15/2020, 10:51 PM
That’s awesome, I’m glad Prefect has made the list of tools in your repertoire! We’re happy to have you here. :marvin:

Adam Kelleher

06/15/2020, 10:57 PM
Is there anything in the backlog for cost-based execution over mapped tasks? Or is that something that would fall under the purview of the executor?
k

Kyle Moon-Wright

06/15/2020, 11:08 PM
Hmm, well, with the 0.12.0 release there will be a mapping refactor that includes Depth-First Execution for mapped tasks; however, this doesn’t fully implement cost-based execution like you mentioned. I think you are correct that this would be in the purview of the executor, though I’m not personally aware of any work like this at the moment. I definitely like the idea, though, and will pass it along; there may very well be a solution or history behind this idea that I’m not aware of.
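To illustrate the difference between breadth-first and depth-first execution of mapped tasks, here is a serial plain-Python sketch (not Prefect code; `breadth_first`, `depth_first`, and the `(name, function)` stage pairs are hypothetical). As the term is generally used, breadth-first finishes every mapped child of one stage before any child of the next stage starts, while depth-first lets each child continue to its own downstream step without waiting for its siblings. A real executor would also run children concurrently; this only shows ordering, and neither variant is cost-based.

```python
from typing import Callable, List, Sequence, Tuple

Stage = Tuple[str, Callable[[int], int]]


def breadth_first(items: Sequence[int], stages: Sequence[Stage]) -> Tuple[List[int], List[str]]:
    """Finish every mapped child of one stage before starting the next stage."""
    values = list(items)
    log: List[str] = []
    for name, fn in stages:
        for i, v in enumerate(values):
            values[i] = fn(v)
            log.append(f"{name}[{i}]")
    return values, log


def depth_first(items: Sequence[int], stages: Sequence[Stage]) -> Tuple[List[int], List[str]]:
    """Push each mapped child through all stages before moving to the next child."""
    results: List[int] = []
    log: List[str] = []
    for i, v in enumerate(items):
        for name, fn in stages:
            v = fn(v)
            log.append(f"{name}[{i}]")
        results.append(v)
    return results, log


if __name__ == "__main__":
    stages = [("double", lambda x: x * 2), ("inc", lambda x: x + 1)]
    print(breadth_first([1, 2], stages))
    print(depth_first([1, 2], stages))
```

Both orders produce the same results; only the sequence of child runs differs, which is why a cost-based policy would be a separate scheduling decision layered on top.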