
merlin

01/03/2023, 10:52 PM
Hello Prefect folks! I'm searching for a pattern to manage dependencies between deployments. The 'hello world' approach of calling a flow from another flow isn't a good fit, as you'll see from the components of my small data engineering pipeline:
1. Deployments A, B, C: a DDL flow that creates tables in a DB. The flow is parametrized; the parameter is the filename of a DDL SQL script sent to the DB. So I have three deployments of the same flow calling different scripts.
2. Deployment X: data extracts that pull from the tables built by deployments A, B, and C. This is a single flow parametrized by the filenames of several extract SQL scripts.
I want to make sure Deployment X runs only after Deployments A, B, and C have each run that day. Currently I manage the ordering with cron schedules, but this isn't reliable if the agent isn't continuously running, if a DDL job takes a long time, etc. Can I have Deployment X explicitly wait on runs of the other deployments for a given (parametrized) run date?
I did see this thread about `run_deployment`, but I don't see a clean fit with that approach.

Zanie

01/04/2023, 4:09 PM
This is a tricky one; I believe this will be handled by automations in the future. cc @Will Raphaelson

Will Raphaelson

01/04/2023, 4:43 PM
Hmm, so currently I believe you could satisfy this with a simple automation created via the UI if you were okay with the DDLs running sequentially: you'd set up automations so that X runs only after C, C only after B, and B only after A. Because A, B, and C are deployments of the same flow, you'd want to listen for the flow run completions by tag. If you were set on the DDLs running in parallel, I believe you could create one automation that satisfies this via the API by using a threshold, i.e. wait for 3 flow runs with the DDL tag to enter Completed before running X.

merlin

01/04/2023, 5:04 PM
Is there a way to check the state of a deployment run without actually calling the `run_deployment` method?
if previously_run_deployment(parameter1="2022-12-25").state.name == "Completed":
    # go ahead with this dependent flow here
I'm open to refactoring my DDL and extract jobs; I'm looking for a pattern that suits a data engineering setup managing DDL jobs and various extracts, some of which have dependencies.

Zanie

01/04/2023, 6:02 PM
You can read the flow run and look at its state (i.e. with the client).
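A minimal sketch of that client-based check, assuming Prefect 2.x. The `all_completed` helper is illustrative, and the unfiltered `read_flow_runs` call is a simplification: real code would narrow the query by flow name and expected start time, and the client import path varies across Prefect releases.

```python
# Pure helper: given the observed state names of the dependency runs,
# decide whether the dependent deployment may proceed. No Prefect
# import needed, so this part is easy to unit test.
def all_completed(state_names):
    """True only if there is at least one run and every run is Completed."""
    return bool(state_names) and all(s == "Completed" for s in state_names)


async def ddl_runs_completed():
    # Hypothetical query against the Prefect API -- treat this as a
    # sketch, not a drop-in snippet. The import lives inside the
    # function so the file loads even without Prefect installed.
    from prefect.client import get_client

    async with get_client() as client:
        # In practice you'd pass flow/flow-run filters here to select
        # only today's DDL runs.
        runs = await client.read_flow_runs()
        return all_completed([r.state.name for r in runs if r.state])
```

The split keeps the decision logic testable in isolation while the API call stays a thin, replaceable wrapper.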

merlin

01/05/2023, 5:15 AM
Thanks, I'll try to learn how to do that. From here it's more of a data engineer's design question. I guess the pattern starts looking like:
• Create a dependency-checking flow; it takes table names and a run date as parameters.
  ◦ The dependency check sees if the DDL flow run for (table=my_table, rundate="2022-12-31") has state == Completed.
  ◦ If so, proceed; otherwise wait, up to a timeout.
• Use the dependency-checking flow as a subflow in all DDL and extract flows.
• Every new "SQL job" then has three parts:
  ◦ a SQL file to send to the DB,
  ◦ a list of table dependencies (put in the deployment YAML),
  ◦ a deployment created for it.
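The "proceed else wait for timeout" step above can be sketched as a generic polling loop. The `check` callable is a hypothetical stand-in for the Prefect query ("is the DDL run for (table, rundate) Completed?"), and the injectable `sleep`/`clock` parameters exist only so the loop can be tested without real waiting:

```python
import time


def wait_for_dependencies(check, timeout_s=3600.0, poll_s=30.0,
                          sleep=time.sleep, clock=time.monotonic):
    """Poll `check()` until it returns True or `timeout_s` elapses.

    `check` would wrap the Prefect API query for the dependency's flow
    run state; it is left abstract here. Returns True if the dependency
    was satisfied in time, False on timeout.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        if check():
            return True
        sleep(poll_s)  # back off between state checks
    return False
```

Inside a Prefect subflow the False branch would raise (or return a failed state) so downstream extract tasks don't run against missing tables.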