
Federico Zambelli

02/26/2023, 12:24 PM
Hey folks, I have a bit of an odd question, but I'd say it's more a matter of opinion: should Prefect be used to orchestrate everything? Let me explain what I mean with an example: I am refactoring my data stack and have some old pipelines in my AWS account that I need to carry over to the new stack. These pipelines are a mix of CloudWatch events, Lambda functions, Glue jobs, and even Step Functions in one case. If I move to Prefect, how can I integrate these legacy pipelines with the new ones? Is it even possible to do so? For example, how can Prefect know when a certain Glue job or Step Function finishes running or fails? Let me know what you think!

Ryan Peden

02/26/2023, 2:27 PM
Hi Federico, great question! It's definitely possible for Prefect to know when a Glue job or Step Function succeeds or fails. Keep in mind that you can run a Prefect flow in a Lambda function. Here's a tutorial using Azure that shows what I mean; you could easily do the same thing in a Lambda function on AWS as long as you set up the environment variables that tell the flow how to connect to Prefect Cloud or your own Prefect server. In the tutorial, the function is set up to run when a new file is uploaded to a storage bucket. In your case, you could set up CloudWatch events to trigger Lambda-based Prefect flows that run in response to Glue events or Step Function events. To take it a step further, you could then use a Prefect Cloud automation or the run_deployment function to trigger further downstream flows after a Glue job or Step Function succeeds or fails.
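A rough sketch of what such a Lambda-based flow could look like, assuming PREFECT_API_URL and PREFECT_API_KEY are set as environment variables on the function; the Glue event field names below are illustrative, so check the payload your CloudWatch/EventBridge rule actually delivers:

```python
# Rough sketch of a Prefect flow running inside an AWS Lambda handler.
# Assumes PREFECT_API_URL and PREFECT_API_KEY are set as Lambda environment
# variables so the flow can report to Prefect Cloud (or a self-hosted server).
# The event fields read below mirror a Glue "Job State Change" notification,
# but the exact payload depends on your CloudWatch/EventBridge rule.
from prefect import flow, get_run_logger


@flow(name="glue-job-listener")
def handle_glue_event(job_name: str, state: str):
    logger = get_run_logger()
    logger.info("Glue job %s finished in state %s", job_name, state)
    if state != "SUCCEEDED":
        # Raising here surfaces the failure as a failed flow run in Prefect.
        raise RuntimeError(f"Glue job {job_name} ended in state {state}")


def lambda_handler(event, context):
    detail = event.get("detail", {})
    handle_glue_event(detail.get("jobName", "unknown"), detail.get("state", "unknown"))
```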

Federico Zambelli

02/26/2023, 2:36 PM
Hey @Ryan Peden, thanks a lot for the detailed response.
To take it a step further, you could then use a Prefect Cloud automation or the run_deployment function to trigger further downstream flows after a Glue job or Step Function succeeds or fails.
Let me see if I understand correctly: assume I have 4 stacks of pipelines, say Legacy, Fivetran ETL, dbt runs, and automatic reporting, and they all have to run in succession. From what I understood, I don't have to create one "master" flow to orchestrate them all; they can be their own independent deployments/flows and trigger each other using run_deployment. Did I get it right?

Ryan Peden

02/26/2023, 2:48 PM
That sounds right. If things run in the order you listed, you could write one flow that gets triggered in response to a CloudWatch event and handles your Fivetran ETL, dbt runs, and automatic reporting all together. Keeping them independent has some advantages, though. If, for example, your legacy pipelines and Fivetran ETL were successful, but the dbt run failed, independent flows would make it easy to resume the process by just running the dbt deployment via the UI.
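A minimal sketch of that hand-off pattern, where each stage stays its own deployment and simply kicks off the next one via run_deployment; the deployment name "dbt-run/prod" is a placeholder for whatever you actually register:

```python
# Sketch: the Fivetran stage hands off to the dbt stage via run_deployment,
# instead of one "master" flow owning every step. "dbt-run/prod" is a
# placeholder for your real deployment name.
from prefect import flow
from prefect.deployments import run_deployment


@flow
def fivetran_etl():
    ...  # trigger and poll your Fivetran syncs here

    # Because the dbt stage is an independent deployment, a failed dbt run can
    # simply be re-run from the UI on its own without repeating this step.
    run_deployment(name="dbt-run/prod")
```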

Federico Zambelli

02/26/2023, 2:53 PM
Ah that's fantastic, thanks @Ryan Peden! I have another question if you don't mind: so far I've been using Prefect only locally. I have looked around for some guides about deploying in production but couldn't find much. I'm not fully understanding how I'm supposed to create and manage my deployments in a production setting, with proper CI/CD and whatnot. Do you have any suggestions about that?
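A minimal sketch of one way to register a deployment from a CI job with Prefect 2's Python API; the module path flows.etl, the deployment name, and the work queue are placeholders, and the CI runner would need PREFECT_API_URL and PREFECT_API_KEY configured:

```python
# Sketch of a deployment-registration script a CI pipeline could run after tests pass.
from prefect.deployments import Deployment

from flows.etl import fivetran_etl  # placeholder: wherever your flow actually lives

if __name__ == "__main__":
    # Build a deployment for the flow and register (or update) it on the API.
    deployment = Deployment.build_from_flow(
        flow=fivetran_etl,
        name="prod",
        work_queue_name="prod",
    )
    deployment.apply()
```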

Federico Zambelli

02/27/2023, 3:05 PM
Hey Ryan, thanks a lot, much appreciated!

Ryan Peden

02/27/2023, 3:05 PM
You're welcome! And if you have any other questions, please feel free to ask 🙂