Pedro Machado

05/04/2020, 6:39 PM
Hi everyone. I am evaluating Prefect and would like to convert a workflow that is currently running in Airflow. This workflow calls a reporting API that returns a list of URLs pointing to data extracts (multiple compressed CSV files for each report). The input parameters of the API include the start and end dates, a "report id" that tells the API which set of data we want, and a parameter that controls the interval of the report (daily, weekly, or monthly). The next step is to download each of those URLs and store the files in S3 using a predefined prefix naming convention. Each report can have a different frequency and data readiness delay. For example, some weekly reports are ready on Wednesdays, others on Saturdays. Some daily reports are ready the next day, others the following Saturday. In principle, there is a single flow that 1) waits for a number of days or hours, 2) queries the API to see if a given report is ready, and 3) downloads multiple files to S3. In Airflow, I have multiple DAGs (one for each report/frequency combination). Each DAG pulls a report based on the inputs and `execution_date`. I am trying to figure out how I could implement this in Prefect. It seems to me that I could have a single parametrized flow, but I am not sure how the parameters would be provided and the flow scheduled. Thanks!
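To make the inputs of such a single parametrized flow concrete, here is a minimal plain-Python sketch of how a run date plus a per-report configuration could be turned into the API's start and end dates and an S3 prefix. Everything in it is an assumption for illustration: the `ReportConfig` fields, the Monday-to-Sunday week, the meaning of `delay_days`, and the prefix layout are hypothetical and are not taken from the reporting API or from Prefect.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Tuple


@dataclass(frozen=True)
class ReportConfig:
    """Hypothetical per-report settings; field names are illustrative."""
    report_id: str
    interval: str    # "daily", "weekly", or "monthly"
    delay_days: int  # days to wait after a period ends before its data is ready


def report_window(run_date: date, cfg: ReportConfig) -> Tuple[date, date]:
    """Return (start, end) of the latest full period that is ready on run_date."""
    # Latest day whose data is assumed to be ready.
    ready_by = run_date - timedelta(days=cfg.delay_days)
    if cfg.interval == "daily":
        return ready_by, ready_by
    if cfg.interval == "weekly":
        # Assumes weeks run Monday..Sunday: step back to the last Sunday
        # on or before ready_by.
        end = ready_by - timedelta(days=(ready_by.weekday() + 1) % 7)
        return end - timedelta(days=6), end
    if cfg.interval == "monthly":
        # Last day of the latest month that ended on or before ready_by.
        first_of_month = (ready_by + timedelta(days=1)).replace(day=1)
        end = first_of_month - timedelta(days=1)
        return end.replace(day=1), end
    raise ValueError(f"unknown interval: {cfg.interval!r}")


def s3_prefix(cfg: ReportConfig, start: date, end: date) -> str:
    """Build a key prefix; this naming convention is made up for the example."""
    return f"{cfg.report_id}/{cfg.interval}/{start:%Y-%m-%d}_{end:%Y-%m-%d}/"
```

With this shape, each report/frequency combination is just one `ReportConfig` value, and the run date plays the role that `execution_date` plays in the Airflow DAGs.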

Kyle Moon-Wright

05/04/2020, 7:10 PM
Hey @Pedro Machado, I’m sure there are members of the community who may have more specific advice, but there are definitely a few resources that can provide some inspiration. Firstly, Prefect schedules have a rich API for configuring flow runs as specifically as you need, so you can match each schedule to each report’s readiness. Separating your flows based on frequency would likely look very similar to your current workflow. Alternatively, you could configure different clocks with varying parameter values by setting the `parameter_defaults` kwarg on each clock. This can potentially consolidate your flows based on similar schedules/parameters. Furthermore, you can change your parameters in the UI under the Flow’s Run tab to configure them on the fly, rather than changing the parameters in the code. I think your migrated Prefect workflow can be as similar to or different from your Airflow workflow as you like, but using parameter values on each clock in this way is definitely Prefect-forward.
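The idea of one flow driven by several clocks, each carrying its own parameter values, can be modelled in plain Python as below. This is a conceptual sketch only and does not use Prefect's API: the parameter names, the cron expressions, the `delay_days` values, and the merge order (flow defaults, then clock defaults, then ad hoc overrides such as values changed in the UI) are all assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional, Tuple

# Parameters the single flow declares, with flow-level defaults (hypothetical names).
FLOW_DEFAULTS: Dict[str, Any] = {
    "report_id": None,
    "interval": "daily",
    "delay_days": 1,
}

# One entry per clock: (cron expression, parameter defaults for runs it triggers).
# In the thread's terms, the dict is what would go into `parameter_defaults`.
CLOCKS: List[Tuple[str, Dict[str, Any]]] = [
    ("0 6 * * *", {"report_id": "daily_sales"}),  # every day at 06:00
    ("0 6 * * 3", {"report_id": "weekly_sales", "interval": "weekly", "delay_days": 3}),  # Wednesdays
    ("0 6 * * 6", {"report_id": "weekly_inventory", "interval": "weekly", "delay_days": 6}),  # Saturdays
]


def run_parameters(
    clock_defaults: Dict[str, Any],
    overrides: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Merge flow defaults, clock defaults, and ad hoc overrides, in that order."""
    overrides = overrides or {}
    # Reject parameters the flow does not declare, so typos fail loudly.
    unknown = (set(clock_defaults) | set(overrides)) - set(FLOW_DEFAULTS)
    if unknown:
        raise KeyError(f"undeclared parameters: {sorted(unknown)}")
    return {**FLOW_DEFAULTS, **clock_defaults, **overrides}
```

Keeping the per-report values next to their schedules like this means adding a report is one new entry, and the `report_id` in each run's parameters is what distinguishes one report's runs from another's.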

Pedro Machado

05/04/2020, 7:57 PM
Hi Kyle. Thanks for the suggestion. My preference would be to have a single flow with different configurations, but I want to make sure I can monitor and manage each report when things fail. It sounds like the clock + parameters approach could work. I haven't tried the UI yet, so I am wondering how difficult it would be to re-run parts of the flow. For example, what if a particular report fails and I need to re-run it? Would this be harder to do if I have a single flow? Is it easy to identify the flow runs associated with a given report so those tasks can be restarted? I am sure this will become clearer when I get further in the process. Let me know if you have other thoughts. Thanks again.

Kyle Moon-Wright

05/04/2020, 9:10 PM
Hmm, I think the difficulty will depend on how you ultimately architect the flow(s), the number of flows/tasks, and what your development process looks like. That being said, a single flow is definitely doable and will provide a great deal of visibility - in the UI each flow run has its own ID and name, with State indicators for each task. From the flow run screen, you will also be able to restart your flow and monitor its progress. Happy engineering! Let us know if you have more questions.