Prefect Community #ask-community

Hi there. I am working on a flow that needs to do ...

Pedro Machado

05/23/2023, 8:10 PM

Hi there. I am working on a flow that needs to do the following:
• Start on a given day of the month (for example, the 6th)
• Call an API to see if the data is ready. If the data is not ready, it should continue to check every few hours. If the data is not ready by the 15th of the month, it should fail.
• Once the data is ready, it should do additional work

I am thinking that the Prefect 2 pattern would be:
• Have a monthly deployment with a schedule that starts this flow on the 6th of the month.
• The flow needs to have retries specified (a sufficiently large number)
• Have a `check_data_availability` task that calls the API. If the data is not ready, it should raise an exception.
• If the data is ready, it will continue

Can you think of a better pattern? If I go with this approach, I was thinking that it would be good to define a new state, `waiting for data availability` or something along those lines, to differentiate it from a failure. Does this make sense? Are custom states possible? Can Prefect Cloud render custom states? How would I specify a custom state? Thanks!
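A minimal sketch of the retry-based pattern described above, assuming Prefect 2 is installed. It puts the retries on the `check_data_availability` task rather than on the flow, which is a variation on what the message proposes. The `DataNotReady` exception, the `is_data_ready` callable, the 4-hour interval, and the `build_flow` and `retries_needed` helpers are all hypothetical names chosen for illustration.

```python
import math
from datetime import timedelta


class DataNotReady(Exception):
    """Raised when the upstream API reports that the data is not available yet."""


def retries_needed(start_day: int, deadline_day: int, interval_hours: float) -> int:
    """Number of retries needed to keep checking from start_day until deadline_day."""
    window = timedelta(days=deadline_day - start_day)
    return math.ceil(window / timedelta(hours=interval_hours))


def build_flow(is_data_ready, interval_hours: float = 4):
    """Build the flow. `is_data_ready` is a hypothetical callable returning a bool."""
    # Imported lazily so retries_needed() can be used without Prefect installed.
    from prefect import flow, task

    @task(
        retries=retries_needed(6, 15, interval_hours),
        retry_delay_seconds=int(interval_hours * 3600),
    )
    def check_data_availability() -> None:
        # Raising makes Prefect schedule a retry after retry_delay_seconds.
        if not is_data_ready():
            raise DataNotReady("data not available yet")

    @flow
    def monthly_flow() -> None:
        check_data_availability()
        # ... additional work once the data is ready ...

    return monthly_flow
```

With a start on the 6th, a deadline on the 15th, and a check every 4 hours, `retries_needed(6, 15, 4)` comes out to 54. Once the retries are exhausted the task fails, and the flow run fails with it. The flow run stays open for the whole waiting period.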

Henning Holgersen

05/23/2023, 8:27 PM

That does sound like an interesting pattern. My immediate concern is that you would have a flow running for days, which adds noise, and there is a chance the main flow might fail. We have a similar situation, and based on your details I might have settled for something like this: run the flow every day (or however frequently) from the 1st to the 15th, and keep a Prefect variable that contains a kind of “high-water mark”, basically a timestamp of the last time the data was ready. If it is a new month and the data is ready, run the data load; if not, do nothing. If it is the 15th (or later) and the high-water mark still says last month, you can trigger an error.
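The daily decision described above could be sketched as a small pure function. This is only an illustration: the `decide` name and its return values are hypothetical, and how the high-water mark is read from and written back to a Prefect variable is left out because it depends on the Prefect version in use.

```python
from datetime import date


def decide(today: date, watermark: date, data_ready: bool, deadline_day: int = 15) -> str:
    """Return "load", "skip", or "fail" for today's scheduled run.

    `watermark` is the date on which the data was last found ready and loaded.
    """
    already_loaded = (watermark.year, watermark.month) == (today.year, today.month)
    if already_loaded:
        return "skip"  # this month's load is already done
    if data_ready:
        return "load"  # new month and the data has arrived
    if today.day >= deadline_day:
        return "fail"  # deadline reached and the watermark still says last month
    return "skip"      # not ready yet, try again on the next scheduled run
```

The flow would call `decide` once per scheduled run, run the data load and update the high-water mark on `"load"`, and raise an exception on `"fail"`. Each run finishes quickly, so no flow run stays open for days. As written, data that only becomes ready after the deadline is still loaded.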

Nate

05/23/2023, 8:32 PM

Instead of raising a custom state `waiting for data availability` when data is not ready, you could probably just call `run_deployment` with a `scheduled_time` in the future. That way you're not spending runtime just waiting (as Henning notes above): you could schedule it to run once a month on the 6th and let it reschedule itself every so often until it can get the data and complete.

Note that ideally you can event off the availability of the data directly, so that some external system knows that your data is ready and can just call the `create_flow_run_from_deployment` endpoint whenever that happens.
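A rough sketch of the self-rescheduling idea, assuming Prefect 2 and its `run_deployment` helper from `prefect.deployments`. The helper names `next_check_time` and `reschedule_self`, the 4-hour interval, and the deployment name in the usage note are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def next_check_time(
    now: datetime, deadline: datetime, interval_hours: float = 4
) -> Optional[datetime]:
    """Return the next time to check, or None if it would fall after the deadline."""
    candidate = now + timedelta(hours=interval_hours)
    return candidate if candidate <= deadline else None


def reschedule_self(deployment_name: str, when: datetime):
    """Schedule another run of the same deployment at `when`."""
    # Imported lazily so next_check_time() can be used without Prefect installed.
    from prefect.deployments import run_deployment

    # timeout=0 returns right away instead of waiting for the future run to finish.
    return run_deployment(name=deployment_name, scheduled_time=when, timeout=0)
```

Inside the flow, when the data is not ready, one would compute `when = next_check_time(now, deadline)`, raise an error if it is `None`, and otherwise call `reschedule_self("monthly-flow/monthly", when)` and return. In practice the datetimes should probably be timezone-aware so the scheduled time is unambiguous.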

🚀 2

Pedro Machado

05/23/2023, 8:38 PM

Hi guys. Thank you for your input. I first thought about doing something event-based. The issue is that I have multiple flows and each flow will have different data availability requirements. Some would require a combination of data sources, so dispatching the events to many flows seems difficult. I'll give it some more thought. I like the idea of calling `run_deployment` with a future date as an alternative!

👍 1
