#prefect-community

Pedro Machado

05/23/2023, 8:10 PM
Hi there. I am working on a flow that needs to do the following:
• Start on a given day of the month (for example, the 6th)
• Call an API to see if the data is ready. If the data is not ready, it should continue to check every few hours. If the data is not ready by the 15th of the month, it should fail.
• Once the data is ready, it should do additional work

I am thinking that the Prefect 2 pattern would be:
• Have a monthly deployment with a schedule that starts this flow on the 6th of the month.
• The flow needs to have retries specified (a sufficiently large number)
• Have a `check_data_availability` task that calls the API. If the data is not ready, it should raise an exception.
• If the data is ready, it will continue

Can you think of a better pattern? If I go with this approach, I was thinking that it would be good to define a new state, `waiting for data availability` or something along those lines, to differentiate it from a failure. Does this make sense? Are custom states possible? Can Prefect Cloud render custom states? How would I specify a custom state? Thanks!

Henning Holgersen

05/23/2023, 8:27 PM
That does sound like an interesting pattern. My immediate concern is that you would have a flow running for days, which adds noise, and there is a chance the main flow might fail. We have a similar situation, and based on your details I might settle for something like this: run the flow every day (or however frequently) from the 1st to the 15th, and keep a Prefect variable that holds a kind of "high-water mark", basically a timestamp of the last time the data was ready. If it is a new month and the data is ready, run the data load; if not, do nothing. If it is the 15th (or later) and the high-water mark still says last month, you can trigger an error.
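The daily decision described above could be sketched roughly as follows. This is only an illustration of the logic, not Henning's actual code: the function name `decide_action`, its parameters, and the returned labels are all hypothetical, and reading or writing the Prefect variable that stores the high-water mark is left to the caller.

```python
from datetime import date
from typing import Optional


def decide_action(
    today: date,
    high_water_mark: Optional[date],
    data_ready: bool,
    deadline_day: int = 15,
) -> str:
    """Return "load", "skip", or "fail" for today's run.

    high_water_mark is the date of the last successful load (None if never),
    assumed to be stored somewhere persistent such as a Prefect variable.
    """
    loaded_this_month = (
        high_water_mark is not None
        and (high_water_mark.year, high_water_mark.month) == (today.year, today.month)
    )
    if loaded_this_month:
        return "skip"  # this month's data was already processed
    if data_ready:
        return "load"  # new month and data is ready: run the data load
    if today.day >= deadline_day:
        return "fail"  # deadline reached and the mark still says last month
    return "skip"  # not ready yet, try again on the next daily run
```

A daily flow would call this once per run, do the load and update the high-water mark on "load", raise an exception on "fail", and simply finish on "skip".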

Nate

05/23/2023, 8:32 PM
Instead of raising a custom state `waiting for data availability` when data is not ready, you could probably just call `run_deployment` with a `scheduled_time` in the future. That way you're not spending runtime just waiting (as Henning notes above): you could schedule the flow to run once a month on the 6th and let it reschedule itself every so often until it can get the data and complete.

Note that ideally you can event off the availability of the data directly, so that some external system knows that your data is ready and can just call the `create_flow_run_from_deployment` endpoint whenever that happens.
🚀 2
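A minimal sketch of the self-rescheduling idea, kept free of Prefect imports so the logic can be read on its own. The names `check_or_reschedule`, `DataNotAvailableError`, and `schedule_run` are hypothetical, as are the 4-hour interval and the choice to fail on any check made on or after the 15th. In a real Prefect 2 flow, `schedule_run` would presumably wrap `run_deployment` with `scheduled_time` set to the next check time.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable


class DataNotAvailableError(Exception):
    """Raised when the data is still missing at the deadline (hypothetical name)."""


def check_or_reschedule(
    now: datetime,
    data_ready: bool,
    schedule_run: Callable[[datetime], object],
    deadline_day: int = 15,
    retry_every: timedelta = timedelta(hours=4),
) -> str:
    """Proceed if data is ready; otherwise schedule another run or fail."""
    if data_ready:
        return "proceed"
    if now.day >= deadline_day:
        # Assumption: "not ready by the 15th" means any check on or after the 15th fails.
        raise DataNotAvailableError(f"data not ready by day {deadline_day}")
    # Inside a flow, schedule_run might be something like (deployment name is made up):
    #   lambda when: run_deployment(name="my-flow/monthly", scheduled_time=when, timeout=0)
    schedule_run(now + retry_every)
    return "rescheduled"
```

The current run then ends right after scheduling the next one, so no flow run sits idle between checks.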

Pedro Machado

05/23/2023, 8:38 PM
Hi guys. Thank you for your input. I first thought about doing something event-based. The issue is that I have multiple flows and each flow will have different data availability requirements. Some would require a combination of data sources, so dispatching the events to many flows seems difficult. I'll give it some more thought. I like the idea of calling `run_deployment` with a future date as an alternative!
👍 1