Aaron

03/17/2023, 5:28 PM
I have a Prefect deployment that runs every 2 minutes to update data on live sporting events. Two minutes is enough time to run the ETL on an event before the next flow run, but ideally I would like a process that is always running: it updates one or more events that may be taking place at the same time, and after each update it checks for live events again and updates as needed. Essentially it is a scoreboard of live games, but without any push from the primary data source. What would best practice be here? Should I schedule the deployment to run once a day and keep it alive for 24 hours? It would be dormant most of the time and mostly active between 6 PM and midnight, but there are occasional games at odd hours. Or is this better handled by making my script a systemd service instead of using a scheduler like Prefect?

redsquare

03/17/2023, 5:34 PM
I probably wouldn't schedule the flows, but instead trigger a flow run per live event (using the Prefect API).

Aaron

03/17/2023, 5:40 PM
That's the thing, there is no trigger from the live events. Something needs to check for a game being live and grab the latest data.

redsquare

03/17/2023, 5:41 PM
You could have a separate flow, scheduled every minute, that polls and triggers the actual flow.
Keep them independent.
That allows you to have multiple worker flows running concurrently if multiple live events overlap.
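
A minimal sketch of the poll-and-trigger idea described here, in plain Python so it does not depend on any particular Prefect version. The names `dispatch_live_events`, `fetch_live_event_ids`, and `trigger_worker` are hypothetical and not from the thread. In Prefect 2, `trigger_worker` could plausibly be a thin wrapper around `run_deployment` from `prefect.deployments`, but that wiring is an assumption and is not shown.

```python
from typing import Callable, Iterable, List, Set


def dispatch_live_events(
    fetch_live_event_ids: Callable[[], Iterable[str]],  # hypothetical: asks the data source what is live
    trigger_worker: Callable[[str], None],              # hypothetical: starts one worker flow run
    already_running: Set[str],                          # event IDs that already have a worker
) -> List[str]:
    """Start one worker per live event that does not have one yet.

    Meant to be the body of the small, frequently scheduled polling flow.
    Returns the event IDs for which a worker was started in this pass.
    """
    started: List[str] = []
    for event_id in fetch_live_event_ids():
        if event_id in already_running:
            continue  # a worker is already following this event
        trigger_worker(event_id)
        already_running.add(event_id)
        started.append(event_id)
    return started
```

Because the poller only decides what to start and hands the real work to `trigger_worker`, the polling flow and the worker flows stay independent, and several workers can run side by side when games overlap.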

Aaron

03/17/2023, 5:43 PM
Hmm, yeah, I guess that could work: a flow triggered per live event that runs until the event is finished, with the scheduled one checking on them.
Essentially what I'm doing now, but kicking off a flow per game that persists until the game is over. If any of them crashed or something went wrong, the scheduled flow would start it back up if the game is still live.
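
The per-game worker described here could look roughly like the loop below: it keeps polling one event and pushing updates until the source reports the game as finished. This is a sketch under assumptions. `follow_event`, `fetch_state`, `update_scoreboard`, and the `"finished"` key are all hypothetical names, and the 30-second default interval is arbitrary.

```python
import time
from typing import Any, Callable, Dict


def follow_event(
    event_id: str,
    fetch_state: Callable[[str], Dict[str, Any]],              # hypothetical: pulls the latest data for one event
    update_scoreboard: Callable[[str, Dict[str, Any]], None],  # hypothetical: runs the ETL / write step
    poll_seconds: float = 30.0,                                # arbitrary default, tune to the data source
    sleep: Callable[[float], None] = time.sleep,               # injectable so the loop can be tested without waiting
) -> int:
    """Poll a single event until it is finished; return the number of updates made."""
    updates = 0
    while True:
        state = fetch_state(event_id)
        update_scoreboard(event_id, state)
        updates += 1
        if state.get("finished"):
            # Write the final state first, then let the run end normally.
            return updates
        sleep(poll_seconds)
```

If this function is the body of the worker flow, the flow run ends on its own when the game does, so nothing needs to stay alive for 24 hours.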

redsquare

03/17/2023, 5:45 PM
Yeah, the scheduled flow could check the completion status of the triggered flows and restart them if needed.
'outbox' style
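
One way to read the status check suggested here is as a small reconcile step inside the scheduled flow: compare the events that are live right now with the last known state of each worker run, and start or restart whatever is missing or has died. The sketch below assumes the caller has already looked up run states. `reconcile`, `run_states`, and `trigger_worker` are hypothetical names, and the state strings are modeled on Prefect's state type names without calling any Prefect API.

```python
from typing import Callable, Dict, Iterable, List

# Assumption: states in which a worker for a still-live game should be started again.
RESTARTABLE = {"FAILED", "CRASHED", "CANCELLED"}


def reconcile(
    live_event_ids: Iterable[str],
    run_states: Dict[str, str],             # hypothetical: event ID -> last known worker run state
    trigger_worker: Callable[[str], None],  # hypothetical: starts one worker flow run
) -> List[str]:
    """Start a worker for each live event with no run, or with a dead run."""
    restarted: List[str] = []
    for event_id in live_event_ids:
        state = run_states.get(event_id)
        if state is None or state in RESTARTABLE:
            trigger_worker(event_id)
            run_states[event_id] = "SCHEDULED"  # record that a new run was requested
            restarted.append(event_id)
        # Any other state (for example RUNNING or PENDING) is left alone.
    return restarted
```

Events that are no longer live are never passed in, so a crashed worker for a game that has already ended is simply not restarted.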