Greetings fellow prefectionists,
What would be the best approach for a job that polls a database for the first 100 new records, processes them, then polls for the next 100 new records, and so on? Should we create a new run of the Prefect flow for each poll? Using an Automation to trigger the same flow upon completion of the current flow run works, but the local Python process eventually crashes because all RAM has been consumed.
Robin Niel
01/18/2024, 5:36 PM
Hello!
Can’t you use a streaming query on your database that returns rows 100 at a time? I’ve been running a similar workflow with more than 2,000 tasks in total without hitting memory issues.
Is there something in your code that could explain that memory consumption?
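A minimal sketch of that single-flow-run approach, assuming Prefect 2.x; the in-memory `FAKE_TABLE` and the `id`/`payload` columns are placeholders for the real database query:
```python
from prefect import flow, task

BATCH_SIZE = 100

# Stand-in for the real table; swap this for your actual database query.
FAKE_TABLE = [{"id": i, "payload": f"row-{i}"} for i in range(250)]

@task
def fetch_batch(last_seen_id: int) -> list[dict]:
    """Fetch the next batch of new records after `last_seen_id`.
    In real code this would be a keyset-paginated SQL query, e.g.
    SELECT ... WHERE id > :last_seen_id ORDER BY id LIMIT :batch_size.
    """
    return [r for r in FAKE_TABLE if r["id"] > last_seen_id][:BATCH_SIZE]

@task
def process_batch(rows: list[dict]) -> int:
    """Process one batch and return the highest id handled."""
    # ... do the real per-row work here ...
    return max(r["id"] for r in rows)

@flow(log_prints=True)
def poll_new_records():
    last_seen_id = -1
    while True:
        rows = fetch_batch(last_seen_id)
        if not rows:
            print("No more new records; finishing this flow run.")
            break
        last_seen_id = process_batch(rows)
        print(f"Processed {len(rows)} rows up to id {last_seen_id}")

if __name__ == "__main__":
    poll_new_records()
```
Because every batch is handled inside the same flow run, no new processes are spawned between batches.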
Yves Thorrez
01/20/2024, 6:44 PM
Hi Robin,
Unfortunately, using a streaming query is not an option.
We want the next flow run (querying for the top 100 records) to start only once the current flow run has finished, and the Prefect Cloud Automation takes care of that. It works, but for some reason new Python processes keep being created in our local container while the previous ones remain visible, and eventually the container runs out of memory.
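For comparison, here is a minimal sketch of chaining runs from inside the flow itself rather than via a Cloud Automation; it is not the setup described above, the deployment name and the two helper functions are hypothetical placeholders, and it assumes Prefect 2.x with the flow deployed to a worker:
```python
from prefect import flow
from prefect.deployments import run_deployment

def fetch_top_100() -> list[dict]:
    """Placeholder for the real 'top 100 new records' query."""
    return []

def process(rows: list[dict]) -> None:
    """Placeholder for the real per-batch processing."""

@flow
def process_top_100():
    rows = fetch_top_100()
    if not rows:
        return  # nothing new: let the chain stop here
    process(rows)
    # Ask the API for the next run directly instead of using an Automation.
    # timeout=0 means "don't block waiting for that run to finish", so this
    # run can complete and release its resources before the next one starts.
    run_deployment(name="process-top-100/my-deployment", timeout=0)
```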