Hi,
I asked for help some time ago regarding an issue where tasks and flows take longer and longer to complete until the whole application freezes (my processors fill up with many processes).
I was told to dockerize the app following the guide provided in this link, which I did, partly because it also made our deployment process cleaner and more robust. It was quite some work, but it has made our job a lot easier.
Unfortunately, we still observe the same pattern today, though less extreme. I have attached a screenshot of a flow. It almost always fails because it maps a task that fetches data from several sources, and one of them is down at the moment. The increase in run time is not explained by the volume of data to fetch: every half hour, the flow fetches all the data accumulated so far for the current date, yet it took longer today at 1am (when there was little data) than yesterday at 11pm.
When I redeploy the application, the flows return to their normal speed. I did that this morning, which is why the bars are shorter at the end of the screenshot.
I am kind of lost, because without close monitoring we risk a failure of our whole environment. The issue reminds me of Airflow's unstable scheduler problem. The workaround I have thought of is to redeploy the application nightly, but I would much rather understand the root cause of the issue.
If you have any ideas, I'd be delighted to investigate further with you, and maybe we could help Prefect become an even better product.
Alexis