Tadej Svetina

02/09/2023, 9:31 AM
Hi, I've been having a lot of crazy reliability issues with Prefect lately. As soon as I run more than 5 flows on an agent, it starts crashing. The flows aren't doing much: they poll an API and wait in between (intervals of at least 20 s), everything uses ConcurrentTaskRunner, and all functions are async. Why is this happening? Is there any configuration I can tweak to make this run better? If not, I'll have to move off Prefect; reliability really is the most important feature here.
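A simplified sketch of the workload pattern described above, written in plain `asyncio` rather than Prefect, so it is an illustration and not the actual flow code. `fetch_status` and `make_fake_api` are hypothetical stand-ins for the real API call. Each poll awaits the API and then waits with `asyncio.sleep`, which yields the event loop instead of blocking, and `main` runs six pollers side by side to mimic the more-than-5 concurrent flows:

```python
import asyncio
from typing import Awaitable, Callable


async def poll_until_done(
    fetch_status: Callable[[], Awaitable[str]],
    interval: float = 20.0,
    max_polls: int = 1000,
) -> int:
    """Poll fetch_status until it returns "done"; return how many polls it took."""
    for attempt in range(1, max_polls + 1):
        status = await fetch_status()
        if status == "done":
            return attempt
        # Non-blocking wait: other coroutines keep running in the meantime
        await asyncio.sleep(interval)
    raise TimeoutError(f"still not done after {max_polls} polls")


def make_fake_api(ready_after: int) -> Callable[[], Awaitable[str]]:
    """Hypothetical stand-in for the real API: reports "done" on the Nth call."""
    calls = 0

    async def fetch_status() -> str:
        nonlocal calls
        calls += 1
        return "done" if calls >= ready_after else "pending"

    return fetch_status


async def main(n_flows: int = 6, interval: float = 20.0) -> list:
    # Run several independent pollers concurrently on one event loop
    apis = [make_fake_api(ready_after=3) for _ in range(n_flows)]
    return await asyncio.gather(*(poll_until_done(api, interval) for api in apis))


if __name__ == "__main__":
    # A tiny interval so the demo finishes quickly
    print(asyncio.run(main(interval=0.01)))
```

Since almost all of the time is spent awaiting, a workload like this is I/O-bound, which is consistent with the low CPU usage mentioned below.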
I'm running the agent on a very small server, and CPU usage is consistently low (<5%). Could the issue be memory? How much memory does each new flow run occupy?
Also, the UI feels really buggy: the state of a flow run shown in the Flow Runs list is often inconsistent with its actual state (which I can see on the flow run's own page), and this persists even across refreshes. I also frequently notice that if an agent crashes before picking up a flow run (while it is Pending), the agent does not pick it up when it comes back online unless I manually cancel it and then retry.
Another thing: on a running agent, some subflows just crash silently. They still show up as Running, but logs stop appearing, and such runs are also really hard to cancel.