Hey everyone, I had a flow that ran 12 hours late…
# ask-community
c
Hey everyone, I had a flow that ran 12 hours late… I’ve never had a flow run late at all, so seeing one run 12 hours late is a bit crazy to me. What causes a late run like this?
b
Hi Christian, from my experience, flow runs generally get stuck in a "Late" status when the worker/agent responsible for spinning up the infra for the flow run is not actively polling for new work
c
Thanks for the response, Bianca. Is there a reason that the agent is not polling for new work? I’ve just updated the database to use PostgreSQL instead of the standard SQLite so we’ll see if this helps. I was getting a lot of “database is locked” errors with SQLite.
b
Usually when the agent stops polling for work, some problem has occurred which caused the process to exit
Like the infrastructure or machine you're using to host the agent crashing or being spun down
The best way to mitigate this is to daemonize the agent in some way, to ensure that the lights are kept on. Here are a few examples for daemonization:
• How to run a Prefect 2 worker as a systemd service on Linux
• Daemonizing the Agent with Docker
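To illustrate the idea of keeping the agent alive, here is a minimal sketch of a supervisor loop in Python. It is not what the linked guides do (they rely on systemd and Docker restart policies, which are the more robust choice); it only shows the restart-on-exit principle. The function name `supervise` and its parameters are hypothetical, and the agent command in the comment is an assumption about how the agent is started.

```python
import subprocess
import time


def supervise(command, max_restarts=None, backoff_seconds=5.0):
    """Run `command` and start it again whenever it exits.

    Returns the list of exit codes observed. With max_restarts=None the
    loop never returns and keeps restarting the process indefinitely.
    """
    exit_codes = []
    while True:
        # Blocks until the child process exits, for whatever reason.
        finished = subprocess.run(command)
        exit_codes.append(finished.returncode)
        # Stop once the initial run plus `max_restarts` restarts are done.
        if max_restarts is not None and len(exit_codes) > max_restarts:
            return exit_codes
        # Short pause so a crash loop does not spin the CPU.
        time.sleep(backoff_seconds)


# Assumed usage (the command and queue name depend on your setup):
# supervise(["prefect", "agent", "start", "-q", "default"])
```

A loop like this only helps if the supervisor itself keeps running, which is why a service manager such as systemd or a container restart policy is usually preferred.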
I think the agent logs would help clarify why the flow run ran so late. Agent logs are sent to stdout by default, so you may be able to pinpoint what happened by looking at the timestamps
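As a rough sketch of what "looking at the timestamps" could mean in practice, the snippet below scans captured log lines for long silences. It assumes every log line starts with a full ISO-8601 timestamp followed by a space, which may not match your agent's actual log format, so the parsing step would need adjusting. The name `find_gaps` and the sample lines in the comment are hypothetical.

```python
from datetime import datetime, timedelta


def find_gaps(lines, threshold=timedelta(minutes=5)):
    """Return (before, after) timestamp pairs separated by more than `threshold`."""
    gaps = []
    previous = None
    for line in lines:
        # Assumption: the first space-delimited token is an ISO-8601 timestamp.
        token = line.split(" ", 1)[0]
        try:
            current = datetime.fromisoformat(token)
        except ValueError:
            continue  # skip lines without a leading timestamp, e.g. tracebacks
        if previous is not None and current - previous > threshold:
            gaps.append((previous, current))
        previous = current
    return gaps


# Hypothetical input, not real agent output:
# find_gaps([
#     "2023-06-01T08:00:00 | INFO | polling",
#     "2023-06-01T20:00:05 | INFO | polling",
# ])
```

A gap of several hours between consecutive lines would suggest the process was not running (or not logging) during that window.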
> I was getting a lot of “database is locked” errors with SQLite.
Ah, gotcha. I'm not entirely familiar with that error, but I think it may be related to issues with multi-threading, i.e. one thread locks the database, and another thread is blocked from writing
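For anyone curious, the error is easy to reproduce with Python's built-in `sqlite3` module alone. This is a minimal sketch of the general SQLite behavior, not of how Prefect uses its database: one connection holds the write lock, and a second connection that tries to write without waiting gets "database is locked". The function name `demonstrate_lock` is hypothetical; the two connections here live in one thread, but the same thing happens across threads or processes.

```python
import os
import sqlite3
import tempfile


def demonstrate_lock():
    """Return the error message a second writer sees while the first holds the lock."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")

    # isolation_level=None turns off implicit transactions, so BEGIN is explicit.
    writer = sqlite3.connect(path, isolation_level=None)
    writer.execute("CREATE TABLE runs (id INTEGER)")
    writer.execute("BEGIN IMMEDIATE")  # take the write lock and keep it
    writer.execute("INSERT INTO runs VALUES (1)")

    # timeout=0 makes the second connection fail at once instead of waiting.
    other = sqlite3.connect(path, timeout=0, isolation_level=None)
    try:
        other.execute("INSERT INTO runs VALUES (2)")
        message = None
    except sqlite3.OperationalError as exc:
        message = str(exc)
    finally:
        writer.execute("COMMIT")  # release the lock
        writer.close()
        other.close()
    return message


print(demonstrate_lock())
```

SQLite allows only one writer at a time per database file, which is why a server database such as PostgreSQL tends to cope better with concurrent writes.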
FWIW, Prefect Cloud is always available as an alternative to self-hosting. If it gets to be a bit troublesome to host the server and database, Cloud removes the onus of having to manage those resources yourself. ☁️
c
I didn’t think about checking the logs. I can do that too next time. Everything seems to be working fine now, might have just been a fluke.