Pierre Monico
08/25/2021, 9:04 AM
DockerRun run config. However, since I deployed to the server, the runs have not been very reliable. I keep getting random, non-deterministic errors (cf. image).
The flows all run fine on my machine and were also running fine when executed in a Docker container I was managing myself (on cloud infrastructure). Some of the errors include:
• HTTPConnectionPool(host='host.docker.internal', port=4200): Read timed out. (read timeout=15)
• sqlalchemy.exc.NoSuchTableError
(I am writing to Postgres tables - the table does exist and sometimes the run even succeeds)
• TypeError: 'NoneType' object is not subscriptable
• No heartbeat detected from the remote task; retrying the run.This will be retry 1 of 3.
I believe most of the errors are related to some timeout issue, or to the state of the task not being monitored, but I am confused as to why this happens. My VM has sufficient resources (I checked the monitoring), but I am thinking it might be worth scaling it up?
I know the question might be a bit broad, but I would be happy to know if anyone has experienced something similar and/or knows the reasons or a fix.
Kevin Kho
sqlalchemy with a mapped task by chance? Also, the NoneType error normally happens when something didn’t succeed and the flow continues.
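To illustrate the NoneType point, here is a minimal sketch in plain Python (not Prefect code). The function names `fetch_row`, `read_name` and `read_name_unguarded` are hypothetical and only stand in for an upstream step that quietly returns nothing and a downstream step that indexes into the result.

```python
from typing import Optional


def fetch_row(succeeded: bool) -> Optional[dict]:
    """Hypothetical upstream step: returns None on failure instead of raising."""
    if not succeeded:
        return None
    return {"id": 1, "name": "example"}


def read_name_unguarded(row):
    # If the upstream step failed, row is None and this line raises
    # TypeError: 'NoneType' object is not subscriptable
    return row["name"]


def read_name(row: Optional[dict]) -> str:
    # Guarding explicitly surfaces the real problem (missing upstream data)
    # instead of a confusing TypeError further down the line.
    if row is None:
        raise ValueError("upstream step returned no data")
    return row["name"]
```

With a guard like this, the error message points at the step that actually failed rather than at the place where its empty result was first used.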
Heartbeat issues are related to memory issues 90% of the time. Heartbeats tell Prefect that the task is still running, and in most of the cases we’ve seen, heartbeats die when the flow/task is memory constrained.
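As a rough illustration of the general heartbeat idea (this is not Prefect's implementation), a monitor can record the time of the last beat and treat the task as lost once no beat has arrived within a timeout. The class name `HeartbeatMonitor` and the 30-second timeout are assumptions made up for this sketch.

```python
import time
from typing import Optional


class HeartbeatMonitor:
    """Tracks the most recent heartbeat and reports whether it is still fresh."""

    def __init__(self, timeout_seconds: float):
        self.timeout_seconds = timeout_seconds
        self.last_beat = time.monotonic()

    def beat(self, now: Optional[float] = None) -> None:
        # Called periodically by the running task to signal "still alive".
        self.last_beat = time.monotonic() if now is None else now

    def is_alive(self, now: Optional[float] = None) -> bool:
        # The task counts as alive while the last beat is within the timeout.
        current = time.monotonic() if now is None else now
        return (current - self.last_beat) <= self.timeout_seconds


monitor = HeartbeatMonitor(timeout_seconds=30.0)  # hypothetical timeout
monitor.beat(now=100.0)
print(monitor.is_alive(now=120.0))  # beat was 20 s ago, within the timeout
print(monitor.is_alive(now=131.0))  # beat was 31 s ago, past the timeout
```

If the process sending the beats is starved or killed, for example under memory pressure, the beats stop and the monitor eventually reports the task as lost even though the task code itself never raised an error.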
All things considered, I think the best advice for you is to scale up here.