I now have a working Prefect Server with flows running within a DockerRun run config (Prefect Community, #prefect-server)


Pierre Monico

08/25/2021, 9:04 AM

I now have a working Prefect Server with flows running within a `DockerRun` run config. However, since I deployed to the server, the runs have not been very reliable. I keep getting errors at random, in a completely non-deterministic way (cf. image). The flows all run fine on my machine, and they also ran fine when executed in a Docker container I was managing myself (on cloud infrastructure). Some of the errors include:

• `HTTPConnectionPool(host='host.docker.internal', port=4200): Read timed out. (read timeout=15)`
• `sqlalchemy.exc.NoSuchTableError` (I am writing to Postgres tables; the table does exist, and sometimes the run even succeeds)
• `TypeError: 'NoneType' object is not subscriptable`
• `No heartbeat detected from the remote task; retrying the run.This will be retry 1 of 3.`
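A minimal, generic sketch of one way a `TypeError: 'NoneType' object is not subscriptable` like the one above can arise: an upstream step fails (here a simulated read timeout), swallows the error and returns `None`, and the downstream step still runs and indexes into the result. `load_rows`, `first_row` and `first_row_checked` are hypothetical names for illustration, not Prefect APIs.

```python
from typing import List, Optional, Tuple

Row = Tuple[str]


def load_rows(succeed: bool) -> Optional[List[Row]]:
    """Hypothetical upstream step that hides its own failure."""
    try:
        if not succeed:
            # Simulate e.g. a read timeout against the API or the database.
            raise TimeoutError("Read timed out.")
        return [("row-1",), ("row-2",)]
    except TimeoutError:
        return None  # failure swallowed: downstream code still runs


def first_row(rows: Optional[List[Row]]) -> Row:
    """Hypothetical downstream step that assumes the upstream step succeeded."""
    return rows[0]  # raises TypeError when rows is None


def first_row_checked(rows: Optional[List[Row]]) -> Row:
    """Same step, but it fails loudly and points at the real cause."""
    if rows is None:
        raise ValueError("upstream step returned no data; check its logs")
    return rows[0]
```

Calling `first_row(load_rows(False))` raises the `TypeError`, which hides the original timeout. Failing at the point of failure (as in `first_row_checked`), or simply letting the exception propagate so the orchestrator can mark the step as failed, tends to surface the underlying error instead of the secondary one.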

I believe most of the errors are related to some timeout issue, i.e. the state of the task can't be monitored, but I am confused as to why this happens. My VM has sufficient resources (I checked the monitoring), but I am wondering whether it might be worth scaling it up. I know the question is a bit broad, but I would be happy to hear whether anyone has experienced something similar and/or knows the cause or a fix.

Kevin Kho

08/25/2021, 2:48 PM

I think some of these might be related to scaling up. Are you accessing `sqlalchemy` with a mapped task by chance? Also, the `NoneType` error normally happens when something didn't succeed and the flow continues. Heartbeat issues are related to memory issues 90% of the time. Heartbeats tell Prefect that the task is still running, and in most of the cases we've seen, heartbeats die when the flow/task is memory constrained. All things considered, I think the best advice for you is to scale up here.
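To make the heartbeat idea concrete, here is a generic sketch (not Prefect's actual implementation) of a monitor that treats a task as lost once it has been silent for longer than a timeout. `HeartbeatMonitor` and the 30-second timeout are hypothetical; times are passed in explicitly so the logic can be checked without real clocks or threads.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HeartbeatMonitor:
    """Generic heartbeat tracker; a sketch, NOT Prefect's implementation."""

    timeout: float  # hypothetical: seconds of silence before a task counts as lost
    last_beat: Dict[str, float] = field(default_factory=dict)

    def beat(self, task_id: str, now: float) -> None:
        # A running task calls this periodically to say "still alive".
        self.last_beat[task_id] = now

    def stale_tasks(self, now: float) -> List[str]:
        # Tasks whose latest heartbeat is older than the timeout. A process
        # starved of memory may stop sending beats and land here even if
        # its own code has no bug.
        return sorted(
            task_id
            for task_id, t in self.last_beat.items()
            if now - t > self.timeout
        )
```

Usage: after `m = HeartbeatMonitor(timeout=30.0)`, `m.beat("load", now=0.0)`, `m.beat("transform", now=0.0)` and `m.beat("load", now=25.0)`, the call `m.stale_tasks(now=40.0)` returns `["transform"]`. From the monitor's point of view, silence looks the same as failure, which is why a memory-constrained task can be reported as lost and retried.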

Pierre Monico

08/25/2021, 3:07 PM

Thanks a lot for the detailed advice. I’ll look into scaling the machine and will see if it fixes it.

Pierre Monico

08/25/2021, 3:54 PM

Update: checking the monitoring again, it was indeed a problem with the VM resources. For a start, I spread my flow schedules over several periods (instead of running them all at the same time), and that already seems to help.
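Spreading schedules can be as simple as giving each flow its own start minute. A minimal sketch, assuming every flow runs hourly and schedules are expressed as cron strings; the flow names are hypothetical, and the resulting strings could then be passed to a cron-based schedule (for example Prefect 1.x's `CronSchedule`).

```python
from typing import Dict, List


def staggered_hourly_crons(flow_names: List[str]) -> Dict[str, str]:
    """Map each flow to an hourly cron string with its own start minute.

    With n flows, start minutes are spaced about 60 / n minutes apart,
    so the flows no longer all start at :00 and compete for the same VM.
    """
    n = len(flow_names)
    crons: Dict[str, str] = {}
    for i, name in enumerate(flow_names):
        minute = (i * 60) // n  # always an integer in [0, 59]
        crons[name] = f"{minute} * * * *"
    return crons
```

For example, `staggered_hourly_crons(["extract", "transform", "load", "report"])` returns `{"extract": "0 * * * *", "transform": "15 * * * *", "load": "30 * * * *", "report": "45 * * * *"}`. Even spacing is the simplest choice; heavier flows could be given wider gaps instead.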


Pierre Monico

08/25/2021, 9:39 PM

Indeed, just spreading the flow schedules did wonders 😍


Pierre Monico

08/26/2021, 8:10 AM

Thanks again for always helping out @Kevin Kho!

Kevin Kho

08/26/2021, 1:58 PM

Of course!
