# ask-community
j
Hey all! I'm seeing some strange behavior. I've deployed a self-hosted Prefect server to ECS, and I'm using an ECS worker. Here's the strange part: I have a parent flow that spawns many subflows. If one subflow fails, the failure propagates up and the parent flow fails, BUT the sibling subflows don't fail, even though the infrastructure that was running them has shut down. Instead, they stay in a Running state forever (or until I manually delete them). Can someone explain what I'm seeing?
n
hi @Jonathan Samples are those subflows on the same infra as the parent? or are they subflows but triggered via `run_deployment`? most commonly, zombie flow runs happen because the infra OOM'd and can no longer report state, so the Prefect API never gets an update
j
Thanks for the reply @Nate! Yes, those subflows are on the same infra... But it's not an OOM error. Actually, I think the flow infra couldn't communicate with the Prefect server (503).
But maybe a 503 from the Prefect server acts like an OOM in this case?
n
hmm. I'd think the client would try again on a 503, or at least it ought to by default
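For context, "try again on a 503" usually means retrying with exponential backoff. The sketch below is a generic illustration, not Prefect's client code: `call_with_retries`, `HTTPStatusError`, and the set of retryable status codes are hypothetical names chosen for this example. It also shows the failure mode discussed above: once retries are exhausted the error propagates, and if that happens while a run is reporting its final state, the server never hears about it.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Hypothetical set of "transient" status codes worth retrying (not Prefect's config).
RETRYABLE_STATUS_CODES = {429, 502, 503}


class HTTPStatusError(Exception):
    """Hypothetical error type carrying the HTTP status code of a failed request."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def call_with_retries(
    request: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, retrying transient failures with jittered exponential backoff."""
    attempt = 0
    while True:
        try:
            return request()
        except HTTPStatusError as exc:
            # Non-transient errors, or running out of retries, propagate to the caller.
            # If this happens while reporting a final state, the server never gets
            # the update and the run stays "Running".
            if exc.status_code not in RETRYABLE_STATUS_CODES or attempt >= max_retries:
                raise
            # The delay doubles on each attempt; jitter spreads out retries from many clients.
            delay = base_delay * (2 ** attempt) * random.uniform(0.5, 1.0)
            sleep(delay)
            attempt += 1
```

Passing a custom `sleep` makes the backoff easy to test without actually waiting.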
j
Is there any way of accounting for these zombie flows?
Does prefect server implement any kind of timeout for hearing back from flows?
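One way to account for zombie runs is a periodic sweep: treat any run that is still marked Running but hasn't reported anything for longer than a threshold as dead. The sketch below shows only the detection logic and does not call Prefect at all. `FlowRunSnapshot`, `find_zombie_runs`, and the `last_update` field are hypothetical; which timestamp you can actually query depends on your Prefect version.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass
class FlowRunSnapshot:
    """Hypothetical, simplified view of a flow run; not a Prefect client type."""

    id: str
    state: str  # e.g. "RUNNING", "COMPLETED", "FAILED"
    last_update: datetime  # last time the run reported anything (assumed to be available)


def find_zombie_runs(
    runs: Iterable[FlowRunSnapshot],
    now: datetime,
    max_silence: timedelta,
) -> List[str]:
    """Return ids of runs still marked RUNNING that have been silent longer than max_silence."""
    return [
        run.id
        for run in runs
        if run.state == "RUNNING" and now - run.last_update > max_silence
    ]
```

A scheduled job could run this sweep and then mark the returned runs as crashed or cancelled through whatever client call your Prefect version provides. Check the Prefect docs for that part rather than relying on this sketch.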
n