Samuel Hinton

07/20/2023, 1:15 AM
Hi all! Does anyone know how to get any information from Prefect about why a flow might have crashed? I can see no logs of things going wrong in the dask worker, scheduler, agent, or the orion service itself, and the Flow is reporting "Crashed" while a task in the flow is just forever stuck in "Running"

Nate

07/20/2023, 3:07 AM
hey @Samuel Hinton - what is your runtime infra like? i.e. where is your agent/worker submitting flow runs?

Samuel Hinton

07/20/2023, 3:30 AM
Hey Nate! It's hosted in AWS, running as services in ECS 🙂 We're using a dask task runner, if that's what you were getting at?

Nate

07/20/2023, 7:30 PM
I just mean that Crashed is very often an infrastructure problem, so in addition to seeing
"no logs of things going wrong in the dask worker, scheduler, agent, or the orion service itself"
I would check your ECS containers (I'm not an ECS buff, so I'm not exactly sure what that looks like). If AWS had some issue or the container OOM'd, something like what you described could happen.
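For anyone wanting to script that ECS check, here is a minimal sketch in Python, assuming boto3 is installed and AWS credentials are configured. The helper names `summarize_stopped_tasks` and `fetch_stopped_tasks` are made up for illustration, and the OOM test is only a heuristic (exit code 137 or an "OutOfMemory" container reason), not a definitive diagnosis. ECS also only keeps stopped tasks visible for a limited time, so this may return nothing for an older crash.

```python
from typing import Any


def summarize_stopped_tasks(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Condense an ECS DescribeTasks response into per-container exit details."""
    rows = []
    for task in response.get("tasks", []):
        for container in task.get("containers", []):
            exit_code = container.get("exitCode")
            reason = container.get("reason") or ""
            rows.append(
                {
                    "task": task.get("taskArn"),
                    "container": container.get("name"),
                    "stopped_reason": task.get("stoppedReason"),
                    "exit_code": exit_code,
                    "container_reason": reason,
                    # Heuristic only (an assumption, not an official rule):
                    # 137 means the process received SIGKILL, which is what
                    # an out-of-memory kill typically looks like.
                    "possible_oom": exit_code == 137 or "OutOfMemory" in reason,
                }
            )
    return rows


def fetch_stopped_tasks(cluster: str) -> dict[str, Any]:
    """Fetch recently stopped tasks for a cluster (hypothetical helper)."""
    import boto3  # imported lazily so the summary function works without it

    ecs = boto3.client("ecs")
    arns = ecs.list_tasks(cluster=cluster, desiredStatus="STOPPED")["taskArns"]
    if not arns:
        return {"tasks": []}
    return ecs.describe_tasks(cluster=cluster, tasks=arns)
```

Something like `summarize_stopped_tasks(fetch_stopped_tasks("my-cluster"))` (the cluster name is a placeholder) would then list each stopped container with its exit code and stop reason, which is the information Prefect itself cannot see when the container is killed from outside.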

Samuel Hinton

07/21/2023, 1:24 AM
I'll check in with infra then, thanks Nate. Just wanted to make sure I wasn't missing some obvious button or tab that might have held some other logs 🙂

Nate

07/21/2023, 2:10 AM
👍