Hi all! Does anyone know how to get any information from Prefect about why a flow might have crashed?
I can see no logs of things going wrong in the dask worker, scheduler, agent, or the orion service itself, and the flow is reporting "Crashed" while a task in the flow is just stuck forever in "Running".
Nate
07/20/2023, 3:07 AM
hey @Samuel Hinton - what is your runtime infra like? i.e. where is your agent/worker submitting flow runs?
Samuel Hinton
07/20/2023, 3:30 AM
Hey Nate! It's hosted in AWS, running as services in ECS 🙂 We're using a dask task runner, if that's what you were getting at?
Nate
07/20/2023, 7:30 PM
i just mean that "Crashed" is very often an infrastructure problem, so I would think that in addition to
"no logs of things going wrong in the dask worker, scheduler, agent, or the orion service itself"
I would check your ECS containers (I'm not an ECS buff, so I'm not exactly sure what that looks like), since if AWS had some issue or the container OOM'd, then something like what you described could happen
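A rough sketch of what that ECS check could look like, in case it helps anyone reading later. It assumes you have already fetched a DescribeTasks response for your stopped tasks (for example via boto3's `describe_tasks`) and only scans it for containers that look like they were killed. The function name `find_suspect_containers` and all values in `sample_response` are hypothetical, made up for illustration; this is not something Prefect provides.

```python
# Sketch only: scan an ECS DescribeTasks response for containers that look
# like they were killed (e.g. OOM). Fetching the response is left out; with
# boto3 it would come from something like
#   boto3.client("ecs").describe_tasks(cluster=<your cluster>, tasks=<task ARNs>)
# where the ARNs of stopped tasks can be listed with
#   list_tasks(cluster=<your cluster>, desiredStatus="STOPPED")

def find_suspect_containers(describe_tasks_response):
    """Return (task_arn, container_name, exit_code, reason) for containers
    whose exit code or reason suggests they were killed."""
    suspects = []
    for task in describe_tasks_response.get("tasks", []):
        for container in task.get("containers", []):
            exit_code = container.get("exitCode")
            reason = container.get("reason") or ""
            # 137 = 128 + 9 (SIGKILL), the usual exit code after an OOM kill
            if exit_code == 137 or "OutOfMemory" in reason:
                suspects.append(
                    (task.get("taskArn"), container.get("name"), exit_code, reason)
                )
    return suspects


# Hypothetical response, trimmed to the fields used above
sample_response = {
    "tasks": [
        {
            "taskArn": "arn:aws:ecs:example:task/hypothetical-task-id",
            "stoppedReason": "Essential container in task exited",
            "containers": [
                {
                    "name": "flow-run",
                    "exitCode": 137,
                    "reason": "OutOfMemoryError: Container killed due to memory usage",
                },
                {"name": "sidecar", "exitCode": 0},
            ],
        }
    ]
}

for suspect in find_suspect_containers(sample_response):
    print(suspect)
```

If this turns up nothing, the task-level `stoppedReason` and `stopCode` fields in the same response are worth reading too, since they cover cases like the host instance being terminated.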
Samuel Hinton
07/21/2023, 1:24 AM
I'll check in with infra then, thanks Nate. Just wanted to make sure I wasn't missing some obvious button or tab that might have held some other logs 🙂