What are we supposed to do when flows or tasks are stuck in a non-terminal state?

Mitch

10/22/2024, 9:31 PM

What are we supposed to do when flows or tasks are stuck in a non-terminal state? This blows up the amount of data in the database, because states keep being written even though the runs are from 2 months ago and there's literally no infra for them anymore. Example: a flow spins up tasks, then the flow crashes due to memory. The tasks still show as running in the dashboard, and the database shows a Running state continually being written...

Nate

10/22/2024, 9:36 PM

hi @Mitch - one thing that comes to mind would be proactive automations that check that flow runs that start end up emitting a terminal state event within some reasonable period of time. if that triggers (i.e. no terminal event shows up in time), you can use a "set flow run state" action to force the run to Failed or Crashed
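
A rough sketch of how such an automation could be described, written as a plain Python dict. The field names (`posture`, `after`, `expect`, `for_each`, `within`) and the `change-flow-run-state` action type reflect my understanding of Prefect's automation schema and should be checked against your Prefect version. The automation name, the 6-hour window, and the `build_stuck_run_automation` helper are made up for illustration.

```python
# Assumed "reasonable period of time" before a run counts as stuck.
STUCK_RUN_TIMEOUT_SECONDS = 6 * 60 * 60

# Events that mark a flow run as finished, one way or another.
TERMINAL_EVENTS = [
    "prefect.flow-run.Completed",
    "prefect.flow-run.Failed",
    "prefect.flow-run.Crashed",
    "prefect.flow-run.Cancelled",
]


def build_stuck_run_automation(timeout_seconds: int = STUCK_RUN_TIMEOUT_SECONDS) -> dict:
    """Build a hypothetical automation payload for stuck flow runs."""
    return {
        "name": "crash-stuck-flow-runs",  # hypothetical name
        "trigger": {
            "type": "event",
            # Proactive: fire when the expected event does NOT arrive in time.
            "posture": "Proactive",
            "after": ["prefect.flow-run.Running"],
            "expect": TERMINAL_EVENTS,
            # Evaluate the trigger separately for each flow run.
            "for_each": ["prefect.resource.id"],
            "threshold": 1,
            "within": timeout_seconds,
        },
        "actions": [
            {
                "type": "change-flow-run-state",
                "state": "CRASHED",
                "message": "No terminal state seen in time; forced by automation.",
            }
        ],
    }
```

The same settings can be entered in the UI instead. The point is only that the trigger starts its clock on Running and fires when none of the terminal events arrive within the window.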

Mitch

10/22/2024, 9:42 PM

Would an on_crashed hook also work? We encountered an issue earlier this month where we had 1 billion records in a state table, and >99% of them were caused by states being continually written. This is also considered a base case where spot instances would be used.

Nate

10/22/2024, 9:43 PM

well, on_crashed hooks would run on the infra that might disappear, right? i.e. you can’t guarantee the hook runs if you OOM or something

Mitch

10/22/2024, 9:44 PM

Ok, thanks @Nate. I guess some form of a cron job would be the solution then 🙏
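
A minimal sketch of what that cron job might look like, assuming the Prefect 2.x/3.x Python client (`get_client`, `read_flow_runs`, `set_flow_run_state`) and its filter classes. The two-day threshold and the function names are hypothetical, and the exact filter fields should be checked against the installed Prefect version.

```python
from datetime import datetime, timedelta, timezone

# Assumption: anything still Running after two days is considered stuck.
MAX_RUN_AGE = timedelta(days=2)


def stale_cutoff(now: datetime, max_age: timedelta = MAX_RUN_AGE) -> datetime:
    """Runs that started before this timestamp are treated as stale."""
    return now - max_age


async def crash_stale_flow_runs(max_age: timedelta = MAX_RUN_AGE) -> int:
    """Force stale Running flow runs to Crashed; returns how many were forced."""
    # Imported here so stale_cutoff() stays usable without Prefect installed.
    from prefect import get_client
    from prefect.client.schemas.filters import (
        FlowRunFilter,
        FlowRunFilterStartTime,
        FlowRunFilterState,
        FlowRunFilterStateType,
    )
    from prefect.client.schemas.objects import StateType
    from prefect.states import Crashed

    cutoff = stale_cutoff(datetime.now(timezone.utc), max_age)
    stale_running = FlowRunFilter(
        state=FlowRunFilterState(
            type=FlowRunFilterStateType(any_=[StateType.RUNNING])
        ),
        start_time=FlowRunFilterStartTime(before_=cutoff),
    )

    forced = 0
    async with get_client() as client:
        # Only one page of results is read per invocation; a backlog is
        # worked off over successive cron ticks.
        runs = await client.read_flow_runs(flow_run_filter=stale_running)
        for run in runs:
            # force=True skips orchestration rules, since nothing is left
            # on the infra side to negotiate the transition with.
            await client.set_flow_run_state(
                flow_run_id=run.id,
                state=Crashed(message="Forced by stale-run cleanup job."),
                force=True,
            )
            forced += 1
    return forced
```

From cron this would be invoked with something like `asyncio.run(crash_stale_flow_runs())`. Because it runs outside the flow's own infrastructure, it still works after an OOM kill or a reclaimed spot instance. Stuck task runs could presumably be handled the same way with the client's task-run equivalents.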

Mitch

10/22/2024, 9:45 PM

It does seem hard to implement logic that decides which runs should actually be transitioned when the parent flow fails or crashes, as some jobs may actually need to continue or may still be running properly on their infra
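
One possible way to frame that decision, sketched in plain Python with no Prefect calls. Everything here is hypothetical: the `ChildRun` fields stand in for whatever signals are actually available (for example a Kubernetes or ECS lookup for `infra_alive`, and the latest log or state timestamp for `last_seen`), and the 30-minute grace period is an arbitrary choice.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ChildRun:
    name: str
    parent_terminal: bool  # parent flow already Failed or Crashed
    infra_alive: bool      # hypothetical: did the platform report the job alive?
    last_seen: datetime    # hypothetical: last log line or state update


def should_force_crash(
    run: ChildRun,
    now: datetime,
    grace: timedelta = timedelta(minutes=30),
) -> bool:
    """Decide whether a non-terminal child run should be forced to Crashed."""
    # A healthy parent means the child is not orphaned: leave it alone.
    if not run.parent_terminal:
        return False
    # A dead parent is only a reason to look closer. If the child's own
    # infra is still up, it may legitimately need to continue.
    if run.infra_alive:
        return False
    # Infra is gone: wait out a grace period to avoid racing a restart.
    return now - run.last_seen > grace


now = datetime(2024, 10, 22, 21, 45, tzinfo=timezone.utc)
orphan = ChildRun("orphan", True, False, now - timedelta(days=60))
survivor = ChildRun("survivor", True, True, now - timedelta(minutes=1))
recent = ChildRun("recent", True, False, now - timedelta(minutes=5))

print([r.name for r in (orphan, survivor, recent) if should_force_crash(r, now)])
# prints ['orphan']
```

The design choice is that the parent's state alone never forces a transition. A child is only crashed when its own infrastructure is confirmed gone and it has been silent for longer than the grace period.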

Prefect Community