# ask-community
Hi Prefect team & community — I posted this question over on Discourse last week (https://discourse.prefect.io/t/best-strategy-for-long-running-consumer-jobs/4130), but since there were no answers there, I thought this might be a more conversational place to ask how people are handling long-running processes. Specifically, we're ingesting data using a Dask executor and we want to keep ingesting as long as there is data available on the server. Conceptually, what we really want is a service that loops forever checking for new data and is resilient to crashes (rough sketch of that shape at the end of this post). Here are some of the problems we're seeing:
• A worker might die or crash for some ephemeral reason (e.g. right now we're seeing errors when trying to retrieve a task's result state, which is reported as no longer available), and Prefect Server doesn't know that the worker was restarted by the orchestration platform (Docker Swarm in our tests).
• When workers go away and are restarted, their concurrency slots are not freed up, so jobs just end up stuck. Is there a way to have these cleaned up automatically?
• If a task loops indefinitely waiting for data, we don't have a good mechanism to restart it when it fails for some reason (and there appear to be many things that can cause a failure).
• If we run tasks on a schedule to ensure something is always running, we typically end up with tasks stepping on each other, so we then have to control them with concurrency slots. But then if something crashes, the slot isn't freed up, which makes this strategy insufficient as well.
Maybe "just don't use Prefect for this" is the right answer here? (Our current plan is to pull this out of Prefect and into standalone services, but I figured I would also ask here, as perhaps other people are using Prefect for things like this?)
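For concreteness, here's a rough sketch of the looping-consumer shape described above. It assumes Prefect 1.x with a DaskExecutor, and `fetch_batch` / `ingest_batch` (plus the retry settings and flow name) are placeholders rather than our real code:

```python
import time
from datetime import timedelta

from prefect import task, Flow
from prefect.executors import DaskExecutor


def fetch_batch():
    """Placeholder: return the next batch of data, or None if nothing is available."""
    return None


def ingest_batch(batch):
    """Placeholder: write one batch into our store."""
    pass


@task(max_retries=3, retry_delay=timedelta(seconds=30))
def consume_forever():
    # The task itself loops "forever": it keeps pulling batches as long as the
    # source has data, and sleeps briefly when it doesn't. Retries only help if
    # the whole task run fails and Prefect gets a chance to reschedule it, which
    # is exactly what breaks when the worker dies underneath it.
    while True:
        batch = fetch_batch()
        if batch is None:
            time.sleep(10)  # nothing available yet; poll again
            continue
        ingest_batch(batch)


with Flow("long-running-consumer") as flow:
    consume_forever()

flow.executor = DaskExecutor()
# flow.run() locally, or register with Prefect Server and run on a schedule
```

The question is essentially whether this pattern (one effectively never-ending task, or the scheduled-plus-concurrency-slot variant) is something Prefect is meant to handle, or whether it belongs in a standalone service.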