<@ULVA73B9P>prefect job failing due to error Finis...
# ask-marvin
a
@Marvin prefect job failing due to error Finished in state Failed("Task run encountered an exception KilledWorker: Attempted to run task ('read-parquet-1d5350ce0b6fc72ffd2a614b000c8118', 0) on 3 different workers, but all those workers
m
It seems your problem could be related to the `KilledWorker` exception in Dask, which is raised when the workers assigned to a task die repeatedly while trying to run it (here, 3 different workers), causing your Prefect job to fail. Here are a few relevant insights and resources:

1. Docs Insight: The infrastructure PID needs to be highly specific to manage flow execution environments effectively. Ensure your infrastructure configuration and debugging mechanisms can uniquely handle and identify such scenarios.
   - Source: Prefect Workers Guide
2. GitHub Insights:
   - KilledWorker Issue: Prefect does not handle `KilledWorker` exceptions effectively. Ideally, such tasks should be marked as "Crashed" instead of "Failed". Consider examining the worker logs for more insights.
     - Issue & Solution: GitHub Issue
   - Retries and Worker Stability: When using Dask with autoscalers (for example, in Kubernetes with GPU-backed instances), tasks may fail if workers are preempted. Consider implementing retries both for tasks and for worker creation.
     - Related Issue: GitHub Issue
3. General Recommendations:
   - Ensure Scalability and Resilience: Deploy an architecture that handles task failures and worker preemptions gracefully. Implement monitoring and logging so that such failures can be diagnosed.
     - Related Reading: Prefect Blog

If you need more detailed guidance, let me know.
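The retry recommendation can be sketched in plain Python. This is a minimal, library-independent illustration, not Prefect's or Dask's implementation: `WorkerLost`, `run_with_retries`, and `read_parquet_partition` are hypothetical names, and the worker failure is simulated.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class WorkerLost(Exception):
    """Hypothetical stand-in for a worker-death error such as Dask's KilledWorker."""


def run_with_retries(fn: Callable[[], T], retries: int = 3, delay_seconds: float = 0.0) -> T:
    """Call fn, retrying up to `retries` extra times when the worker is lost."""
    attempt = 0
    while True:
        try:
            return fn()
        except WorkerLost:
            if attempt >= retries:
                raise  # retries exhausted: surface the original error
            attempt += 1
            time.sleep(delay_seconds * attempt)  # linear backoff between attempts


# Simulated task: the first two attempts land on a worker that dies.
calls = {"count": 0}


def read_parquet_partition() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise WorkerLost("worker preempted")
    return "partition loaded"


result = run_with_retries(read_parquet_partition, retries=3, delay_seconds=0.0)
```

In Prefect itself, the `retries` and `retry_delay_seconds` arguments of the `@task` decorator play this role. Retries mainly help with transient causes such as preemption. If the workers die for a repeatable reason, for example running out of memory on the same partition, each retry is likely to fail the same way, so the worker logs are still the first place to look.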