James Morley
04/11/2024, 8:32 AM
I'm using the DaskTaskRunner and am finding that if there are issues with workers on the cluster (e.g. if something is wrong with the cluster configuration and the workers crash), then Prefect tasks just hang forever in a Pending state. I even see an ERROR log message in the console from Dask:
2024-04-10 13:17:00,323 - distributed.scheduler - ERROR - Task parse_raw_data-0-cb42b0a8543b470ca6699484874854ac-1 marked as failed because 4 workers died while trying to run it
What's the recommended way of dealing with this? Ideally the error would propagate to Prefect and the program would terminate.