# ask-community
r
Hi everyone, there is an issue where the first task in the flow re-runs before finishing. This happens 3-4 times and then the run fails with a KilledWorker error. It keeps restarting the first task. Do you know what the issue could be?
Screenshots: The first task starts to run and I can see the logs. But halfway through, it starts again, with no errors in the logs. This happens 4 times and then I see "KilledWorker".
k
Hey @Riley Hun, from experience, it’s a Dask-specific issue and a vague one at that. I have hit it most often when the Python library versions are mismatched between client, scheduler, and workers (mainly scheduler and worker).
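One way to check for the version mismatch described above is to compare the package versions that each node reports. The sketch below assumes the nested dictionary returned by `client.get_versions()` in `dask.distributed` (top-level `client`, `scheduler`, and `workers` keys, each entry carrying a `packages` mapping). The helper name `find_version_mismatches`, the worker addresses, and the version numbers in `sample` are hypothetical and only there for illustration.

```python
from collections import defaultdict


def find_version_mismatches(versions):
    """Return {package: {version: [nodes]}} for packages whose version differs.

    `versions` is assumed to have the shape returned by
    distributed.Client.get_versions(): "client", "scheduler", and "workers" keys.
    """
    # Flatten client, scheduler, and every worker into one {node_name: info} map.
    nodes = {"client": versions["client"], "scheduler": versions["scheduler"]}
    nodes.update(versions.get("workers", {}))

    # Group node names by the version they report for each package.
    seen = defaultdict(lambda: defaultdict(list))
    for name, info in nodes.items():
        for package, version in info.get("packages", {}).items():
            seen[package][version].append(name)

    # Keep only packages that appear with more than one version.
    return {pkg: dict(by_ver) for pkg, by_ver in seen.items() if len(by_ver) > 1}


# Hypothetical sample data; on a real cluster this would come from
# `client.get_versions()` on a connected distributed.Client.
sample = {
    "client": {"packages": {"dask": "2021.7.0", "distributed": "2021.7.0"}},
    "scheduler": {"packages": {"dask": "2021.7.0", "distributed": "2021.7.0"}},
    "workers": {
        "tcp://10.0.0.2:4000": {
            "packages": {"dask": "2021.7.0", "distributed": "2021.6.0"}
        },
    },
}

mismatches = find_version_mismatches(sample)
for package, by_version in mismatches.items():
    print(package, by_version)
```

On a live cluster, `client.get_versions(check=True)` asks the client to flag mismatches itself; the helper here just makes the comparison explicit and easy to log.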
r
Thanks @Kevin Kho, this is helpful! Will check to see if the versions are consistent.
k
But here I guess memory might be an issue? It looks like this cluster is already running, right? How is your memory utilization in the UI?
r
I believe a private dask cluster through Dask Gateway is being used. It's 12 to 20 workers. It's on a private VPC too which prevents us from viewing the UI.
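When the dashboard is not reachable, memory utilization can still be read from the client side. The sketch below assumes the per-worker dictionaries found under the `workers` key of `client.scheduler_info()` in `dask.distributed`, each with a `memory_limit` field and a `metrics` mapping that contains `memory` (both in bytes). The helper name `workers_near_memory_limit`, the 0.8 threshold, and the sample addresses and numbers are hypothetical.

```python
def workers_near_memory_limit(workers, threshold=0.8):
    """Return [(address, used_bytes, limit_bytes, fraction)] for busy workers.

    `workers` is assumed to be the mapping found under the "workers" key of
    distributed.Client.scheduler_info().
    """
    flagged = []
    for address, info in workers.items():
        limit = info.get("memory_limit") or 0
        used = info.get("metrics", {}).get("memory", 0)
        if not limit:
            # No limit configured for this worker, so no fraction to compute.
            continue
        fraction = used / limit
        if fraction >= threshold:
            flagged.append((address, used, limit, fraction))
    return flagged


# Hypothetical sample data; on a real cluster this would be
# `client.scheduler_info()["workers"]` from a connected distributed.Client.
sample_workers = {
    "tcp://10.0.0.2:4000": {
        "memory_limit": 4_000_000_000,
        "metrics": {"memory": 3_600_000_000},
    },
    "tcp://10.0.0.3:4000": {
        "memory_limit": 4_000_000_000,
        "metrics": {"memory": 1_000_000_000},
    },
}

hot = workers_near_memory_limit(sample_workers)
for address, used, limit, fraction in hot:
    print(f"{address}: {used} / {limit} bytes ({fraction:.0%})")
```

A worker that sits close to its limit shortly before the task restarts would point toward memory as the cause; if all workers stay well below the limit, the version check is the more likely lead.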
k
In general though, the KilledWorker literally means a worker died for some reason and couldn’t be recovered (so could be memory or hanging or something)
r
Okay makes sense. Thanks Kevin!