David Elliott
08/03/2021, 6:32 PM
Previously, the `prefect-job-xxxxx` would create 4 ephemeral dask workers (named something like `dask-root-xxxx`).
• Now the behaviour I'm seeing is:
  ◦ K8s agent creates the `prefect-job-xxx`
  ◦ In the `prefect-job` logs, it gives me _prefect.DaskExecutor | Creating a new Dask cluster with __main__.make_cluster. Creating scheduler pod on cluster. This may take some time._
  ◦ There are then 5x `dask-root-xxx` pods created, where 1 of them is a dask scheduler - i.e. the scheduler no longer sits within the `prefect-job-xx`?
Just wanted to check if this was expected/intended behaviour - I couldn't see any reference to it in the prefect release notes.
• In addition (and this is more a side note - I think the prefect k8s RBAC docs need updating): I've had to add 2x more rulesets to my k8s RBAC to make it work - see these docs for what's now required. Here is specifically what's changed vs the prefect docs.
Thanks!
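(For reference, a minimal sketch of the sort of setup being described above, assuming a flow whose `DaskExecutor` uses a local `make_cluster` function returning a `dask_kubernetes.KubeCluster` - the image, resource limits and worker count here are illustrative placeholders, not the actual flow's configuration.)

```python
# Illustrative sketch only - image, resources and worker count are placeholders.
from dask_kubernetes import KubeCluster, make_pod_spec
from prefect import Flow, task
from prefect.executors import DaskExecutor


def make_cluster(n_workers=4):
    """Create an ephemeral Dask cluster for the flow run."""
    pod_spec = make_pod_spec(
        image="prefecthq/prefect:0.15.3",  # placeholder image
        memory_limit="2G",
        cpu_limit="1",
    )
    return KubeCluster(pod_spec, n_workers=n_workers)


@task
def say_hello():
    print("hello from a dask worker")


with Flow(
    "dask-k8s-test",
    executor=DaskExecutor(cluster_class=make_cluster),
) as flow:
    say_hello()
```

(As described later in the thread, dask-kubernetes 0.11.0 keeps the scheduler for this cluster inside the prefect-job pod, while 2021.3.0+ splits it out into its own pod.)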
David Elliott
08/03/2021, 6:32 PM
• Prefect: 0.14.19 --> 0.15.3
• Dask: 2021.2.0 --> 2021.7.2
• Distributed: 2020.12.0 --> 2021.7.2
• dask-kubernetes: 0.11.0 --> 2021.3.1
David Elliott
08/03/2021, 6:32 PM
I suspect it's `dask-kubernetes`, but their changelog is non-existent.

David Elliott
08/03/2021, 6:33 PM

David Elliott
08/03/2021, 6:35 PM

Kevin Kho (Marvin)
08/03/2021, 6:42 PM

Kevin Kho

Kevin Kho

David Elliott
08/03/2021, 8:10 PM
FYI - I've just done `pip install "prefect[aws,kubernetes]"==0.15.3` and it's installed distributed + dask version 2021.7.1 - I think it just takes the latest atm.

Kevin Kho
David Elliott
08/03/2021, 9:14 PM
• In distributed 2021.1.0 they introduced this change, which causes this issue where prefect can't create an ephemeral pod due to a name attribute error
• that bug got fixed in dask-kubernetes 2021.3.0 (it handles the new name attribute properly), but that's also the version of dask-kubernetes which splits out the dask scheduler into its own pod (as I've described in my original post)
  ◦ so we have to keep dask-kubernetes pinned to 0.11.0 (the prior version) to keep the scheduler within the prefect job
• and the fix for the above change is to keep distributed pinned to 2020.12.0, i.e. prior to the above name attribute change
• however, in pinning distributed to 2020.12.0, we get a few dask compatibility issues (one is this, but there are others) with newer versions of dask, meaning we have to pin dask to 2021.2.0
  ◦ (ie any version of dask > 2021.2.0 doesn't work with distributed 2020.12.0)
So in summary, the latest working versions I've found which keep the scheduler in the prefect-job are as follows:
• prefect==0.15.3
• distributed==2020.12.0
• dask-kubernetes==0.11.0
• dask==2021.2.0
Have run a test flow on this setup - the scheduler is still in the prefect-job, the pods get spawned properly, and the scheduler shuts down properly 👌
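(A small illustrative check - not part of the original setup - that can be run inside the flow's execution image to confirm those four pins are what's actually installed:)

```python
# Illustrative version check: prints installed versions of the pinned packages
# and flags anything that differs from the known-working set above.
import dask
import dask_kubernetes
import distributed
import prefect

EXPECTED = {
    "prefect": "0.15.3",
    "distributed": "2020.12.0",
    "dask-kubernetes": "0.11.0",
    "dask": "2021.2.0",
}

INSTALLED = {
    "prefect": prefect.__version__,
    "distributed": distributed.__version__,
    "dask-kubernetes": dask_kubernetes.__version__,
    "dask": dask.__version__,
}

for pkg, expected in EXPECTED.items():
    actual = INSTALLED[pkg]
    status = "ok" if actual == expected else f"MISMATCH (want {expected})"
    print(f"{pkg}=={actual}  {status}")
```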
Kevin Kho

David Elliott
08/03/2021, 9:30 PM
The `prefect-job-xx` threw an error on shutdown - the flow still completed, but it wasn't a graceful shutdown.