# ask-community
d
Hey all, I've been looking at upgrading my Prefect version, and also my distributed, dask & dask-kubernetes versions, for our production pipeline, and just wanted to clarify a change in behaviour that I've noticed:
• Previously when I ran a flow, the k8s agent would create a job which was in effect the dask scheduler, creating and retiring pods as it needed to. In my case that `prefect-job-xxxxx` would create 4 ephemeral dask workers (named something like `dask-root-xxxx`)
• Now the behaviour I'm seeing is:
  ◦ The k8s agent creates the `prefect-job-xxx`
  ◦ In the `prefect-job` logs, it gives me _prefect.DaskExecutor | Creating a new Dask cluster with `__main__.make_cluster`. Creating scheduler pod on cluster. This may take some time._
  ◦ There are then 5x `dask-root-xxx` pods created, where 1 of them is a dask scheduler - i.e. the scheduler no longer sits within the `prefect-job-xx`?
Just wanted to check if this was expected/intended behaviour - I couldn't see any reference to it in the Prefect release notes.
• In addition (and this is more a side note - I think the Prefect k8s RBAC docs need updating), I've had to add 2 more rulesets to my k8s RBAC to make it work - see these docs for what's now required. Here is specifically what's changed vs the Prefect docs.
Thanks!
My versions have gone from -> to:
• Prefect: `0.14.19` --> `0.15.3`
• Dask: `2021.2.0` --> `2021.7.2`
• Distributed: `2020.12.0` --> `2021.7.2`
• dask-kubernetes: `0.11.0` --> `2021.3.1`
I think the main change has been in the upgrade of `dask-kubernetes`, but their changelog is non-existent.
• I've mainly been looking at the git diff here
• Line 263 of this PR I think also references the creation of a pod with a scheduler running - not sure if that explains this change though
My run config is as follows, if helpful (attached as a .py file):
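(For context, a rough sketch of the kind of setup described above - this is not the actual attached file, which isn't preserved here. It just assumes Prefect 0.15.x with a `KubernetesRun` run config and a `DaskExecutor` that builds an ephemeral `dask_kubernetes.KubeCluster` via a module-level `make_cluster`, matching the `__main__.make_cluster` log line; image names and sizing are placeholders.)

```python
# Hypothetical sketch only - not the original attachment.
# Assumes Prefect 0.15.x, dask-kubernetes, and placeholder image/sizing values.
from prefect import Flow, task
from prefect.run_configs import KubernetesRun
from prefect.executors import DaskExecutor


def make_cluster():
    # Ephemeral Dask cluster for the flow run; with dask-kubernetes 0.11.0 the
    # scheduler runs inside the prefect-job pod, with 2021.3.x it gets its own pod.
    from dask_kubernetes import KubeCluster, make_pod_spec

    pod_spec = make_pod_spec(
        image="my-registry/flow-image:latest",  # placeholder image
        memory_limit="2G",
        cpu_limit=1,
    )
    return KubeCluster(pod_spec, n_workers=4)


@task
def say_hello():
    print("hello")


with Flow("example-flow") as flow:
    say_hello()

# The k8s agent creates the flow-run job from this run config, and the executor
# then builds the Dask cluster from make_cluster at runtime.
flow.run_config = KubernetesRun(image="my-registry/flow-image:latest")
flow.executor = DaskExecutor(cluster_class=make_cluster)
```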
k
@Marvin archive “Dask-kubernetes Upgrade to 2021.3.1 creates Dask Scheduler in New Pod”
k
Hey @David Elliott, thanks for the detailed writeup. I don't think anything on the Prefect side changed that would lead to this behavior. I agree with your thoughts that it seems to stem from dask-kubernetes being upgraded. With that said, I don't have any other advice than to downgrade for now if this is breaking. Also, I think our max supported version for distributed and dask is 2021.5.0.
👍 1
Does this break your setup in any way?
d
Cool, all good - really I just wanted to check if you had visibility over the change (or if it was intended), and also to flag the RBAC change for the docs. I should be able to pin at a lower version for now - will give that a go and report back if any issues! Also, btw, I just ran `pip install "prefect[aws,kubernetes]"==0.15.3` and it installed distributed + dask version `2021.7.1` - I think it just takes the latest atm.
k
Gotcha. Thanks for mentioning!
d
OK, so here's what I've found (mainly in case it's helpful for anyone else..!)
• In distributed `2021.1.0` they introduced this change, which causes this issue where Prefect can't create an ephemeral pod due to a name attribute error
• That bug got fixed in dask-kubernetes `2021.3.0` (it handles the new name attribute properly), but that's also the version of dask-kubernetes which splits the dask scheduler out into its own pod (as I described in my original post)
  ◦ So we have to keep dask-kubernetes pinned to `0.11.0` (the prior version) to keep the scheduler within the prefect job
• And the fix for the above change is to keep distributed pinned to `2020.12.0`, prior to the name attribute change
• However, in pinning distributed to `2020.12.0` we get a few dask compatibility issues (one is this, but there are others) with newer versions of dask, meaning we have to pin dask to `2021.2.0`
  ◦ (i.e. any version of dask > `2021.2.0` doesn't work with distributed `2020.12.0`)
So in summary, the latest working versions I've found which keep the scheduler in the prefect-job are:
• `prefect==0.15.3`
• `distributed==2020.12.0`
• `dask-kubernetes==0.11.0`
• `dask==2021.2.0`
I've run a test flow on this setup - the scheduler is still in the prefect-job, the pods get spawned properly, and the scheduler shuts down properly 👌
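For anyone wanting to reproduce that pinned setup, the install should look something like this (adjust the extras to your own needs):

```
pip install "prefect[aws,kubernetes]==0.15.3" "dask==2021.2.0" "distributed==2020.12.0" "dask-kubernetes==0.11.0"
```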
k
Just wondering what issues having the scheduler out of the prefect-job brings? The pod won’t shut down? (worker pods also?)
d
In fairness I've not extensively tested it, but I'm pretty sure the scheduler used to be separate and then it got moved into the prefect-job a few months back, so it seemed like unexpected behaviour...
This is minor, but from a debug perspective it's also much harder to find which pod is the scheduler vs the worker pods, as they're all named the same - for me I have 4 workers, so trial and error looking into up to 5 pods is OK, but if you had tonnes of workers it'd be a real pain to find your scheduler.
Oh, and yes, you're right - when I did run it with the scheduler in its own pod, the `prefect-job-xx` threw an error on shutdown - the flow still completed, but it wasn't a graceful shutdown.
👍 1