I’m trying to find information on how I can expose...
# ask-community
s
I’m trying to find information on how I can expose the address of the Dask scheduler to use the distributed dashboard https://docs.dask.org/en/latest/diagnostics-distributed.html when using the Temporary Cluster approach described here https://docs.prefect.io/orchestration/flow_config/executors.html#using-a-temporary-cluster. The `DaskExecutor` dynamically creates these temporary clusters (https://github.com/PrefectHQ/prefect/blob/05cac2372c57a93ea72b05e7c844b1e115c01047/src/prefect/executors/dask.py#L213) on a per-flow basis, so I’m unsure how I can obtain the scheduler’s address (and dashboard link) without some hook in the Prefect code. Ideally, I would like the Dask scheduler’s public address and dashboard link to be reported in the flow logs in the Prefect UI. Any suggestions are greatly appreciated.
j
This is a bit tricky, since we can't infer the user-viewable address from the scheduler address (e.g. you might be viewing the dashboard through several layers of proxies). We usually recommend that dask users configure `distributed.dashboard.link` (https://docs.dask.org/en/latest/configuration-reference.html#distributed.dashboard.link), which can template that out. If you're fine setting that up properly, then I'd be happy to add a log line with the dashboard link during dask executor startup.
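To illustrate the templating Jim mentions, here is a rough sketch. As far as I know, `distributed` expands `distributed.dashboard.link` with Python's `str.format`, passing `scheme`, `host`, and `port` along with every environment variable. The code below imitates that expansion using only the standard library. `expand_dashboard_link` is a hypothetical helper, not part of dask. The default template shown is an assumption based on recent releases, and the `JUPYTERHUB_SERVICE_PREFIX` template is only an example of the proxy-style links used on JupyterHub deployments.

```python
import os

# Assumed default of distributed.dashboard.link in recent dask releases.
DEFAULT_TEMPLATE = "{scheme}://{host}:{port}/status"


def expand_dashboard_link(template, host, port, scheme="http", env=None):
    """Hypothetical helper imitating how distributed fills in the link template.

    Environment variables are exposed as format fields, so a template may use
    e.g. {JUPYTERHUB_SERVICE_PREFIX}; scheme/host/port override same-named env vars.
    """
    fields = dict(os.environ if env is None else env)
    fields.update(scheme=scheme, host=host, port=port)
    return template.format(**fields)


if __name__ == "__main__":
    # The default template simply reuses whatever host the scheduler reports.
    print(expand_dashboard_link(DEFAULT_TEMPLATE, "10.0.115.40", 8787))
    # A proxy-style template that never uses {host} at all.
    print(
        expand_dashboard_link(
            "{JUPYTERHUB_SERVICE_PREFIX}proxy/{port}/status",
            "10.0.115.40",
            8787,
            env={"JUPYTERHUB_SERVICE_PREFIX": "/user/alice/"},
        )
    )
```

The second template shows why a statically known template can work behind a proxy: the link is built from values that are known in advance rather than from the scheduler's own host.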
s
@Jim Crist-Harif I should be able to set the config properly in each Flow definition. One question about the `host` information used here: I believe `scheduler.address` will report the `host` from the private subnet range (https://github.com/dask/distributed/pull/3429/files). For example, my scheduler logs show the private IP rather than the public ENI.
```
distributed.scheduler - INFO -   Scheduler at:    tcp://10.0.115.40:8786
distributed.scheduler - INFO -   dashboard at:                     :8787
```
I guess this might be a deeper question of how to report the public IP of the container where the scheduler is running rather than its private IP.
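One quick way to confirm that the address in those logs is a private-range address rather than a public one is Python's standard `ipaddress` module. This is only a diagnostic sketch. `scheduler_host` and `is_private_address` are hypothetical helper names, and the check assumes the host is an IP literal, not a DNS name.

```python
import ipaddress
from urllib.parse import urlsplit


def scheduler_host(address):
    """Return the host part of a Dask-style address such as 'tcp://10.0.115.40:8786'."""
    return urlsplit(address).hostname


def is_private_address(address):
    """True if the host falls in a private/reserved range (e.g. 10.0.0.0/8).

    Assumes the host is an IP literal; ipaddress raises ValueError for DNS names.
    """
    return ipaddress.ip_address(scheduler_host(address)).is_private


if __name__ == "__main__":
    addr = "tcp://10.0.115.40:8786"
    print(scheduler_host(addr), is_private_address(addr))
```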
j
Yeah, dask doesn't really have a way for configuring a visible public address for the scheduler (while we do have one for the dashboard). Rather than confusing users, would it be fine to just display the dashboard link for now?
s
👍 For sure. The dashboard is really my only concern. But in this case, if I use a scheme like `dask.config.set({"distributed.dashboard.link": "http://{host}:{port}/status"})`, won’t my resulting `dashboard_link` be http://10.0.115.40:8787/status?
j
I'm not familiar enough with ECS to know. For most deployments where we've set this up (e.g. the pangeo k8s deployment) there's a statically known template that will work.
s
I guess the issue is that we can’t know the address of the scheduler until the `DaskExecutor` creates the cluster. It seems like we’ll need dask-cloudprovider to expose the dashboard’s public IP as a property of https://github.com/dask/dask-cloudprovider/blob/d2072afbeba1c6cd42f5b6c60f9d9690352e9b3c/dask_cloudprovider/aws/ecs.py#L436 when it is available, so that you can publish it at creation time. Maybe I can try to ping Jacob and see what options we have for this.
@Jim Crist-Harif After looking through `dask-cloudprovider` in more detail, it seems that much of this logic is already handled: https://github.com/dask/dask-cloudprovider/blob/main/dask_cloudprovider/aws/ecs.py#L190-L213. Given this, you should be able to use `dashboard_link` and assume the address is correct. Where would this be logged? In the Flow log directly?
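For reference, the general AWS pattern for finding a Fargate task's public address has two steps. First, read the task's `ElasticNetworkInterface` attachment from the ECS `describe_tasks` response. Then look up that interface with EC2 `describe_network_interfaces` and read `Association.PublicIp`. The sketch below only walks those two response dictionaries, in the shape boto3 returns them, so it runs without AWS credentials. It is not dask-cloudprovider's actual code, the function names are hypothetical, and the IDs and addresses in the example are illustrative placeholders.

```python
def eni_id_from_task(task):
    """Return the network interface ID from one entry of ECS describe_tasks()['tasks']."""
    for attachment in task.get("attachments", []):
        if attachment.get("type") == "ElasticNetworkInterface":
            # 'details' is a list of {"name": ..., "value": ...} pairs.
            details = {d["name"]: d["value"] for d in attachment.get("details", [])}
            return details.get("networkInterfaceId")
    return None


def public_ip_from_eni(response):
    """Return the public IP from an EC2 describe_network_interfaces() response, if any.

    Interfaces without a public IP have no 'Association' block, so None is returned.
    """
    interfaces = response.get("NetworkInterfaces", [])
    if not interfaces:
        return None
    return interfaces[0].get("Association", {}).get("PublicIp")


# With real clients this would look roughly like (not executed here):
#   eni = eni_id_from_task(ecs.describe_tasks(cluster=CLUSTER_ARN, tasks=[TASK_ARN])["tasks"][0])
#   ip = public_ip_from_eni(ec2.describe_network_interfaces(NetworkInterfaceIds=[eni]))

if __name__ == "__main__":
    # Illustrative placeholder responses, trimmed to the fields used above.
    task = {
        "attachments": [
            {
                "type": "ElasticNetworkInterface",
                "details": [{"name": "networkInterfaceId", "value": "eni-example"}],
            }
        ]
    }
    eni_response = {"NetworkInterfaces": [{"Association": {"PublicIp": "203.0.113.7"}}]}
    print(eni_id_from_task(task), public_ip_from_eni(eni_response))
```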
j
Great! I'm adding the log right now actually, should be out in next release.
s
🎊 Thanks so much for looking into it. You guys move fast 😄
@Jim Crist-Harif Is it possible to use the bot to capture this thread as a GitHub issue so we can use it for tracking? 🙏
j
Sure, but tracking for what exactly? I merged the log addition earlier, what outstanding work is there still?
s
👍 OK, no worries then. I was going to reference the issue in `pangeo-forge` for context.
j
Sure, I can archive it for that purpose.
@Marvin archive "Display dask cluster info in prefect logs"
j
And the fix for the dashboard logs: https://github.com/PrefectHQ/prefect/pull/4321