David Martin Calalang
07/07/2025, 7:40 PMMarvin
07/07/2025, 7:40 PMMarvin
07/07/2025, 7:40 PMbash
pip install prefect-dask
2. Then, you can access the Dask client and scheduler info through the task runner's context. Here's how:
python
from prefect import flow
from prefect_dask import DaskTaskRunner
@flow(task_runner=DaskTaskRunner(address="your-cluster-address"))
def my_flow():
# Access the Dask client within the flow
client = get_client()
# Get scheduler info
scheduler_info = client.scheduler_info()
print(scheduler_info)
The scheduler_info()
method will return a dictionary containing detailed information about your Dask cluster, including:
- Workers and their status
- Memory usage
- Task counts and statistics
- Other cluster metrics
If you need to access this information outside of a flow context, you can use Dask's native client:
python
from dask.distributed import Client
client = Client("your-cluster-address")
scheduler_info = client.scheduler_info()
Note: Make sure your cluster address is in the correct format (e.g., "tcp://ip:port" or the appropriate protocol for your ECS setup).
Let me know if you need more specific information about particular scheduler metrics you're interested in monitoring!David Martin Calalang
07/07/2025, 7:48 PMMarvin
07/07/2025, 7:48 PMMarvin
07/07/2025, 7:48 PMget_client
function from distributed
. Here's the correct code with all necessary imports:
python
from prefect import flow
from prefect_dask import DaskTaskRunner
from distributed import get_client # Add this import!
@flow(task_runner=DaskTaskRunner(address="your-cluster-address"))
def my_flow():
client = get_client()
scheduler_info = client.scheduler_info()
print(scheduler_info)
Alternatively, you can also access the client directly from the task runner itself within the flow:
python
from prefect import flow
from prefect_dask import DaskTaskRunner
@flow(task_runner=DaskTaskRunner(address="your-cluster-address"))
def my_flow():
# Access client through the task runner
client = my_flow.task_runner.client
scheduler_info = client.scheduler_info()
print(scheduler_info)
Both approaches will work, but the second one might be preferable as it's more explicit about where the client is coming from. Choose whichever style you prefer!