Soham

07/28/2023, 11:42 PM
@Marvin: How do I set up logging for DaskTaskRunner?

Marvin

07/28/2023, 11:42 PM
Oh, the joys of setting up logging for DaskTaskRunner. It's almost as exciting as watching paint dry. But, since you asked, here's how you do it:
1. Make sure the `prefect-dask` collection is installed. You can do this by running `pip install prefect-dask`.
2. In your flow code, import `DaskTaskRunner` from `prefect_dask.task_runners`.
3. Assign `DaskTaskRunner` as the task runner when the flow is defined using the `task_runner=DaskTaskRunner` argument.
For example, this flow uses the `DaskTaskRunner` configured to access an existing Dask cluster at `http://my-dask-cluster`.
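```python
from prefect import flow
from prefect_dask.task_runners import DaskTaskRunner

@flow(task_runner=DaskTaskRunner(address="http://my-dask-cluster"))
def my_flow():
    # your flow definition here
    ...

# Guard the entry point so worker processes do not re-run the flow on import
if __name__ == "__main__":
    my_flow()
```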
You can find more information in the Prefect documentation. Remember to guard the script's entry point with `if __name__ == "__main__"` to avoid issues with parallel processing. Running the flow will output logs demonstrating the use of the Dask task runner. I hope this helps. Or at least, I hope it doesn't make your day any worse.

Soham

07/28/2023, 11:48 PM
@Marvin: This is for setting up DaskTaskRunner, what about logging?

Marvin

07/28/2023, 11:48 PM
Ah, my apologies for the confusion. I must have been lost in my own existential dread. Here's how you can set up logging for DaskTaskRunner:
1. First, you need to configure the Python logging module to capture logs from your tasks. You can do this by adding the following lines at the beginning of your script:
```python
import logging
logging.basicConfig(level=logging.INFO)
```
2. Next, you need to ensure that your tasks are actually logging their actions. You can use the `logging` module for this as well. Here's an example of a task that logs its actions:
```python
import logging

from prefect import task

@task
def loggy_task():
    # Use a named logger so the records can be traced back to this task
    logger = logging.getLogger('loggy_task')
    logger.info("I'm starting the task")
    # task actions here
    logger.info("I've finished the task")
```
3. Finally, you need to ensure that Dask is configured to propagate these logs. Tasks run in Dask worker processes, so their records are handled by each worker's logging configuration and are not automatically propagated to the root logger of your script. You can try setting the `distributed.worker.daemon` configuration value, which controls whether worker processes run as daemons, to `False`:
```python
import dask
from dask.distributed import Client

# Dask settings are applied through dask.config; set them before the cluster starts
dask.config.set({'distributed.worker.daemon': False})
client = Client()
```
With these changes, you should see logs from your tasks appearing in the output of your script. Remember, logging is an art, not a science. Or maybe it's a science. I can't remember. I'm too depressed.
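For what it's worth, steps 1 and 2 can be sketched with the standard library alone, no Prefect or Dask involved. It only shows that a record logged on a named logger propagates up to a handler on the root logger. `ListHandler` and `run_loggy_task` are hypothetical names made up for this illustration, and the plain function stands in for a `@task`-decorated one. On a real Dask cluster the task runs in a worker process, so the same configuration would presumably have to exist in that process too.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical handler that stores formatted records in a list."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(self.format(record))


def run_loggy_task():
    # Plain function standing in for the body of a @task-decorated function.
    logger = logging.getLogger("loggy_task")
    logger.info("I'm starting the task")
    logger.info("I've finished the task")


# Attach the handler to the root logger and let INFO records through.
handler = ListHandler()
handler.setFormatter(logging.Formatter("%(name)s | %(levelname)s | %(message)s"))
root = logging.getLogger()
root.addHandler(handler)
root.setLevel(logging.INFO)

run_loggy_task()
print(handler.messages)
```

Inside an actual Prefect task, `get_run_logger()` from the `prefect` package is the usual way to obtain a logger; the sketch sticks to the standard library so it runs anywhere.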