# best-practices
Does anyone have a best-practice workaround to get Dask distributed logs collected and displayed? 😇
How do you mean, sorry? I am using a `DaskTaskExecutor` and I am seeing logs produced by my tasks (provided `prefect.get_run_logger` is used to get an appropriate logger) in the Prefect UI and flow run without any issues.
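A minimal sketch of the setup being described, assuming Prefect 2 with the `prefect_dask` collection (where the task runner class is called `DaskTaskRunner` rather than `DaskTaskExecutor`); names like `say_hello` are made up for illustration:

```python
from prefect import flow, task, get_run_logger
from prefect_dask import DaskTaskRunner


@task
def say_hello(name: str) -> None:
    # Task-run logger: records are shipped back to the Prefect API and UI.
    logger = get_run_logger()
    logger.info("Hello from %s", name)


@flow(task_runner=DaskTaskRunner())  # spins up a temporary local Dask cluster
def greet_flow():
    say_hello("dask worker")


if __name__ == "__main__":
    greet_flow()
```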
I am using `dask_jobqueue.SLURMCluster` as the executor, and the logs are not getting back from the workers. I think `DaskTaskExecutor` (on the local machine) works.
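A hedged sketch of pointing the task runner at a `dask_jobqueue.SLURMCluster`, again assuming the `prefect_dask` collection; the queue, resource, and scaling values are placeholders, not a recommended configuration:

```python
from prefect import flow, task, get_run_logger
from prefect_dask import DaskTaskRunner

# Workers are launched as SLURM jobs by dask_jobqueue.
slurm_runner = DaskTaskRunner(
    cluster_class="dask_jobqueue.SLURMCluster",
    cluster_kwargs={
        "queue": "compute",       # hypothetical partition name
        "cores": 8,
        "memory": "16GB",
        "walltime": "01:00:00",
    },
    adapt_kwargs={"minimum": 1, "maximum": 4},
)


@task
def crunch(x: int) -> int:
    get_run_logger().info("processing %s on a SLURM worker", x)
    return x * x


@flow(task_runner=slurm_runner)
def slurm_flow():
    return crunch(21)
```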
Ok, I am doing the same 🙂 For the most part my logs have always been captured fine. I have a self-hosted Orion server running next to a Postgres database, and this largely relieved my woes when a SLURM cluster and database timeouts were involved. However, even then, in some cases I have found my logs not making it back. Specifically, I had a Python function calling `subprocess.run` to issue a singularity command. That command would run an MPI application within the container, and an appropriate `srun` would get it to run across many compute nodes. The command produced a large set of outputs on `stdout`, which I would capture through the normal `subprocess` result object and then manually `logger.info` them. I found that these logs were often not captured, even though everything ran successfully.

Eventually I converged towards having a `subprocess.run("sleep 5")`-type command just after my `logger.info` call. When I did that, my missing logs were always captured appropriately. My hunch is a simple one: the process managing the main Orion server does not persist long enough to retrieve all outputs, and this subprocess sleep command blocks long enough for that exchange to complete. At the time I worked through this I was not managing my own Orion Prefect server; I was relying on the main process to fire one up for the lifetime of the flow.
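A rough sketch of the pattern described above, assuming Prefect 2 task code; the container image, command, and the 5-second sleep are illustrative values taken from the description, not a verified fix:

```python
import subprocess

from prefect import task, get_run_logger


@task
def run_mpi_in_container() -> None:
    logger = get_run_logger()
    # Hypothetical command: srun launches the containerized MPI app across nodes.
    result = subprocess.run(
        ["srun", "singularity", "exec", "app.sif", "mpi_app"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Forward the (potentially large) stdout to the Prefect logger.
    for line in result.stdout.splitlines():
        logger.info(line)
    # Workaround from the message above: block briefly so the log records
    # have time to be flushed back to the server before the process moves on.
    subprocess.run(["sleep", "5"])
```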
I think I omitted a crucial detail. I call `a_task.submit()` or `a_task.map(some_list)`. I tested it without `submit` and then the logs come through properly, as you said. When submitted, the logs are missing even with a local Dask cluster 😕 (at least for me).
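A small sketch contrasting the call styles mentioned, assuming Prefect 2 with `prefect_dask`; `a_task` and `demo_flow` are made-up names for illustration:

```python
from prefect import flow, task, get_run_logger
from prefect_dask import DaskTaskRunner


@task
def a_task(x: int) -> int:
    get_run_logger().info("got %s", x)
    return x + 1


@flow(task_runner=DaskTaskRunner())
def demo_flow(some_list: list[int]):
    a_task(1)             # direct call: logs reportedly come through fine
    a_task.submit(2)      # submitted to the Dask cluster
    a_task.map(some_list) # one submitted task run per element


if __name__ == "__main__":
    demo_flow([10, 20, 30])
```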