# best-practices
Does anyone have a best-practice work-around to get dask distributed logs collected and displayed? 😇
How do you mean, sorry? I am using a `DaskTaskRunner`, and I am seeing logs produced in my tasks (provided `get_run_logger` is used to get an appropriate logger) in the Prefect UI and flow run without any issues.
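For background on why "an appropriate logger" matters here: a plain `logging.getLogger` call inside a task that runs in a separate worker process writes to that process's own logging state, so nothing reaches the orchestrating process unless something explicitly ships records back (which is what Prefect's run logger does). A stdlib-only sketch of that effect, with names (`ListHandler`, `run_demo`) of my own invention:

```python
import logging
import multiprocessing as mp


class ListHandler(logging.Handler):
    """Collects log messages in a plain Python list."""

    def __init__(self, sink):
        super().__init__()
        self.sink = sink

    def emit(self, record):
        self.sink.append(record.getMessage())


def worker_task():
    # Runs in a separate process, like a task on a Dask worker:
    # whatever this logger emits stays in the child's memory.
    logging.getLogger("flow").info("hello from worker")


def run_demo():
    captured = []
    logger = logging.getLogger("flow")
    logger.setLevel(logging.INFO)
    handler = ListHandler(captured)
    logger.addHandler(handler)
    try:
        logger.info("hello from parent")  # captured normally
        # "fork" keeps the demo deterministic on POSIX systems.
        proc = mp.get_context("fork").Process(target=worker_task)
        proc.start()
        proc.join()
    finally:
        logger.removeHandler(handler)
    # Only the parent's message made it back; the worker's record is lost
    # unless something (like Prefect's log handler) forwards it.
    return captured
```

Running `run_demo()` returns only `["hello from parent"]`: the worker's message never lands in the parent's handler.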
I am using `DaskTaskRunner` as the executor, and the logs are not getting back from the workers.
I think a local Dask cluster (on the local machine) works.
Ok, I am doing the same 🙂 For the most part my logs have always been captured fine. I have a self-hosted Orion server running next to a Postgres database, and this largely relieved my woes when a Slurm cluster and database timeouts were involved. However, even then, in some cases I have found my logs not making it back. Specifically, I had a Python function calling `subprocess.run`, where a Singularity command was issued. Said command would run an MPI application within the container, and an appropriate `srun` invocation would get the command to run across many compute nodes. This command produced a large set of outputs on stdout that I would capture through the normal `subprocess.run` result object and then manually log. I found that these logs were often not captured, even though everything ran successfully. Eventually I converged towards adding a `subprocess.run(["sleep", "5"])`-type command just after my Singularity command. When I did that, my missing logs were always captured appropriately. My hunch is a simple one: the process managing the main Orion server does not persist long enough to retrieve all outputs, and this subprocess sleep command blocks long enough for the exchange to carry out. At the time I worked through this I was not managing my own Orion server - I was relying on the main process to fire one up for the lifetime of the flow.
I think I omitted a crucial detail: I call `.submit()` on my tasks. I tested it without `.submit()`, and then the logs get through properly, as you said.
When submitted, the logs are even missing with a local Dask cluster 😕
(at least for me)