Tim-Oliver
11/14/2022, 1:57 PM
[message not preserved in the export]
Tim Galvin
11/14/2022, 2:00 PM
I am using a DaskTaskExecutor and I am seeing logs produced in my tasks (provided prefect.get_run_logger is used to get an appropriate logger) in the Prefect UI and flow run without any issues.
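A minimal sketch of the working local setup being described, assuming Prefect 2 with the prefect-dask collection (where the runner is called DaskTaskRunner); task and flow names are illustrative:

```python
from prefect import flow, task, get_run_logger
from prefect_dask import DaskTaskRunner

@task
def say_hello(name: str) -> None:
    # get_run_logger() ties records to the task run, so they show up
    # in the Prefect UI and in the flow run's logs.
    get_run_logger().info("Hello from %s", name)

@flow(task_runner=DaskTaskRunner())  # ephemeral local Dask cluster
def hello_flow() -> None:
    say_hello.submit("a local Dask worker")

if __name__ == "__main__":
    hello_flow()
```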
Tim-Oliver
11/14/2022, 2:12 PM
I am using a dask_jobqueue.SLURMCluster as executor and the logs are not getting back from the workers.
Tim-Oliver
11/14/2022, 2:13 PM
The DaskTaskExecutor (on the local machine) works.
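A sketch of the SLURM-backed variant being described, assuming prefect-dask's DaskTaskRunner is pointed at dask_jobqueue.SLURMCluster; the queue name and resources are placeholders:

```python
from prefect import flow, task, get_run_logger
from prefect_dask import DaskTaskRunner

slurm_runner = DaskTaskRunner(
    cluster_class="dask_jobqueue.SLURMCluster",
    cluster_kwargs={
        "queue": "general",      # hypothetical SLURM partition
        "cores": 4,
        "memory": "8GB",
        "walltime": "01:00:00",
    },
)

@task
def remote_task() -> None:
    # Runs on a SLURM worker node; these are the records that were
    # not making it back to the Prefect UI.
    get_run_logger().info("Hello from a SLURM worker")

@flow(task_runner=slurm_runner)
def slurm_flow() -> None:
    remote_task.submit()
```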
Tim Galvin
11/14/2022, 3:01 PM
I hit something similar with a task that used subprocess.run, where a singularity command was issued. Said command would run an MPI application within the container, and an appropriate srun would get the command to run across many compute nodes. This command produced a large set of outputs on stdout, which I would capture through the normal subprocess result object and then manually logger.info them. I found that these logs were often not captured, even though everything ran successfully.
Eventually I converged towards having a subprocess.run("sleep 5") type command just after my logger.info command. When I did that, my missing logs were always captured appropriately. My hunch is a simple one: the process managing the main Orion server does not persist long enough to retrieve all outputs. This subprocess sleep command should block long enough for this exchange to carry out. At the time I worked through this I was not managing my own Orion Prefect server; I was relying on the main process to fire one up for the lifetime of the flow.
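A sketch of the workaround described above; the singularity invocation and image name are placeholders for the real MPI command:

```python
import subprocess
from prefect import task, get_run_logger

@task
def run_container() -> None:
    logger = get_run_logger()
    result = subprocess.run(
        ["singularity", "exec", "image.sif", "mpi_app"],  # hypothetical command
        capture_output=True,
        text=True,
    )
    # Forward the container's stdout into the flow run's logs.
    logger.info(result.stdout)
    # Hunch-based workaround: block briefly so the process lives long
    # enough to ship the captured log records back to the server.
    subprocess.run(["sleep", "5"])
```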
Tim-Oliver
11/14/2022, 4:19 PM
I am only seeing this with a_task.submit() or a_task.map(some_list). I tested it without submit and then the logs get through properly, as you said.
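A sketch contrasting the two call styles; only the submitted and mapped runs go to Dask workers, and those were the runs with missing logs. The local DaskTaskRunner here stands in for the SLURM-backed runner:

```python
from prefect import flow, task, get_run_logger
from prefect_dask import DaskTaskRunner

@task
def a_task(item: str) -> None:
    get_run_logger().info("processing %s", item)

@flow(task_runner=DaskTaskRunner())  # stand-in for the SLURM-backed runner
def compare() -> None:
    a_task("direct")             # runs in the flow process; logs arrive fine
    a_task.submit("submitted")   # runs on a Dask worker; logs went missing here
    a_task.map(["a", "b", "c"])  # same behaviour, one task run per element
```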