# best-practices
y
Hello all. I have what I think is a very standard command-line workflow that I would like to implement with Prefect 2.0. Here's what I want to do:
- Process many files with the same shell script in parallel
- Track progress of the overall processing and of individual files on the Orion UI
  - Overall progress: number of currently running/failed/completed tasks
  - Individual progress: view stdout/stderr logs of tasks that are complete or currently running
- Optionally: automatically retry failed files, or notify the user

Is this something that's well supported in Prefect? Is there a well-documented example somewhere that I can follow? Note: I've played around with [prefect-shell](https://github.com/PrefectHQ/prefect-shell), but I'm having trouble finding a straightforward way to view task logs for tasks that are currently running.
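For concreteness, here's roughly the shape of what I have so far (a minimal sketch; `script.sh` and the file names are placeholders, and it assumes prefect-shell's `shell_run_command` task):
```python
# Minimal sketch: fan out one shell command per file as parallel task runs.
from prefect import flow
from prefect_shell import shell_run_command

@flow
def process_files(files: list[str]):
    # .submit() gives each file its own task run, so the Orion UI shows
    # per-file state (running / failed / completed) plus overall counts.
    futures = [
        shell_run_command.submit(command=f"./script.sh {f}")
        for f in files
    ]
    return [f.result() for f in futures]

if __name__ == "__main__":
    process_files(["one.pcap", "two.pcap", "three.pcap"])
```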
a
y
@Anna Geller Hi Anna. Thanks for the reply. I'm specifically having trouble viewing the logs while the tasks are still running. Running tasks concurrently seems to result in the whole output being logged only after each task has completed. I've created a test repo to illustrate what I'm trying to do: https://github.com/yhshin11/prefect-test/tree/979ed69c7b1a250060a1d3b004f550f8b07e0409 My main issue is that Prefect receives the stdout of all the shell scripts only after all the tasks are complete (or have failed). While this works for many use cases, there are times when you want to monitor the logs as they come in, in case there are issues that require you to restart the whole workflow.
a
Gotcha. Logs should appear immediately in the UI. Could you open a GitHub issue for that and explain in detail there what's missing to get the logs? If this happens only with shell tasks, it's worth opening an issue in the collection repo rather than the main prefect repo
I checked your repo now and it looks like you are indeed running each of those Python scripts as async tasks, which is why they run to completion before you get the logs -- this behavior is intended. Why are you trying to run this via shell scripts? Calling the code as Python functions would be easier and would let you use the Prefect logger to get the logs directly.
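Something like this (a rough sketch; `process_file` and its body are just placeholders):
```python
# Function-based approach: get_run_logger sends each line to Orion
# as it is logged, so logs stream while the task is still running.
from prefect import flow, task, get_run_logger

@task
def process_file(path: str):
    logger = get_run_logger()
    logger.info("Processing %s", path)
    # ... do the actual per-file work in Python here ...
    logger.info("Finished %s", path)

@flow
def process_files(files: list[str]):
    for f in files:
        process_file.submit(f)
```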
y
@Anna Geller Hi Anna. Thanks for the explanation. The reason I'm forcing this contrived example is that `script.py` is meant to be a placeholder for CLI tools written in C/Java/etc. that cannot easily be rewritten in Python. For example, in my current processing pipeline I am processing PCAP network capture files, which can be processed in Python, but there are existing tools that do the same job much faster. In any case, I think I've fixed the problem with my toy example. Using `print("xyz", flush=True)` forces my Python script to write to stdout every second, which is then handled correctly by `anyio`. Now I get output like this, which is what I wanted:
```
19:17:06.027 | INFO    | Task run 'shell_run_command-7398b6ba-3' - Been running task for 0 seconds... Repeating message: meow
19:17:06.257 | INFO    | Task run 'shell_run_command-7398b6ba-2' - Been running task for 0 seconds... Repeating message: quack
19:17:06.447 | INFO    | Task run 'shell_run_command-7398b6ba-0' - Been running task for 0 seconds... Repeating message: moo
19:17:07.028 | INFO    | Task run 'shell_run_command-7398b6ba-3' - Been running task for 1 seconds... Repeating message: meow
19:17:07.067 | INFO    | Task run 'shell_run_command-7398b6ba-1' - Been running task for 0 seconds... Repeating message: woof
19:17:07.258 | INFO    | Task run 'shell_run_command-7398b6ba-2' - Been running task for 1 seconds... Repeating message: quack
```
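For reference, the placeholder script boils down to something like this (a hypothetical reconstruction; the real one is in the repo linked above). The key detail is `flush=True`: Python block-buffers stdout when it's piped rather than attached to a terminal, so without an explicit flush the output only shows up once the buffer fills or the process exits (running the script with `python -u` would have the same effect):
```python
# Hypothetical reconstruction of script.py: emit one line per second and
# flush stdout so the parent process (anyio's stream reader) sees each
# line immediately instead of all at once when the process finishes.
import sys
import time

message = sys.argv[1] if len(sys.argv) > 1 else "moo"
for i in range(10):
    print(f"Been running task for {i} seconds... Repeating message: {message}",
          flush=True)  # stdout is block-buffered without flush=True when piped
    time.sleep(1)
```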
This demo is not quite where I want it to be yet, because the log outputs do not get collected when I run the flow with `DaskTaskRunner` instead of the default task runner, but maybe that's related to open issues currently being worked on: https://github.com/PrefectHQ/prefect/issues/6022 https://github.com/PrefectHQ/prefect/issues/5850
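For reference, the only change on my side is the task runner on the flow decorator (a sketch, assuming the prefect-dask collection is installed):
```python
# Same flow as before, just with the Dask task runner; this is the
# configuration where the streamed logs currently go missing for me.
from prefect import flow
from prefect_dask import DaskTaskRunner
from prefect_shell import shell_run_command

@flow(task_runner=DaskTaskRunner())
def process_files(files: list[str]):
    for f in files:
        shell_run_command.submit(command=f"./script.sh {f}")
```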
a
Just to say: nice work on the workaround you found, and I'd recommend tracking the progress of those issues. Thanks for the nice write-up of the problem and the solution you figured out!