# best-practices
y
Hello all. I have what I think is a very standard command-line workflow that I would like to implement with Prefect 2.0. Here's what I want to do:
- Process many files with the same shell script in parallel
- Track progress of the overall processing and of individual files on the Orion UI
  - Overall progress: number of currently running/failed/completed tasks
  - Individual progress: view stdout/stderr logs of tasks that are complete or currently running
- Optionally: automatically retry failed files, or notify the user

Is this something that's well supported in Prefect? Is there a well-documented example somewhere that I can follow? Note: I've played around with [prefect-shell](https://github.com/PrefectHQ/prefect-shell), but I'm having trouble finding a straightforward way to view task logs for tasks that are currently running.
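For concreteness, here's roughly the shape of what I have so far (a minimal sketch; `script.sh` and the file names are placeholders, and it assumes prefect-shell's `shell_run_command` task):
```python
# Minimal sketch: fan out one shell command per file as parallel task runs.
from prefect import flow
from prefect_shell import shell_run_command

@flow
def process_files(files: list[str]):
    # .submit() gives each file its own task run, so the Orion UI shows
    # per-file state (running / failed / completed) plus overall counts.
    futures = [
        shell_run_command.submit(command=f"./script.sh {f}")
        for f in files
    ]
    return [f.result() for f in futures]

if __name__ == "__main__":
    process_files(["one.pcap", "two.pcap", "three.pcap"])
```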
a
y
@Anna Geller Hi Anna. Thanks for the reply. I'm specifically having trouble viewing the logs while the tasks are still running. Running tasks concurrently seems to result in the whole output being logged only after each task has completed. I've created a test repo to illustrate what I'm trying to do: https://github.com/yhshin11/prefect-test/tree/979ed69c7b1a250060a1d3b004f550f8b07e0409 My main issue is that Prefect receives the stdout of all the shell scripts only after all the tasks are complete (or have failed). While this works for many use cases, there are times when you want to monitor the logs as they come in, in case there are issues that require you to restart the whole workflow.
a
Gotcha. Logs should appear immediately in the UI. Could you open a GitHub issue for that and explain in detail there what's missing to get the logs? If this happens only with shell tasks, it's worth opening an issue in the collection repo rather than the main prefect repo
I checked your repo now and it looks like you are indeed running each of those Python scripts as async tasks, which is why they run to completion before you get the logs -- this behavior is intended. Why are you trying to run this via shell scripts? Calling the code as Python functions would be easier and would let you use the Prefect logger to get the logs directly.
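Something like this (a rough sketch; `process_file` and its body are just placeholders):
```python
# Function-based approach: get_run_logger sends each line to Orion
# as it is logged, so logs stream while the task is still running.
from prefect import flow, task, get_run_logger

@task
def process_file(path: str):
    logger = get_run_logger()
    logger.info("Processing %s", path)
    # ... do the actual per-file work in Python here ...
    logger.info("Finished %s", path)

@flow
def process_files(files: list[str]):
    for f in files:
        process_file.submit(f)
```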
y
@Anna Geller Hi Anna. Thanks for the explanation. The reason I'm forcing this contrived example is that `script.py` is meant to be a placeholder for CLI tools written in C/Java/etc. that cannot easily be rewritten in Python. For example, in my current processing pipeline I am processing PCAP network capture files, which can be processed in Python, but there are existing tools that do the same job much faster. In any case, I think I've fixed the problem with my toy example. Using `print("xyz", flush=True)` forces my Python script to write to stdout every second, which is then handled correctly by `anyio`. Now I get output like this, which is what I wanted:
```
19:17:06.027 | INFO    | Task run 'shell_run_command-7398b6ba-3' - Been running task for 0 seconds... Repeating message: meow
19:17:06.257 | INFO    | Task run 'shell_run_command-7398b6ba-2' - Been running task for 0 seconds... Repeating message: quack
19:17:06.447 | INFO    | Task run 'shell_run_command-7398b6ba-0' - Been running task for 0 seconds... Repeating message: moo
19:17:07.028 | INFO    | Task run 'shell_run_command-7398b6ba-3' - Been running task for 1 seconds... Repeating message: meow
19:17:07.067 | INFO    | Task run 'shell_run_command-7398b6ba-1' - Been running task for 0 seconds... Repeating message: woof
19:17:07.258 | INFO    | Task run 'shell_run_command-7398b6ba-2' - Been running task for 1 seconds... Repeating message: quack
```
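For reference, the placeholder script boils down to something like this (a hypothetical reconstruction; the real one is in the repo linked above). The key detail is `flush=True`: Python block-buffers stdout when it's piped rather than attached to a terminal, so without an explicit flush the output only shows up once the buffer fills or the process exits (running the script with `python -u` would have the same effect):
```python
# Hypothetical reconstruction of script.py: emit one line per second and
# flush stdout so the parent process (anyio's stream reader) sees each
# line immediately instead of all at once when the process finishes.
import sys
import time

message = sys.argv[1] if len(sys.argv) > 1 else "moo"
for i in range(10):
    print(f"Been running task for {i} seconds... Repeating message: {message}",
          flush=True)  # stdout is block-buffered without flush=True when piped
    time.sleep(1)
```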
This demo is not quite where I want it to be yet, because the log outputs do not get collected when I run the flow with `DaskTaskRunner` instead of the default task runner, but maybe that's related to open issues currently being worked on: https://github.com/PrefectHQ/prefect/issues/6022 https://github.com/PrefectHQ/prefect/issues/5850
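For reference, the only change on my side is the task runner on the flow decorator (a sketch, assuming the prefect-dask collection is installed):
```python
# Same flow as before, just with the Dask task runner; this is the
# configuration where the streamed logs currently go missing for me.
from prefect import flow
from prefect_dask import DaskTaskRunner
from prefect_shell import shell_run_command

@flow(task_runner=DaskTaskRunner())
def process_files(files: list[str]):
    for f in files:
        shell_run_command.submit(command=f"./script.sh {f}")
```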
a
Just to say: nice work on the workaround you found, and I'd recommend tracking the progress of those issues. Thanks for the nice write-up of the problem and the solution you figured out!