# show-us-what-you-got
p
I want to use Prefect to stitch together a number of python functions and calls to executables for a satellite image processing workflow. This will all be done locally on a single machine. Some tasks will depend on files written by previous tasks. Will Prefect work for this use case? In the examples, it looks like data is passed in memory among the tasks and there aren't dependencies based on the existence of certain files.
j
Hi @Philip Blankenau, yep! There are a few ways to solve this. One straightforward approach is to write the file in one task and return the resulting `path` as the task’s output; then the downstream task accepts `path` as an argument and uses it to load the data. The important thing is that even though Prefect has facilities for exchanging data between tasks, it is agnostic to what that data is (or if you pass any data at all!). In addition, the next Prefect release will include a refactored `Result` class that will make some of the details of serializing / deserializing data to disk easier (whether local, cloud, or anywhere).
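A minimal sketch of the path-passing pattern described above. The two functions are plain Python so the pattern is visible on its own; in a flow, each would be wrapped as a Prefect task. The function names and the file name `scene_001.txt` are hypothetical, chosen only for illustration.

```python
import tempfile
from pathlib import Path


def write_scene(out_dir: Path) -> Path:
    """Upstream step: write a file and return its path (not its contents)."""
    out_path = out_dir / "scene_001.txt"  # hypothetical file name
    out_path.write_text("10 20 30\n")
    return out_path


def summarize_scene(path: Path) -> int:
    """Downstream step: receive the path, load the file, and process it."""
    values = [int(v) for v in path.read_text().split()]
    return sum(values)


with tempfile.TemporaryDirectory() as tmp:
    # The only thing handed from one step to the next is the path.
    scene_path = write_scene(Path(tmp))
    total = summarize_scene(scene_path)
```

Because the downstream step takes the upstream step's return value as its argument, the ordering between the two falls out of the data dependency, even though the real payload lives on disk.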
p
Right now I am using the subprocess module to call the executable. The executable failed, but the task was still listed as SUCCESS. Would using the ShellTask class help failures properly register as failures?
j
Good question - yes, I believe it would. The ShellTask examines the return code of the process and, if it’s non-zero, raises an error. However, if you want to continue using subprocess, you can - you just need to raise the error yourself, because I believe that unless you use `subprocess.check_call()`, subprocess doesn’t raise an error when the command fails. Prefect will trap the error and interpret the task as failing; if the task function runs to completion, Prefect treats it as a success.
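A small sketch of the subprocess side of this, using only the standard library. `run_executable` is a hypothetical helper name, and the failing command is a stand-in (a Python child process that exits with code 3) rather than a real image-processing executable. Passing `check=True` to `subprocess.run` has the same effect as `subprocess.check_call()`: a non-zero exit code raises `subprocess.CalledProcessError`.

```python
import subprocess
import sys


def run_executable(cmd):
    """Run an external command; raise if it exits with a non-zero code."""
    # check=True makes subprocess.run raise CalledProcessError on failure,
    # so the exception propagates instead of the function returning normally.
    return subprocess.run(cmd, check=True)


# Stand-in for a failing executable: a child process that exits with code 3.
failing_cmd = [sys.executable, "-c", "import sys; sys.exit(3)"]

try:
    run_executable(failing_cmd)
    failed = False
except subprocess.CalledProcessError as exc:
    failed = True
    return_code = exc.returncode
```

Inside a task you would leave out the `try`/`except` and let the exception escape, since an uncaught exception is what marks the task as failed.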
p
One more question I would pose: what is the utility of using Prefect for this use case besides easy parallelism (which is very nice, btw)? Other workflow managers look at when a file was last touched to figure out where the flow needs to be resumed, but since Prefect doesn't have file dependencies, it runs all the tasks from the very beginning. This means error recovery is nil with this kind of workflow.
j
Prefect achieves the same purpose by persisting workflow states to a database; the advantage of that approach is that Prefect tasks do not have to all have access to a common filesystem. In addition, Prefect tasks do not have to produce any artifact (like a file) to still receive all the benefits of workflow management (success, failure, retry, etc.). However, the upcoming release will add file-based checkpointing (called `target`, similar to Luigi) in order to help users who do want artifact-based states.
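To illustrate the general idea of artifact-based checkpointing, here is a hand-rolled sketch in plain Python. It is not Prefect's `target` API, whose details are not given in this thread; `build_mosaic`, `mosaic.txt`, and the `calls` list are hypothetical names used only to show a step that is skipped when its output file already exists.

```python
import tempfile
from pathlib import Path

calls = []  # records each time the expensive work actually runs


def build_mosaic(target: Path) -> Path:
    """Write `target` only if it does not already exist, then return it."""
    if target.exists():
        return target  # artifact already present: skip the work
    calls.append(target.name)
    target.write_text("mosaic data\n")
    return target


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "mosaic.txt"
    build_mosaic(target)  # first run does the work and writes the file
    build_mosaic(target)  # second run finds the file and skips
    runs = len(calls)
```

A check like this lets a rerun pick up after the last file that was written successfully, which is the recovery behavior asked about above.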
p
Excellent. Any ETA on that release?
j
Yup - should be available in the next week or two!
p
To run a flow on a schedule locally on a Windows machine, would you just leave the Python process running in the background?
j
Yes, you can call `flow.run()` and let it run.