# show-us-what-you-got
p
Hey, if you needed to create a flow that did some logic in Python, shelled out a number of tasks and waited for them to return, then did more logic in Python and ultimately created a set of final outputs to be persisted, what would be the Prefect-ish way of doing this WRT communicating the locations of the files for each task? Is there a way to give each flow its own dir in /tmp, for example, and then refer to the files relatively, with each flow run looking in its own /tmp/dir? Or do you use result caching and pass around the file locations that way? I'm probably missing something obvious. If anybody can point out an example of a flow which has this kind of logic it would be really helpful!
n
Hi @Philip MacMenamin - are you wanting to handle the read/write of the files yourself, or are you wanting Prefect to handle that?
p
For the binaries that I'm going to shell out to, I'd leave those binaries to handle everything. It would just be a dumb call into the CLI. There would be some processing of the files in Python also, and I would think it might be easier for me to handle the I/O on those files? I guess I'm not sure what the alternative would be.
I feel like my question is sort of betraying the fact that there's some aspect of Prefect I'm not 100% getting.
n
Well in this case you have 2 options I think: you can return the file paths from the tasks themselves (and pass to downstream), or use the results, targets, and checkpointing interfaces.
p
OK. I was looking at that area of the docs. So the first option you mention would be that each task creates a file in some arbitrary location which it takes care of, and I would return the full path to that file from each task.
n
Exactly. And of course there's no reason you can't use a mix of these methods to suit your needs!
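A minimal sketch of the first option, written in plain Python so it runs without Prefect installed: each step writes its file wherever it likes and returns the full path, and the next step takes that path as its argument. The function names (`run_binary`, `post_process`) and the use of the Python interpreter as a stand-in for the real CLI binary are assumptions for illustration; in a flow, each function would presumably be wrapped with `@task`.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_binary(work_dir: Path) -> Path:
    """Shell out to a CLI and return the full path of the file it wrote."""
    out_path = work_dir / "raw.txt"
    # Stand-in for the real binary: the Python interpreter writes the file.
    subprocess.run(
        [
            sys.executable,
            "-c",
            "import sys, pathlib; "
            "pathlib.Path(sys.argv[1]).write_text('1\\n2\\n3\\n')",
            str(out_path),
        ],
        check=True,  # raise if the "binary" exits with a non-zero status
    )
    return out_path.resolve()


def post_process(in_path: Path) -> Path:
    """Do some Python-side processing and return the new file's full path."""
    values = [int(line) for line in in_path.read_text().split()]
    out_path = in_path.with_name("scaled.txt")
    out_path.write_text("\n".join(str(v * 10) for v in values))
    return out_path


def main() -> list:
    """Chain the two steps by handing the returned path downstream."""
    with tempfile.TemporaryDirectory() as tmp:
        raw = run_binary(Path(tmp))
        scaled = post_process(raw)
        return scaled.read_text().split()
```

The only thing that travels between steps is the path, so the binaries stay free to handle their own I/O.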
p
And the second option would be to use output caching along the lines of something like https://docs.prefect.io/core/concepts/persistence.html#output-caching-based-on-a-file-target, maybe, and then return a full path at the end of that task?
Is there no way to just set a default location in a flow, so that everything in a run of that flow uses some unique dir, e.g. /tmp/my_uniq_dir_0, and all tasks operate on files on the assumption that the default location is always correct for that run?
By unique dir, I mean unique to a run. As in, you can set up the flow to create a base dir /tmp/prefect_runs/uniq_run_id, and every task can default to using that special unique dir that's dedicated to that run.
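One way to get this by hand, sketched in plain Python: build the run directory once from a run identifier and have every step resolve relative names against it. `make_run_dir`, `in_run_dir`, the base directory and the `uuid`-based run id are all hypothetical; inside a flow the id would more likely come from the run's own context.

```python
import tempfile
import uuid
from pathlib import Path


def make_run_dir(base: Path, run_id: str) -> Path:
    """Create (if needed) and return the directory dedicated to one run."""
    run_dir = base / run_id
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir


def in_run_dir(run_dir: Path, relative_name: str) -> Path:
    """Resolve a relative file name against the run's own directory."""
    return run_dir / relative_name


# Hypothetical base directory, mirroring the /tmp/prefect_runs idea above.
BASE = Path(tempfile.gettempdir()) / "prefect_runs"

# Two runs get two different directories (random ids stand in for run ids).
run_a = make_run_dir(BASE, uuid.uuid4().hex)
run_b = make_run_dir(BASE, uuid.uuid4().hex)

# The same relative name lands in a different place for each run.
in_run_dir(run_a, "step1.out").write_text("from run a")
in_run_dir(run_b, "step1.out").write_text("from run b")
```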
n
Ah gotcha, and then you'd want those to be persisted, correct?
p
I would want at least some of them persisted. I probably don't care about every output. The binaries I'm calling into might produce ancillary files, or junk I don't care about. Some binaries produce outputs I don't care about persisting; for some I will just need the paths so I can feed them to the next binary, or do some operation on them in Python. I will need to persist files at the end of the run; I was going to use S3 or something for that.
n
Got it. In which case, I think templated locations are what you're looking for, which allow you to output results directly to run-specific directories. Since those can be configured at the task level, you can easily persist results you want and discard those you don't. Is that helpful?
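To make the templating idea concrete: a location template of the kind shown in the Prefect results docs, such as `{date:%A}/{task_name}.prefect`, uses ordinary Python format-string syntax, so it can be previewed with `str.format`. The sketch below assumes the template is filled with a `date` and a `task_name`; `render_location` is a hypothetical helper, not a Prefect function.

```python
from datetime import datetime


def render_location(template: str, **context) -> str:
    """Fill a location template with context values using str.format."""
    return template.format(**context)


location = render_location(
    "{date:%A}/{task_name}.prefect",
    date=datetime(2020, 6, 1),  # a Monday
    task_name="downstream_task",
)
print(location)  # Monday/downstream_task.prefect
```

Because the weekday and the task name end up in the path, each task and each day gets its own location without the task code building paths itself.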
p
OK, so those results are placed in a cloud bucket, so it will do a write across the network into that bucket, and then the next task will do a read to get the file local again. Is there a way to use the local file system during the run, and then pick a subset of files to persist in buckets?
In a way similar to this templated location, that is: just not use the cloud bucket during the run, and then ship selected outputs.
n
You can use a local result to persist the initial results and then probably a downstream task to pick the ones you'd like to persist elsewhere.
And since the `prefect.context` is available in every task, you'd be able to persist the final results in the remote location using the same variables you would in a templated location.
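A rough sketch of that last step, again in plain Python: intermediate files stay on local disk for the whole run, and one final step ships only the chosen ones. `persist_selected` is a hypothetical helper, and the destination here is just another local directory standing in for a bucket; a real flow would presumably upload to S3 at that point instead of copying.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Iterable, List


def persist_selected(paths: Iterable[Path], dest_dir: Path) -> List[Path]:
    """Copy only the selected files to the destination; return the new paths."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    return [Path(shutil.copy2(p, dest_dir / p.name)) for p in paths]


def demo() -> List[str]:
    """Produce three local files, persist one, and list what was persisted."""
    with tempfile.TemporaryDirectory() as tmp:
        run_dir = Path(tmp) / "run"
        run_dir.mkdir()
        # Files produced during the run, all on the local file system.
        for name in ("final.csv", "scratch.tmp", "debug.log"):
            (run_dir / name).write_text(name)
        # Keep only the outputs worth persisting; the rest go away with the run dir.
        keep = [run_dir / "final.csv"]
        bucket = Path(tmp) / "bucket"  # stand-in for the remote location
        persist_selected(keep, bucket)
        return sorted(p.name for p in bucket.iterdir())
```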
p
OK. I think this is what I'm looking for. Great.
Thanks Nicholas!
n
You're welcome! Let me know if you have any hiccups
p
will do.
from https://docs.prefect.io/core/concepts/results.html#how-to-configure-task-result-persistence
```python
from prefect import task, Flow
from prefect.engine.results import LocalResult


@task(result=LocalResult(location="initial_data.prefect"))
def root_task():
    return [1, 2, 3]

@task(result=LocalResult(location="{date:%A}/{task_name}.prefect"))
def downstream_task(x):
    return [i * 10 for i in x]

with Flow("local-results") as flow:
    downstream_task(root_task)
```
What should this snippet do?
(upon flow.run() getting called)
n
Hi @Philip MacMenamin - if you wouldn't mind, please open a new thread in the community channel
p
will do.