
Grant

08/02/2023, 6:22 PM
Posing this question here. I switched from the Prefect agent to Prefect workers, but I'm still unsure why the files being generated and stored in the .prefect/storage directory are so large. Each file name looks like an ID, and the files are being generated multiple times a minute, with some files as large as 300-400 MB each.
For context, I'm using a "Local Subprocess" work pool. Here is an example of some of the files that are being generated and stored. What are these files?

Tim-Oliver

08/03/2023, 6:17 AM
I think these are the results generated by your tasks. If one of your tasks returns a numpy array, it will be converted into a binary blob and stored in your local storage (if persisting results is triggered in any way). By default, local storage is in your home directory. You can either change the Prefect home directory with the environment variable `PREFECT_HOME`, or set the storage location for your flow directly by providing `result_storage` in your flow annotation or setting the environment variable `PREFECT_LOCAL_STORAGE_PATH`. You can further organize results into directories by providing `result_storage_key="{flow_run.name}/{task_run.task_name}/{task_run.name}.json"`, which will dynamically fill in the values in `{}`.
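(A minimal sketch of that configuration, assuming Prefect 2.x; the `LocalFileSystem` basepath, the key template, and the function names below are illustrative only:)
```python
# Sketch only: controlling where persisted task results are written (Prefect 2.x).
from prefect import flow, task
from prefect.filesystems import LocalFileSystem


@task(
    persist_result=True,
    # Placeholders are filled in per run, as described above.
    result_storage_key="{flow_run.name}/{task_run.task_name}/{task_run.name}.json",
)
def crunch_numbers():
    return [1, 2, 3]


# result_storage can point at any writable filesystem block;
# a local path is used here purely as an example.
@flow(result_storage=LocalFileSystem(basepath="~/prefect-results"))
def my_flow():
    return crunch_numbers()


if __name__ == "__main__":
    my_flow()
```
Alternatively, the same effect can come from environment variables (`PREFECT_HOME` or `PREFECT_LOCAL_STORAGE_PATH`) rather than decorator arguments.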

Grant

08/03/2023, 1:23 PM
Thank you @Tim-Oliver!! Jake Kaplan pointed out the documentation in another thread that lists the scenarios where results will be persisted, and it seems like my results are being persisted quite often. If I want to store results in a GitHub repo, for example, how would I configure that as my result_storage? I tried looking through the docs, but only found how to use a GitHub storage block to store flow code, not results.

Tim-Oliver

08/03/2023, 1:25 PM
I don't think that is possible. Maybe with a custom serializer that takes care of moving the results to GitHub.
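(If someone wanted to try that route, a rough sketch assuming the Prefect 2.x `prefect.serializers.Serializer` interface; the class name is hypothetical and the GitHub upload itself is left as a placeholder:)
```python
# Rough sketch of the "custom serializer" idea; not a supported or verified pattern.
from typing import Any, Literal

from prefect.serializers import JSONSerializer, Serializer


class GitHubMirrorSerializer(Serializer):
    """Hypothetical serializer that also mirrors each serialized result elsewhere."""

    type: Literal["github-mirror"] = "github-mirror"

    def dumps(self, obj: Any) -> bytes:
        blob = JSONSerializer().dumps(obj)
        # Placeholder: push `blob` to a repo here (e.g. via the GitHub contents API).
        return blob

    def loads(self, blob: bytes) -> Any:
        return JSONSerializer().loads(blob)
```
It could then be passed as `result_serializer=GitHubMirrorSerializer()` on the flow or task decorator, while `result_storage` keeps pointing at a writable filesystem block.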