# prefect-server
n
👋 What is the best advice for cleaning up `prefect/flows` so that we can keep it at a manageable size? It tends to just grow naturally, and it does so quite quickly.
🤔
z
Hi! I’m checking in with the rest of the team about this 🙂
n
Thank you. As I mentioned in another thread: “I don’t know what’s safe to delete. This is not `--volume-path` but instead the `flow.storage = Local(directory='/this/location')` directory.” Also, it has grown to 100 GB in less than 30 days.
d
Hi @Newskooler, is that the same location where you’re storing flow results?
n
I’m not sure I understand the question, @Dylan. In my mind there are two locations: the volume path and `flow.storage`. My issue is that `flow.storage` grows a lot. I guess this is the flow results, right?
z
`flow.storage` is where the pickled flow, i.e. the code of your flow, is stored when using `Local` storage.
n
So why does this grow so big?
z
When using `Local` storage, the flow is simply pickled with `cloudpickle` and dumped as bytes. Do you have a ton of files there? What are the smallest and largest files?
n
I have a flow which runs every minute and that’s the one causing the issue
I do. I will check shortly and tell you. I did delete some of them to get the server running again haha
z
How often are you registering the flow?
n
I’ve registered it about 10 times in the last 30 days, but I don’t need to register it anymore.
z
Hmm. This seems a bit peculiar. Each time it is registered, the storage is “built”, which means the flow is written to the directory as `flow-name.prefect`.
As far as I can tell, it should even be overwritten on subsequent registrations.
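To illustrate why registration alone should not produce many files, here is a minimal sketch of the behaviour described above, assuming storage is built by writing one `<flow-name>.prefect` file per flow. `build_local_storage` is a hypothetical helper, not Prefect’s API, and it uses the standard-library `pickle` in place of `cloudpickle` so it runs without extra dependencies.

```python
import os
import pickle
import tempfile


def build_local_storage(directory: str, flow_name: str, flow_obj: object) -> str:
    """Hypothetical stand-in for building Local storage at registration time.

    Serializes flow_obj to <directory>/<flow_name>.prefect. Stdlib pickle is
    used here instead of cloudpickle (an assumption made for portability).
    """
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, f"{flow_name}.prefect")
    # "wb" truncates an existing file, so re-registering overwrites it.
    with open(path, "wb") as fh:
        pickle.dump(flow_obj, fh)
    return path


# Registering the same (stand-in) flow ten times still leaves a single file.
with tempfile.TemporaryDirectory() as storage_dir:
    for version in range(10):
        build_local_storage(storage_dir, "my-flow", {"name": "my-flow", "version": version})
    print(sorted(os.listdir(storage_dir)))  # prints ['my-flow.prefect']
```

Under this assumption, ten registrations account for one file, so a directory with thousands of entries points at something other than the serialized flow.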
n
So I have 100k files in this directory (from the last 10 days). They look like this:
```
-rw-r--r-- 1 root root  1436873 Dec 10 17:24 prefect-result-2020-12-10t17-24-10-287987-00-00
-rw-r--r-- 1 root root  1179386 Dec 10 17:24 prefect-result-2020-12-10t17-24-10-629558-00-00
-rw-r--r-- 1 root root    13969 Dec 10 17:24 prefect-result-2020-12-10t17-24-15-435020-00-00
-rw-r--r-- 1 root root    16465 Dec 10 17:24 prefect-result-2020-12-10t17-24-15-821351-00-00
-rw-r--r-- 1 root root       94 Dec 10 17:24 prefect-result-2020-12-10t17-24-16-142481-00-00
```
This is the largest:
```
67836	./prefect-result-2020-12-08t18-57-57-583026-00-00
```
and this is the smallest:
```
4	./prefect-result-2020-11-30t00-00-15-595208-00-00
```
For reference, they are orders of magnitude larger than the data they processed.
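A quick way to see how fast these files accumulate is to group them by the date embedded in their names. The sketch below assumes the naming pattern from the listing above (`prefect-result-YYYY-MM-DDt...`); `usage_by_day` is a hypothetical helper, and it reports apparent sizes in bytes, which can differ from the disk usage that `du` prints.

```python
import os
import re

# Assumption: names look like "prefect-result-2020-12-10t17-24-10-287987-00-00",
# where the three fields after the prefix are year, month and day.
RESULT_NAME = re.compile(r"^prefect-result-(\d{4})-(\d{2})-(\d{2})t")


def usage_by_day(directory: str) -> dict:
    """Return {"YYYY-MM-DD": (file_count, total_bytes)} for result files only."""
    counts: dict = {}
    sizes: dict = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            match = RESULT_NAME.match(entry.name)
            if match is None or not entry.is_file():
                continue  # skip serialized flows and anything else
            day = "-".join(match.groups())
            counts[day] = counts.get(day, 0) + 1
            sizes[day] = sizes.get(day, 0) + entry.stat().st_size
    return {day: (counts[day], sizes[day]) for day in sorted(counts)}
```

Calling `usage_by_day("/this/location")` would show whether growth is steady per day, which is what a flow running every minute would suggest.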
z
I looked into this a bit more, and to clarify: if a task has checkpointing enabled (the default), it has a result that is written somewhere. A task’s default result type is the flow’s result type, which in turn defaults to the result type associated with the flow storage you are using. In the `Local` storage class, the associated result is defined as `result = LocalResult(self.directory, validate_dir=validate)`. This means that this directory will contain both your serialized flow and all of your flow run results.
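The fallback order described above can be sketched as follows. The classes are hypothetical stand-ins, not Prefect’s real `LocalResult` or `Local`; the point is only that when neither the task nor the flow sets a result, results land in the storage directory, while an explicit result on either one takes precedence.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LocalResultStub:
    """Hypothetical stand-in for a local result: only records its directory."""
    dir: str


@dataclass(frozen=True)
class LocalStorageStub:
    """Hypothetical stand-in for Local storage and its associated result."""
    directory: str

    @property
    def result(self) -> LocalResultStub:
        # As described in the thread: the storage's result reuses its directory.
        return LocalResultStub(self.directory)


def resolve_result(
    task_result: Optional[LocalResultStub],
    flow_result: Optional[LocalResultStub],
    storage: LocalStorageStub,
) -> LocalResultStub:
    """Task result first, then the flow's result, then the storage default."""
    if task_result is not None:
        return task_result
    if flow_result is not None:
        return flow_result
    return storage.result


storage = LocalStorageStub("/this/location")
print(resolve_result(None, None, storage).dir)  # prints /this/location
print(resolve_result(None, LocalResultStub("/results"), storage).dir)  # prints /results
```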
n
Okay, so that’s by design, right? And what does this mean for what I should do going forward?
z
Anything matching `prefect-result-*` can be safely deleted if you no longer need that data. You can also disable checkpointing for tasks if you don’t need the ability to resume.
n
The resume feature is super cool. I could set up a cron job to delete only `prefect-result-*` files older than X days. Does that make sense?
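A minimal sketch of such a cleanup job, assuming files are aged by modification time and that only names starting with `prefect-result-` are eligible. `prune_results` is a hypothetical helper; it defaults to a dry run so you can check what would be removed before deleting anything.

```python
import os
import time


def prune_results(directory: str, max_age_days: float, dry_run: bool = True) -> list:
    """Delete (or, with dry_run=True, only list) old prefect-result-* files.

    Serialized flows (*.prefect) and any other files are never touched.
    Returns the sorted names of the files that were (or would be) removed.
    """
    cutoff = time.time() - max_age_days * 86400
    # Collect candidates first so nothing is deleted while the directory is scanned.
    with os.scandir(directory) as entries:
        candidates = [
            entry
            for entry in entries
            if entry.name.startswith("prefect-result-")
            and entry.is_file()
            and entry.stat().st_mtime < cutoff
        ]
    if not dry_run:
        for entry in candidates:
            os.remove(entry.path)
    return sorted(entry.name for entry in candidates)
```

A cron entry could then run a small script that calls `prune_results("/this/location", 7, dry_run=False)`; the path and the 7-day window are placeholders.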
z
Yeah that makes sense.
You could even make a flow 😄
n
True that. 😄 Thanks for the help!
On your end, do you see this as an issue that needs addressing, or is it expected behaviour?
z
We’d like results not to be stored alongside your flows, and we’re looking into that, but I don’t think we want to decide when or what data should be pruned.
n
Okay, as long as it’s on your radar :)