Thread
#prefect-server
    Newskooler

    1 year ago
    👋 What is the best advice for cleaning up prefect/flows so that we can keep it at a manageable size (it tends to just grow naturally, and it does so quite quickly)?
    🤔
    Michael Adkins

    1 year ago
    Hi! I’m checking in with the rest of the team about this 🙂
    Newskooler

    1 year ago
    Thank you. As I mentioned in another thread: “I don’t know what’s safe to delete. This is not --volume-path, but instead the directory set by flow.storage = Local(directory='/this/location').” Also, it has grown to 100GB in less than 30 days.
    Dylan

    1 year ago
    Hi @Newskooler is that the same location where you’re storing Flow Results?
    Newskooler

    1 year ago
    I am not sure I get the question, @Dylan. In my mind there are two locations: the volume path and flow.storage. I am saying I have issues with the flow.storage directory growing a lot. I guess these are the flow results, right?
    Michael Adkins

    1 year ago
    flow.storage is where the pickled flow (the code of your flow) is stored when using Local.
    Newskooler

    1 year ago
    So why does this grow so big?
    Michael Adkins

    1 year ago
    When using Local storage, the flow is simply pickled with cloudpickle and dumped as bytes. Do you have a ton of files there? What’s the smallest/largest file?
    Newskooler

    1 year ago
    I have a flow which runs every minute and that’s the one causing the issue
    I do. I will check shortly and tell you. I did delete some of them to get the server running again haha
    Michael Adkins

    1 year ago
    How often are you registering the flow?
    Newskooler

    1 year ago
    I registered it about 10 times, but I don’t need to register it anymore
    In the last 30 days, that is
    Michael Adkins

    1 year ago
    Hmm. This seems a bit peculiar. Each time it is registered, the storage is “built” which means the flow is written to the directory as “flow-name.prefect”
    It should even be overwritten on subsequent registrations as far as I can tell
    Newskooler

    1 year ago
    So I have 100k files inside this dir (from the last 10 days). They look like this:
    -rw-r--r-- 1 root root  1436873 Dec 10 17:24 prefect-result-2020-12-10t17-24-10-287987-00-00
    -rw-r--r-- 1 root root  1179386 Dec 10 17:24 prefect-result-2020-12-10t17-24-10-629558-00-00
    -rw-r--r-- 1 root root    13969 Dec 10 17:24 prefect-result-2020-12-10t17-24-15-435020-00-00
    -rw-r--r-- 1 root root    16465 Dec 10 17:24 prefect-result-2020-12-10t17-24-15-821351-00-00
    -rw-r--r-- 1 root root       94 Dec 10 17:24 prefect-result-2020-12-10t17-24-16-142481-00-00
    This is the largest:
    67836	./prefect-result-2020-12-08t18-57-57-583026-00-00
    and this is the smallest:
    4	./prefect-result-2020-11-30t00-00-15-595208-00-00
    For reference, they are orders of magnitude larger than the data they have processed.
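For anyone wanting to answer the "how many files, smallest/largest" question without reading `ls` and `du` output by hand, here is a minimal sketch using only the Python standard library. The function name `summarize_files` and its default pattern are illustrative assumptions, not part of Prefect; the pattern is taken from the file names shown above.

```python
from pathlib import Path


def summarize_files(directory, pattern="prefect-result-*"):
    """Summarize files in `directory` whose names match `pattern`.

    Returns (count, total_bytes, smallest, largest), where smallest and
    largest are (name, size_in_bytes) tuples, or None if nothing matches.
    """
    sizes = [
        (path.name, path.stat().st_size)
        for path in Path(directory).glob(pattern)
        if path.is_file()  # ignore subdirectories
    ]
    if not sizes:
        return 0, 0, None, None
    total = sum(size for _, size in sizes)
    smallest = min(sizes, key=lambda item: item[1])
    largest = max(sizes, key=lambda item: item[1])
    return len(sizes), total, smallest, largest
```

Calling `summarize_files('/this/location')` would report only the result files; the serialized flow file (written as "flow-name.prefect") does not match the pattern and is left out of the totals.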
    Michael Adkins

    1 year ago
    So, looking into this a bit more and to clarify: if a task has checkpointing enabled (which is the default), it has a result which is written somewhere. The default result type for a task is the result type for the flow, which defaults to the result type associated with the flow storage you are using. In the Local storage class, the associated result type is defined as result = LocalResult(self.directory, validate_dir=validate), which means that this directory will contain both your serialized flow and all of your flow run results.
    Newskooler

    1 year ago
    Okay, that’s by design, right? And what does this mean in terms of what makes sense for me to do going forward?
    Michael Adkins

    1 year ago
    Anything named prefect-result-* can be safely deleted if you do not need that data anymore. Furthermore, you can disable checkpointing for tasks if you don’t need the ability to resume.
    Newskooler

    1 year ago
    The resume is a super cool feature. I can set up a cron job to delete only the prefect-result-* files older than X days. Does this make sense?
    Michael Adkins

    1 year ago
    Yeah that makes sense.
    You could even make a flow 😄
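The cleanup discussed above could be sketched roughly as follows, using only the Python standard library so it can be run from cron or wrapped in a task. The function name `prune_results`, its parameters, and the choice to judge age by file modification time are all assumptions made for illustration; nothing here is a Prefect API.

```python
import time
from pathlib import Path


def prune_results(directory, max_age_days, pattern="prefect-result-*", now=None):
    """Delete files matching `pattern` that were last modified more than
    `max_age_days` ago. Returns the sorted names of the deleted files.

    `now` (seconds since the epoch) defaults to the current time and is
    only exposed to make the cutoff easy to test.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    deleted = []
    # Materialize the matches first so files are not removed mid-iteration.
    for path in list(Path(directory).glob(pattern)):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            deleted.append(path.name)
    return sorted(deleted)
```

For example, `prune_results('/this/location', max_age_days=7)` would remove result files older than a week (7 is an arbitrary value) while leaving the serialized flow file untouched, since it does not match the pattern. Runs whose results have been deleted can no longer be resumed from those checkpoints.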
    Newskooler

    1 year ago
    True that. 😄 Thanks for the help!
    On your end - do you see that as an issue which needs addressing, or is that expected behaviour?
    Michael Adkins

    1 year ago
    We’d like them not to be stored alongside your flows — we’re looking into that, but I don’t think we want to decide when/what data should be pruned
    Newskooler

    1 year ago
    Okay - as long as it’s on your radar :)