Marcus Hughes

    1 year ago
    I've been using Prefect to run a couple automated flows of some simple tasks on a local server, and I just discovered that Prefect appears to be saving serialized/pickled results into ~/.prefect/results without me specifically telling it to. That's fine in the short term, but after letting things run I discovered I had over 800 gigabytes of results. Is there an automated way to delete these after some given time elapses built into Prefect? I could just make another flow that cleans up results that are older than a day or something to keep us from ballooning out of drive space. Just curious what the best practice is here. Also, is it safe for me to just delete entries from this directory or will it corrupt a database somewhere?
    Chris White

    1 year ago
    Hey Marcus! For some background, whenever you run an orchestrated flow with Prefect, your tasks' outputs are saved / "checkpointed" to a configurable location (the default is the local filesystem, as you learned). This allows you to, for example, rerun your workflow from failure in a future job (because Prefect can reconstruct the individual task inputs from the stored data). Additionally, this data is useful if you use any of Prefect's caching features. That being said, you're more than welcome to delete any or all of that data so long as you don't foresee needing to rerun any of your already-complete runs from a midway point. Other users will hardcode the filename for each individual task (I can show you how to do this if you're interested) so that a single file is overwritten instead of writing more and more new files with each run.
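    Since deleting the stored files is safe under those conditions, the age-based cleanup Marcus describes could be sketched roughly as below. This is not a Prefect feature: it is a plain standard-library script, and the function name delete_old_results and its parameters are hypothetical. It assumes file modification time is a reasonable proxy for a result's age.

```python
import time
from pathlib import Path
from typing import List, Optional


def delete_old_results(
    results_dir: Path,
    max_age_seconds: float,
    now: Optional[float] = None,
) -> List[Path]:
    """Delete files under results_dir last modified more than max_age_seconds ago.

    Hypothetical helper, not part of Prefect. Returns the paths it deleted.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_seconds
    deleted: List[Path] = []
    if not results_dir.is_dir():
        return deleted
    # Materialize the listing first so deletions do not disturb the traversal.
    for path in list(results_dir.rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            deleted.append(path)
    return deleted
```

    Calling delete_old_results(Path.home() / ".prefect" / "results", 24 * 60 * 60) from a scheduled job (or from a small flow of its own) would remove anything older than a day. It leaves empty subdirectories in place, and it should only be used if you do not rely on those files for caching or for restarting old runs.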
    Marcus Hughes

    1 year ago
    Can you show me how to do that single file approach? Also, is it possible to just turn checkpointing off? We're not really anticipating needing to rerun our flows from a midway point.
    Chris White

    1 year ago
    Yup for sure! One file per task (but not per task run):
    • initialize your Flow like so:
    from prefect import Flow
    from prefect.engine.results import LocalResult
    
    # {task_name} is filled in per task, so each task overwrites its own file
    with Flow(..., result=LocalResult(location="{task_name}.prefect")):
    • to turn checkpointing off entirely you can set checkpoint=False on each individual task:
    @task(checkpoint=False)
    def my_task(...):
    Marcus Hughes

    1 year ago
    Thank you!
    Chris White

    1 year ago
    @Marvin archive "~/.prefect/results is filling up with data - what are best practices to manage this?"
    @Marvin archive "What are best practices to manage my Prefect results directory?"