John Shearer

    9 months ago
    Is it expected that running a local flow with PREFECT__FLOWS__CHECKPOINTING=false, but with checkpoint data already present in the Prefect result directory, would read from those results? I wouldn't expect this, but it is the current behaviour (on my machine...).
    Anna Geller

    9 months ago
    @John Shearer AFAIK, what you set on @task(checkpoint=False) is what matters.
    John Shearer

    9 months ago
    FYI, in this case my result locations are keyed only on the current day, so today I ran a couple of flows with checkpointing=true, and now further runs with checkpointing=false are using that (older) checkpoint data.
    In all cases the tasks have @task(checkpoint=True), but checkpointing has been disabled at the flow level (either via the config file or the environment variable).
    Kevin Kho

    9 months ago
    That env variable is overridden to True for Cloud or Server runs, so it needs to be set at the task level.
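A simplified model of the rule described above, in plain Python so it runs without Prefect installed. It is an assumption, not Prefect's source: a result is written only when the flow-level checkpointing switch is on and the task's own `checkpoint` flag is not `False`, and Cloud/Server runs force the flow-level switch on regardless of the environment variable. The function names are hypothetical, and an unset variable is assumed to mean false.

```python
from typing import Dict, Optional


def effective_flow_checkpointing(env: Dict[str, str], backend: Optional[str]) -> bool:
    """Hypothetical model of the flow-level checkpointing switch."""
    # Assumption: Cloud/Server runs override the setting to True.
    if backend in ("cloud", "server"):
        return True
    # Local runs respect the env var (assumed false when unset).
    return env.get("PREFECT__FLOWS__CHECKPOINTING", "false").lower() == "true"


def writes_checkpoint(task_checkpoint: Optional[bool], env: Dict[str, str],
                      backend: Optional[str]) -> bool:
    """Hypothetical model: a result is written only if both switches allow it."""
    return effective_flow_checkpointing(env, backend) and task_checkpoint is not False


if __name__ == "__main__":
    local_off = {"PREFECT__FLOWS__CHECKPOINTING": "false"}
    print(writes_checkpoint(True, local_off, backend=None))      # False
    print(writes_checkpoint(True, local_off, backend="cloud"))   # True
    print(writes_checkpoint(False, local_off, backend="cloud"))  # False
```

Under this model, the only way to switch checkpointing off on a backend is checkpoint=False on the task, which matches the advice above.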
    John Shearer

    9 months ago
    Tasks with @task(checkpoint=True) don't write out results when PREFECT__FLOWS__CHECKPOINTING=false is set, but they do appear to read results in that same case (if the results are already present).
    This is running without Cloud or Server (from local pytests).
    Kevin Kho

    9 months ago
    Right, for local runs without a backend they won't write results, because that env variable is respected.
    John Shearer

    9 months ago
    I think my case is a little odd, so I'll go through it step by step. One minute...
    • I have a number of tasks in my flow with @task(checkpoint=True), with a result location based on the current date (no time component). The results directory is initially empty.
    1. I run my flow from pytest with the environment variable PREFECT__FLOWS__CHECKPOINTING=false
       a. This creates no files in the results directory. YAY 🙂
    2. I run my flow from pytest with the environment variable PREFECT__FLOWS__CHECKPOINTING=true
       a. This creates some files in the results directory. YAY 🙂
    3. I run my flow again from pytest with the environment variable PREFECT__FLOWS__CHECKPOINTING=true
       a. This reads from the files in the results directory. YAY 🙂
    4. I run my flow from pytest with the environment variable PREFECT__FLOWS__CHECKPOINTING=false
       a. This reads from the files in the results directory. Unexpected 😟
    Does that make sense?
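The four steps can be replayed with a toy model. It is an assumption based on the explanation given later in this thread, not Prefect's real code: a task with a target first checks whether the target exists and, if it does, loads it instead of running, without consulting the checkpointing setting; results are only written when checkpointing is on. A dict stands in for the results directory, and `run_targeted_task` and the path are hypothetical.

```python
from typing import Callable, Dict, Tuple


def run_targeted_task(
    store: Dict[str, object],
    target: str,
    checkpointing: bool,
    compute: Callable[[], object],
) -> Tuple[object, str]:
    """Toy model of a task with a target. Returns (value, "read" or "ran")."""
    # The target check comes first and ignores the checkpointing setting.
    if target in store:
        return store[target], "read"
    value = compute()
    # Results are only persisted when checkpointing is enabled.
    if checkpointing:
        store[target] = value
    return value, "ran"


if __name__ == "__main__":
    results_dir: Dict[str, object] = {}  # the results directory starts empty
    target = "etl/2022/3/7/some_task-prefect_result.parquet"  # hypothetical date-only path
    for step, checkpointing in enumerate([False, True, True, False], start=1):
        _, how = run_targeted_task(results_dir, target, checkpointing, lambda: 42)
        print(step, checkpointing, how, len(results_dir))
    # 1 False ran 0
    # 2 True ran 1
    # 3 True read 1
    # 4 False read 1
```

Step 4 reports read even though checkpointing is off, which is the behaviour observed above.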
    Kevin Kho

    9 months ago
    Ah ok I see what you mean. Can you show me how you defined the Result location?
    John Shearer

    9 months ago
    Sure. Sorry, it's a little ugly:

    import pendulum
    from prefect import task

    def pickle_location(**kwargs) -> str:
        return location_by_extension(suffix="pickle", **kwargs)

    def parquet_location(**kwargs) -> str:
        return location_by_extension(suffix="parquet", **kwargs)

    def location_by_extension(flow_name, scheduled_start_time, task_slug, suffix="parquet", **kwargs):
        # The path depends on the scheduled date only (no time component)
        date: str = pendulum.instance(scheduled_start_time).format("Y/M/D")
        # time: str = slugify(pendulum.instance(scheduled_start_time).time().isoformat())

        # return f"{date}/{time}__{flow_run_id}/{task_slug}-prefect_result.{suffix}"
        return f"{flow_name}/{date}/{task_slug}-prefect_result.{suffix}"

    # pandas_result is a Result object defined elsewhere (not shown here)
    @task(result=pandas_result, target=parquet_location)
    def some_task():
        ...
    I wouldn't expect result or target to be used with PREFECT__FLOWS__CHECKPOINTING=false, though I likely have a misunderstanding somewhere.
    (FYI - no urgency on this. It's not blocking me, and I'm signing off for today anyway) - thanks 😍
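To see why runs on the same day collide, here is a stand-in for location_by_extension built on the standard library's `datetime` instead of `pendulum` (swapped in only so the sketch runs without extra packages). It is meant to mirror the unpadded "Y/M/D" date format above; the flow name, task slug and dates are made up.

```python
from datetime import datetime


def location_by_extension(flow_name: str, scheduled_start_time: datetime,
                          task_slug: str, suffix: str = "parquet", **kwargs) -> str:
    """Stdlib stand-in for the helper above: the path depends on the date only."""
    d = scheduled_start_time
    date = f"{d.year}/{d.month}/{d.day}"  # unpadded year/month/day
    return f"{flow_name}/{date}/{task_slug}-prefect_result.{suffix}"


if __name__ == "__main__":
    morning = location_by_extension("etl", datetime(2022, 3, 7, 9, 0), "some_task")
    evening = location_by_extension("etl", datetime(2022, 3, 7, 18, 30), "some_task")
    next_day = location_by_extension("etl", datetime(2022, 3, 8, 9, 0), "some_task")
    print(morning)              # etl/2022/3/7/some_task-prefect_result.parquet
    print(morning == evening)   # True: both runs on the same day share one target path
    print(morning == next_day)  # False
```

Because the time of day never reaches the path, every run scheduled on the same date points its target at the same file.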
    Kevin Kho

    9 months ago
    It’s the target that is causing this behavior. Targets are a file-based caching mechanism, so if the file exists, the task loads the file instead of executing. You can use result.location instead, without the target, and I think this will work as intended:
    @task(result=Result(..,location="..."))
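A toy model of the suggested change, again an assumption rather than Prefect's code: only a target triggers the "file exists, so load it" shortcut, while a plain result location is just where the output is written when checkpointing is on. `run_task` and the path are hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple


def run_task(
    store: Dict[str, object],
    checkpointing: bool,
    compute: Callable[[], object],
    location: str,
    target: Optional[str] = None,
) -> Tuple[object, str]:
    """Toy model, not Prefect's code. Returns (value, "read" or "ran")."""
    # Only a target enables the "file exists -> load it" shortcut.
    if target is not None and target in store:
        return store[target], "read"
    value = compute()
    # Output goes to the target if one is set, else to the result location,
    # and only when checkpointing is enabled.
    if checkpointing:
        store[target if target is not None else location] = value
    return value, "ran"


if __name__ == "__main__":
    path = "etl/2022/3/7/some_task-prefect_result.parquet"  # hypothetical path
    steps = [False, True, True, False]  # checkpointing setting per step

    with_target: Dict[str, object] = {}
    print([run_task(with_target, cp, lambda: 42, path, target=path)[1] for cp in steps])
    # ['ran', 'ran', 'read', 'read']

    location_only: Dict[str, object] = {}
    print([run_task(location_only, cp, lambda: 42, path)[1] for cp in steps])
    # ['ran', 'ran', 'ran', 'ran']
```

In this model, dropping the target makes step 4 of the earlier scenario recompute as hoped; the trade-off is that step 3 recomputes too, since nothing checks for an existing file any more.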
    John Shearer

    9 months ago
    Oh great. That'll do nicely
    Thanks so much