# ask-community
h
hi prefect-community! is there any way to force a(n over)write of a checkpointed/cached task result? say i have an S3Result, can i call write on it to overwrite the previously checkpointed result (something like s3result.write(overwrite=True))?
k
Hey @Horatiu Bota, are you using targets? Shouldn’t S3 overwrite by default with the same name?
h
or rather, my task is returning an S3Result
k
Could you give me a code example so I can understand better?
h
sure, one sec
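import pandas as pd
from prefect import Flow, task
from prefect.engine.results import S3Result


def long_running_task(df, params):
    # do some work with df and params
    return df  # placeholder for the modified DataFrame


with Flow("test_flow", result=S3Result(bucket="bucket_name")) as flow:

    # use a separate name so prefect's task is not shadowed
    cached_task = task(
        target='./cache/result.csv',
        checkpoint=True,
        max_retries=1,
        retry_delay=pd.Timedelta(seconds=3),
    )

    s3_result = cached_task(long_running_task)(df=df, params=params)

    # is this possible?
    s3_result.write(overwrite=True)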
@Kevin Kho something like that - the idea being that i want to overwrite something that was previously checkpointed
k
I understand. The issue here is that you set the target to a file. Targets are a form of caching in Prefect: if the file already exists, the task won’t run. What you want to do instead is use task(…,result=S3Result(..,location=…)). The location will achieve the same thing, but it is not a caching mechanism, so the task will still re-run.
If you use target, the result will just be loaded from there instead of running the task.
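The difference between the two behaviours described above can be sketched with plain Python files. This is only a toy model, not Prefect's implementation and not S3: run_with_target and run_with_location are hypothetical helper names invented for illustration, and local temp files stand in for the stored result.

```python
import tempfile
from pathlib import Path


def run_with_target(path, compute):
    """Target-style caching: if the file exists, load it and skip the work."""
    if path.exists():
        return path.read_text(), False  # (value, task_ran)
    value = compute()
    path.write_text(value)
    return value, True


def run_with_location(path, compute):
    """Location-style checkpointing: always run, then overwrite the file."""
    value = compute()
    path.write_text(value)
    return value, True


with tempfile.TemporaryDirectory() as tmp:
    target_path = Path(tmp) / "target.csv"
    location_path = Path(tmp) / "location.csv"

    # Run each "task" twice, with a different output on the second run.
    target_runs = [
        run_with_target(target_path, lambda v=v: v) for v in ("first", "second")
    ]
    location_runs = [
        run_with_location(location_path, lambda v=v: v) for v in ("first", "second")
    ]

    # Read back what ended up stored on disk after both runs.
    target_file = target_path.read_text()
    location_file = location_path.read_text()
```

In this sketch the second target-style run never executes its compute function and the file keeps the first value, while the location-style run executes both times and the file ends up holding the second value.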
h
ah, i see, thank you! 🙏
k
And then of course, just change your task to return the df instead of the result
h
how do i do that?
k
def long_running_task(df, params):
    # do some work with df and params
    return modified_df  # instead of the S3Result
Also, check out the result docs because we have a pandas serializer you can use if that df is pandas
h
yep, already using the pandas serializer
k
Ah gotcha
h
hm, my task already returns a df - however, when i inspect the type of the returned object (s3_result above) in the with block, it's an S3Result
k
Ah ok I think that is fine because Prefect will resolve it for you if you do:
with Flow(...) as flow:
    df = first_task(df)
    df = second_task(df)
Prefect handles resolving those results and passing them to the next task even if the type might be S3Result.
h
yep, that's exactly what i'm doing 😅
thank you so much Kevin! 🙏
k
Ok I misunderstood the code snippet. I thought you were creating S3Result inside the task and returning it. You’re all good. My advice should still stand to remove target. Just let me know if you still have issues 🙂