Is it possible to disable automatic input caching?
The problem with the following flow setup is that the today_date variable seems to be cached from previous days, which is not what I intend. Is it possible to avoid this input caching, or is there a better way to structure this? The date is captured at the beginning of the code because the entire flow can run for a long time and cross into the next day.
with Flow('Example') as flow:
    today_date = datetime.date.today().strftime("%Y-%m-%d")
    data = extract_data(security_list, today_date)
    load_data(data)
    ...
    more_data = extract_more_data(security_list)
    load_more_data(more_data, today_date)
Kevin Kho
07/07/2021, 3:39 PM
Hi @Mike Wochner, we override checkpointing on the cloud/server side, so you need to disable it at the task level, like @task(checkpoint=False). If you are concerned about memory, some users provide a file path for the output files so that the file is overwritten each time the flow runs, and you’ll be able to restart the last flow run in case it fails.
Mike Wochner
07/07/2021, 3:42 PM
Thanks for the answer. I have checkpointing disabled for individual tasks, though today's date is saved outside of a task. Would it be best practice to turn it into a task?
Mike Wochner
07/07/2021, 3:43 PM
And then disable checkpointing for that specific task?
Kevin Kho
07/07/2021, 3:46 PM
Oh, my bad. Sorry, I didn’t read the full question. The date there will be serialized when the flow is registered, unless you use script-based storage. The fix here is to use prefect.context.get("today"), or you can have a task that returns the current date. Task execution is deferred until the flow runs, but the code written directly in the Flow block is not: it is evaluated once when the flow is built, which is why we run into this.
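The build-time versus run-time distinction described above can be illustrated with a minimal plain-Python sketch. It deliberately does not use the Prefect API. The names format_today, registration_day, run_day, and run_flow are hypothetical stand-ins, and the fixed dates exist only to make the example deterministic.

```python
import datetime


def format_today(clock=datetime.date.today):
    """Return the clock's current date as YYYY-MM-DD, evaluated at call time."""
    return clock().strftime("%Y-%m-%d")


# Hypothetical fixed clocks so the example is deterministic.
def registration_day():
    return datetime.date(2021, 7, 7)


def run_day():
    return datetime.date(2021, 7, 8)


# Eager: computed once while the "flow" is being built, then frozen.
# This mirrors calling datetime.date.today() directly inside the Flow block.
frozen_date = format_today(registration_day)


def run_flow(date_provider):
    """Stand-in for a flow run: resolve the date once at the start, reuse it."""
    today_date = date_provider()
    return {"extract_date": today_date, "load_date": today_date}


# Deferred: the callable is stored and only evaluated when the run starts.
# This mirrors moving the date lookup into a task.
result = run_flow(lambda: format_today(run_day))

print(frozen_date)  # date captured at build time
print(result)       # date resolved at run time, shared by both steps
```

Because the deferred version resolves the date once at the start of the run and passes it along, a long run that crosses midnight would still use one consistent date, which is the behavior the original question asks for.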