Felix Schran
01/13/2021, 6:02 PM
from prefect import task
from prefect.engine.results import LocalResult

# cache_dir is assumed to be defined elsewhere (the directory holding cached results)
@task(
    name="Extract",
    checkpoint=True,
    result=LocalResult(cache_dir),
    target="{task_name}--{today}"
)
def extract(x):
    return 1

@task(
    name="Transform",
    checkpoint=True,
    result=LocalResult(cache_dir),
    target="{task_name}--{today}"
)
def transform(x):
    return x * 2
Caching works with this example. With each run of the flow, Prefect looks for a file "Transform--2021-01-13" and, if it exists, uses the cached result.
I want to add the following features:
1. Whenever I change the source code of the extract task (say to return 20 instead of return 1), I obviously don't want to use the cached result (i.e. 1) as an input for the next task. Instead I want to make extract recompute whenever I change the content of extract. How can I do that?
2. When the result of an upstream DAG changes, I want to execute the DAG which follows from that point downwards (although the downstream tasks might also be tasks with a cache). For instance, in this example, I want transform to take the new input of 20 and compute again with that input (although the result is already cached).

josh
01/13/2021, 6:07 PM
flow_id (will cache once for each new version of the flow)
2. The DAG will recompute downwards if it is not cached, so since you are still using task_name + today as the target on your transform task, it won't run again until the task name is different or the day has changed. Perhaps look into using flow_run_id, which will allow you to cache results for a particular run: if you restart, it will reuse the cached results, but it won't have a bearing on future runs.

Felix Schran
01/13/2021, 6:43 PM
flow_id just seems to be the name of the flow and is not really solving the problem. Would it somehow be possible to hash the source code of a task and use that hash as a context value? Whenever I changed something in the code, the hash would then change as well.

josh
01/13/2021, 6:45 PM

Felix Schran
01/14/2021, 10:02 AM

Felix Schran
01/14/2021, 10:02 AM

Felix Schran
01/14/2021, 10:46 AM
cache_for is also used as an argument. Any chance to get them also working with the target option?
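
The idea raised earlier in the thread, hashing a task's source code so that the cache key changes whenever the code changes, can be sketched with the standard library alone. The block below is only an illustration under stated assumptions, not Prefect's API: source_hash, input_hash, run_cached, and demo are hypothetical helper names, the cache is a plain directory of pickle files, and the extra "--{code}--{inputs}" suffix on the target is an assumed naming scheme. Including a hash of the inputs in the file name is one possible way to make a downstream task recompute when an upstream result changes.

```python
import hashlib
import inspect
import marshal
import pickle
import tempfile
from pathlib import Path


def source_hash(fn):
    """Short hash that changes whenever the code of `fn` changes (hypothetical helper)."""
    try:
        payload = inspect.getsource(fn).encode("utf-8")
    except (OSError, TypeError):
        # Source text is unavailable (e.g. function defined in a REPL or via exec),
        # so fall back to hashing the compiled code object instead.
        payload = marshal.dumps(fn.__code__)
    return hashlib.sha256(payload).hexdigest()[:12]


def input_hash(*args):
    """Short hash of the inputs; assumes they pickle deterministically (ints, strings, ...)."""
    return hashlib.sha256(pickle.dumps(args)).hexdigest()[:12]


def run_cached(fn, *args, task_name, cache_dir, today):
    """Return (result, cache_hit); recompute if code, inputs, name or day differ."""
    target = "{task_name}--{today}--{code}--{inputs}".format(
        task_name=task_name,
        today=today,
        code=source_hash(fn),
        inputs=input_hash(*args),
    )
    path = Path(cache_dir) / target
    if path.exists():
        return pickle.loads(path.read_bytes()), True
    result = fn(*args)
    path.write_bytes(pickle.dumps(result))
    return result, False


def extract_v1():
    return 1


def extract_v2():  # stands in for the "edited" version of the same task
    return 20


def transform(x):
    return x * 2


def demo():
    """Run the two-task pipeline three times: fresh, unchanged, and with edited extract."""
    runs = []
    with tempfile.TemporaryDirectory() as cache_dir:
        for extract in (extract_v1, extract_v1, extract_v2):
            x, hit = run_cached(extract, task_name="Extract",
                                cache_dir=cache_dir, today="2021-01-13")
            runs.append((x, hit))
            y, hit = run_cached(transform, x, task_name="Transform",
                                cache_dir=cache_dir, today="2021-01-13")
            runs.append((y, hit))
    return runs
```

Calling demo() runs the pipeline three times against one cache directory. The second pass reuses both files, while the third pass, with the edited extract, misses the cache for both tasks because the code hash of Extract and the input hash of Transform have changed. With Prefect itself, a hash computed this way could presumably be concatenated into the string passed as target, but that is an assumption rather than something confirmed in this thread.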