Kivanc Yuksel
02/19/2022, 3:18 PM
target, however, from time to time I want to re-run these tasks without manually deleting the target files. Is there a way to "force" a re-run of such tasks?
Anna Geller
Kivanc Yuksel
02/19/2022, 4:32 PM
cache_for and cache_validator for this purpose to accomplish what I need?
Anna Geller
Kivanc Yuksel
02/19/2022, 10:42 PM
from pathlib import Path

# Prefect 1.x import paths
from prefect import Flow
from prefect.engine.results import LocalResult
from prefect.engine.serializers import PandasSerializer

# CacheDirectoryGenerator and SomeLongRunningTask are defined elsewhere (not shown here)

target_location = Path("/path/to/dataset.csv")
file_path1 = Path("/path/to/some/file.txt")
file_path2 = Path("/path/to/some/file.csv")

# Derive a unique cache directory from the file and parameter dependencies
cache_directory_generator = CacheDirectoryGenerator(
    file_dependencies=[file_path1, file_path2],
    parameter_dependencies={"some_param": 3},
)
target_location = cache_directory_generator.get_cache_path(target_location)
# target_location is now something like: "/path/to/04f6b63d7b27281742d7bf14e4e8323c/dataset.csv"

# Write the task result into the hashed directory, under the original file name
serializer = PandasSerializer(file_type="csv")
result = LocalResult(dir=str(target_location.parent), serializer=serializer)
long_running_task = SomeLongRunningTask(
    checkpoint=True,
    result=result,
    target=target_location.name,
    skip_on_upstream_skip=True,
)

with Flow("process-raw-data") as flow:
    output = long_running_task()

flow.run()
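For readers without access to the class used above, here is a minimal sketch of how such a dependency-based cache path could be computed. It is an assumption, not the actual CacheDirectoryGenerator: the function name `get_cache_path`, the choice of MD5, and hashing file contents plus JSON-encoded parameters are all illustrative.

```python
import hashlib
import json
from pathlib import Path


def get_cache_path(target, file_dependencies, parameter_dependencies):
    """Insert a dependency-derived hash directory in front of the target's file name.

    Hypothetical helper: not the CacheDirectoryGenerator from the thread.
    """
    digest = hashlib.md5()
    # Hash the contents of every file dependency, in a stable (sorted) order.
    for path in sorted(Path(p) for p in file_dependencies):
        digest.update(path.read_bytes())
    # Hash the parameters; sort_keys makes the result independent of dict order.
    digest.update(json.dumps(parameter_dependencies, sort_keys=True).encode("utf-8"))
    target = Path(target)
    return target.parent / digest.hexdigest() / target.name
```

With a helper like this, any change to a dependency file or a parameter yields a different directory. Adding a throwaway entry (for example a version string) to the parameters also changes the hash, which is one way to get a fresh location without deleting the old files.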
Anna Geller
cache_for=datetime.timedelta(hours=1),
cache_validator=prefect.engine.cache_validators.all_parameters)
Kivanc Yuksel
02/20/2022, 1:21 PM
CacheDirectoryGenerator is to generate a unique path based on the provided parameters and file dependencies. Yes, I know about templating, and what I do is actually very similar to it, so I don't see any additional benefit in using it. My class already handles unique path generation, and the resulting target is passed to the task as is.
Also, my problem is not that the unique location generator based on parameters and file dependencies doesn't work; it does. I just need a way to "force" a re-run when I need to, even though the output is cached, without manually deleting the cached files.
Anna Geller
Kivanc Yuksel
02/20/2022, 2:19 PM
Anna Geller
cache_for=datetime.timedelta(hours=1),
cache_validator=prefect.engine.cache_validators.all_parameters)
This ensures that the task won't be rerun as long as the task input parameters (and the file paths, which will now also be task inputs) haven't changed.
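The behaviour described above, reusing a result for a limited time as long as the parameters match, can be illustrated without Prefect. This is a plain-Python sketch of the idea and not Prefect's implementation; `TimedParameterCache`, `cache_for_seconds`, and `clock` are hypothetical names chosen for the example.

```python
import time


class TimedParameterCache:
    """Reuse a result while it is fresh and the parameters are unchanged."""

    def __init__(self, func, cache_for_seconds, clock=time.monotonic):
        self.func = func
        self.cache_for_seconds = cache_for_seconds
        self.clock = clock  # injectable so expiry can be simulated
        self._entry = None  # (stored_at, parameters, result)

    def __call__(self, **parameters):
        now = self.clock()
        if self._entry is not None:
            stored_at, cached_parameters, result = self._entry
            fresh = now - stored_at < self.cache_for_seconds
            # Cache hit only if the entry has not expired AND the parameters match.
            if fresh and cached_parameters == parameters:
                return result
        # Otherwise recompute and replace the cached entry.
        result = self.func(**parameters)
        self._entry = (now, dict(parameters), result)
        return result
```

Wrapping a function as `TimedParameterCache(func, cache_for_seconds=3600)` recomputes whenever a keyword argument changes or the hour has passed, and returns the stored result otherwise.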
Also, check out this blog post from Kevin. It discusses a very similar use case: https://www.prefect.io/blog/introducing-prefect-ml-orchestrate-a-distributed-hyperparameter-grid-search/
emre
02/20/2022, 3:09 PM
target, and be able to flush your cache automatically based on user input.
Kivanc Yuksel
02/20/2022, 6:00 PM
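The "force re-run" idea running through this thread can be sketched in plain Python. This is only an illustration under stated assumptions, not Prefect's target mechanism: `run_with_target` and its `force` flag are hypothetical, and the result is stored as plain text for simplicity.

```python
from pathlib import Path


def run_with_target(target, compute, force=False):
    """Return the cached text in `target` unless it is missing or `force` is set.

    Hypothetical helper: `compute` is any zero-argument callable returning a str.
    """
    target = Path(target)
    if target.exists() and not force:
        # Cached output is present and no re-run was requested.
        return target.read_text()
    # Either nothing is cached yet or the caller asked for a fresh run.
    result = compute()
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(result)  # overwrite, so no manual deletion is needed
    return result
```

Here `force` would be driven by user input, so a re-run overwrites the cached file instead of requiring it to be deleted by hand.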