Scott Zelenka
03/13/2020, 2:29 PM
cache or checkpoint, where you could specify how long to persist the output of a given step in the pipeline to disk. That way, when you re-run the entire pipeline locally while debugging, it simply reads in the cached data rather than re-computing each step that previously completed successfully. It was also nice in production, as it allowed commonly used tasks to share their output with other pipelines without needing to be re-computed.
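For readers unfamiliar with this style of step-level disk caching, here is a minimal sketch of the idea in plain Python. It is not Prefect's API or that of any particular pipeline tool; `disk_cache`, `max_age_seconds`, and `expensive_step` are hypothetical names chosen for illustration, and it assumes the step's arguments and output can be pickled.

```python
import hashlib
import pickle
import tempfile
import time
from functools import wraps
from pathlib import Path


def disk_cache(cache_dir, max_age_seconds):
    """Hypothetical decorator: persist a step's output to disk and reuse it until it expires."""
    cache_path = Path(cache_dir)
    cache_path.mkdir(parents=True, exist_ok=True)

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # Key the cache file on the step name and its pickled arguments.
            raw_key = pickle.dumps((func.__name__, args, sorted(kwargs.items())))
            target = cache_path / (hashlib.sha256(raw_key).hexdigest() + ".pkl")

            # Reuse the stored output if it exists and is still fresh.
            if target.exists() and time.time() - target.stat().st_mtime < max_age_seconds:
                with target.open("rb") as fh:
                    return pickle.load(fh)

            # Otherwise compute the step and persist its output for later runs.
            result = func(*args, **kwargs)
            with target.open("wb") as fh:
                pickle.dump(result, fh)
            return result

        return wrapper

    return decorator


calls = []  # records how often the step body actually runs


@disk_cache(tempfile.mkdtemp(), max_age_seconds=3600)
def expensive_step(x):
    calls.append(x)
    return x * 2


first = expensive_step(21)   # computed, then written to disk
second = expensive_step(21)  # read back from disk, step body not re-run
```

Because the cache lives on disk rather than in memory, a second process (for example a fresh debugger session) pointed at the same directory would pick up the stored output as long as it has not expired.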
It seems Prefect has a concept of output caching, but it only stores this in memory for local runs when debugging, which is useless for this use case of iterating on logic changes and re-running the entire pipeline again.
https://docs.prefect.io/core/concepts/persistence.html#output-caching
There's mention in this Slack channel to 'use Prefect Cloud', but I cannot find any tutorials or examples of how to accomplish this. So I'm looking for guidance.
How would you use cache in Prefect Cloud to speed up the debugging iteration process of a local Flow?
Laura Lorenz (she/her)
03/13/2020, 2:41 PM
Scott Zelenka
03/13/2020, 2:48 PM
cache to then use an IDE to load the process into a debugger where we can troubleshoot why the particular data-point is failing 3/4 of the way through a mapped Task.
When running the Flow locally, I can do this pretty easily in PyCharm. But I would like some guidance on how to debug this using cache from Cloud.
Or would a custom result-handler allow me to preserve the Result on my local machine to debug?
Laura Lorenz (she/her)
03/13/2020, 3:12 PM
cache, it needs to go through the CloudFlowRunner. You may be able to use the tips in https://docs.prefect.io/core/advanced_tutorials/local-debugging.html#use-a-flowrunner-for-stateless-execution but use a CloudFlowRunner instead, and attach your debugger to whatever you use to execute that.
Full disclosure, I haven’t actually done that before, but as long as your logic changes do not invalidate the cloud cache (which by my reading would happen if you reregister the flow or replace a task in the flow with flow.replace), the run should still be able to use the cloud cache.
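On the custom result-handler question raised earlier in the thread, the rough shape of a handler that keeps results on the local machine might look like the sketch below. It assumes a handler boils down to a `write` method that stores a value and returns its location, plus a `read` method that loads it back, which is how I understand Prefect's result handler interface. The class deliberately does not import or subclass anything from Prefect, and `LocalPickleHandler` is a hypothetical name, so treat it as an illustration rather than a drop-in implementation.

```python
import os
import pickle
import tempfile
import uuid


class LocalPickleHandler:
    """Hypothetical handler: write() stores a result on disk, read() loads it back."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def write(self, result):
        # Store the result under a unique file name and return its location.
        path = os.path.join(self.directory, f"result-{uuid.uuid4().hex}.pkl")
        with open(path, "wb") as fh:
            pickle.dump(result, fh)
        return path

    def read(self, location):
        # Load a previously stored result from the given location.
        with open(location, "rb") as fh:
            return pickle.load(fh)


handler = LocalPickleHandler(tempfile.mkdtemp())
location = handler.write({"row": 3, "value": None})
restored = handler.read(location)
```

With the failing task's inputs stored this way, a debugger session could call `read` on the saved location and step through the task body on that one data point without re-running the upstream steps.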