Prefect Community #ask-community

Scott Zelenka

03/13/2020, 2:29 PM

Hi team, I'm attempting to debug a Flow locally that maps over a large list of objects. A single entry in that list fails to process in a given Task, and it happens to be about 3/4 of the way through the list. Iterating on logic to correct the data is therefore time-intensive, because the entire pipeline has to be re-computed before it reaches the failure point again.

The way I've debugged these types of problems in other tools (e.g. Luigi) relied on a concept of `cache`/`checkpoint`, where you could specify how long to persist the output of a given step in the pipeline to disk. When you re-ran the entire pipeline locally while debugging, it would simply read in the cached data rather than re-compute each step that had previously completed successfully. It was also nice in production, as it allowed commonly used tasks to share their output with other pipelines without needing to be re-computed.

It seems Prefect has a concept of output caching, but for local runs it only stores the cache in memory, which is useless for this use case of iterating on logic changes and re-running the entire pipeline. https://docs.prefect.io/core/concepts/persistence.html#output-caching

There's mention in this Slack channel to 'use Prefect Cloud', but I cannot find any tutorials or examples of how to accomplish this, so I'm looking for guidance. How would you use `cache` in Prefect Cloud to speed up the debugging iteration process of a local Flow?
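
For illustration, the Luigi-style behaviour described above could look roughly like the following. This is a standard-library sketch only and does not use Prefect's API; the names `checkpoint`, `CACHE_DIR`, `process`, and `run_all` are hypothetical, and the failing item (7) is made up for the demo.

```python
import hashlib
import pickle
import tempfile
import time
from pathlib import Path

# Hypothetical cache location. A fresh temp directory keeps this demo
# repeatable; point it at a fixed path to reuse results across processes.
CACHE_DIR = Path(tempfile.mkdtemp(prefix="checkpoints_"))


def checkpoint(cache_dir, max_age_seconds):
    """Persist a step's return value to disk, keyed by step name and arguments."""
    def decorator(func):
        def wrapper(*args):
            key = hashlib.sha256(pickle.dumps((func.__name__, args))).hexdigest()
            path = Path(cache_dir) / f"{key}.pkl"
            # Reuse the stored output if it exists and has not expired.
            if path.exists() and time.time() - path.stat().st_mtime < max_age_seconds:
                return pickle.loads(path.read_bytes())
            result = func(*args)  # if this raises, nothing is written to disk
            path.write_bytes(pickle.dumps(result))
            return result
        return wrapper
    return decorator


calls = []  # records which items were actually computed (demo only)


@checkpoint(CACHE_DIR, max_age_seconds=3600)
def process(item):
    calls.append(item)
    if item == 7:  # stand-in for the one bad record in the mapped list
        raise ValueError(f"bad record: {item}")
    return item * 2


def run_all(items):
    """Map `process` over items, recording None for items that fail."""
    results = []
    for item in items:
        try:
            results.append(process(item))
        except ValueError:
            results.append(None)
    return results


first = run_all(range(10))   # computes every item; item 7 fails
calls.clear()
second = run_all(range(10))  # only the failed item is computed again
```

On the second pass, successful items are read back from disk and only the failing item is re-executed, which is the iteration speed-up being asked for.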

Laura Lorenz (she/her)

03/13/2020, 2:41 PM

Hi Scott! You can checkpoint results for debugging using result handlers (https://docs.prefect.io/core/advanced_tutorials/using-result-handlers.html). As of right now there isn’t a first-class way to persist the cache outside of the Python process without using Cloud. (We happen to be working on it for Core now; you might be interested in the PIN, which is here: https://docs.prefect.io/core/PINs/PIN-16-Results-and-Targets.html) If you are interested in trying out Cloud, step one would be to make a free tier account at https://cloud.prefect.io/ and then follow this deployment tutorial: https://docs.prefect.io/cloud/tutorial/configure.html#log-in-to-prefect-cloud

Scott Zelenka

03/13/2020, 2:48 PM

Thanks @Laura Lorenz (she/her) we have a paid version of Cloud, but I'm still not clear on how one would go about deploying a Flow to Cloud with `cache` to then use an IDE to load the process into a debugger where we can troubleshoot why the particular data-point is failing 3/4 of the way through a mapped `Task`. When running the Flow locally, I can do this pretty easily in PyCharm. But I would like some guidance on how to debug this using `cache` from Cloud. Or would a custom `result-handler` allow me to preserve the `Result` on my local machine to debug?

Laura Lorenz (she/her)

03/13/2020, 3:12 PM

Gotcha, I see. I believe to have the flow connected to the persisted Cloud `cache`, it needs to go through the CloudFlowRunner. You may be able to use the tips in https://docs.prefect.io/core/advanced_tutorials/local-debugging.html#use-a-flowrunner-for-stateless-execution but use a CloudFlowRunner instead, and attach your debugger to whatever you use to execute that. Full disclosure, I haven’t actually done that before, but as long as your logic changes do not invalidate the cloud cache (which by my reading would happen if you reregister the flow or replace a task in the flow with flow.replace) it can still use the cloud cache.

Laura Lorenz (she/her)

03/13/2020, 3:17 PM

On the point of result handlers, I mentioned them if the situation was that you wanted to see/interact with the results after a flow run, but on the point of using them as a source for the cache: data written by result handlers by themselves do NOT automatically get picked up between flow runs, so to feed them in as the next run’s cache I believe you would have to construct and/or save the state objects from a prior flow run to feed into a new instance of FlowRunner (https://github.com/PrefectHQ/prefect/blob/84fc91b0d576a5036a3a2f379c3f8b86d90f8e86/src/prefect/engine/flow_runner.py#L189)
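
To make the idea of saving states from one run and feeding them into the next more concrete, here is a rough standard-library model of it. It does not call Prefect's FlowRunner and is not its API; `run_pipeline`, `save_states`, `load_states`, and the three toy steps are all hypothetical names used only for illustration.

```python
import pickle
import tempfile
from pathlib import Path


def run_pipeline(tasks, prior_states=None):
    """Run (name, function) steps in order, skipping steps found in prior_states.

    Each function receives the previous step's result.
    Returns (states, error), where error is None on success.
    """
    states = dict(prior_states or {})
    value = None
    for name, func in tasks:
        if name in states:
            value = states[name]  # reuse the saved result instead of re-running
            continue
        try:
            value = func(value)
        except Exception as exc:
            return states, exc    # keep everything that succeeded so far
        states[name] = value
    return states, None


def save_states(states, path):
    Path(path).write_bytes(pickle.dumps(states))


def load_states(path):
    return pickle.loads(Path(path).read_bytes())


executed = []  # records which steps actually ran (demo only)


def extract(_):
    executed.append("extract")
    return [1, 2, 3]


def transform(rows):
    executed.append("transform")
    return [r * 10 for r in rows]


def load_buggy(rows):
    executed.append("load")
    raise RuntimeError("bug under investigation")


def load_fixed(rows):
    executed.append("load")
    return sum(rows)


state_file = Path(tempfile.mkdtemp()) / "states.pkl"

# First run: the last step fails, but the earlier results are saved to disk.
states, error = run_pipeline(
    [("extract", extract), ("transform", transform), ("load", load_buggy)]
)
save_states(states, state_file)

# Second run, after fixing the logic: the saved states are passed in explicitly.
executed.clear()
states2, error2 = run_pipeline(
    [("extract", extract), ("transform", transform), ("load", load_fixed)],
    prior_states=load_states(state_file),
)
```

The point of the model is that nothing is picked up automatically: the second run only skips the earlier steps because the saved states are loaded and handed to it explicitly.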

Laura Lorenz (she/her)

03/13/2020, 3:19 PM

^ sorry, I put a “do” instead of a “do NOT” in a prime place there haha if you read that before I edited it
