# prefect-community
s
Hi. I use Prefect Server. I have a question regarding results. Checkpointing is active and I use a `LocalResult` as the flow's `result=mylocalresultinstance` to persist task results. The data is written to an NFS share which is available to the Dask worker and the agent (exact same mountpoint). With default settings (no `result` keyword in the flow definition) everything worked well. Now I use some templating to better organize (and later clean up) results -- roughly the setup sketched below. Questions:
1. The UI states that I cannot restart a task because "Warning: If this flow run does not have a result handler, restarting is unlikely to succeed". Is that text aware of the deprecated `result_handler` keyword and maybe checking the wrong setting? Does the UI need the NFS share (the result location) as well? Or any Prefect-related service besides the agent and the Dask worker?
2. Once my `LocalResult` is working, is it possible to access results from the UI? I know I can load results using the `Result` subclasses, but it would be easier for testing and debugging.
3. Is there an elegant way to get rid of old results? Deleting old flow versions (and their runs) does not remove results.
4. How do I find out which keyword parameters I can use in the format string of a `Result`'s `location`? Everything from `prefect.context` plus all keywords a `Task.run()` got?
5. Does Prefect use (persisted) results in any way? Let's assume 1. is a bug and gets fixed. Does a retry of a failed task run read its input from (maybe persisted) result objects? Something else?
6. The documentation does not say a word about uniqueness. If Prefect is using results (see 5.), then each persisted result must be unique, right? That is, overwriting a result by accident could lead to a complete mess?
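For context, a minimal sketch of the kind of setup described above, assuming hypothetical names and paths (the NFS mountpoint and the flow/task bodies are made up, not the poster's actual code):

```python
from prefect import task, Flow
from prefect.engine.results import LocalResult

# Assumed NFS mountpoint shared by the agent and the Dask workers
mylocalresultinstance = LocalResult(
    dir="/mnt/nfs/prefect-results",
    # templated location to organize (and later clean up) results
    location="{flow_name}/{task_name}-{today}.prefect",
)

@task
def extract():
    return [1, 2, 3]

@task
def transform(data):
    return [x * 2 for x in data]

# Flow-level result: with checkpointing active, every task's return
# value is persisted through this LocalResult
with Flow("etl-example", result=mylocalresultinstance) as flow:
    transform(extract())
```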
j
Allow me to attempt to answer each question in a bulleted manner 😄
1. Can I see a snippet of how you are setting the `Result` type on your flow/tasks? Not sure if I fully follow what’s happening here, because the backend is only aware of the existence of a result type in the metadata.
2. No, the results are not serialized and sent to the database to be viewed in the UI.
3. This is kind of up to your implementation here, since Prefect does not have access to the actual result data. Maybe some sort of auto-expiration or a process which cleans up results would be the way to go.
4. https://docs.prefect.io/api/latest/utilities/context.html#context and all kwargs passed to the task run are available in the formatting (see the sketch after this message).
5. and 6. For this you should look into targets https://docs.prefect.io/core/idioms/targets.html#using-result-targets-for-efficient-caching where you can effectively cache results on time/run names/etc. across runs or even different flows. This is also how you could enforce uniqueness if desired. Otherwise each run will simply store the result with a UUID unique to that run.
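To illustrate point 4 above, a hedged sketch (the `region` parameter, paths, and names are invented for the example) showing that both `prefect.context` keys and the keyword arguments passed to the task run can appear in the location template:

```python
from prefect import task, Flow
from prefect.engine.results import LocalResult

@task(
    result=LocalResult(
        dir="/mnt/nfs/prefect-results",
        # "{flow_name}", "{task_name}" and "{today}" come from prefect.context;
        # "{region}" is filled from the kwarg passed to the task at runtime
        location="{flow_name}/{task_name}/{region}-{today}.prefect",
    )
)
def fetch_report(region):
    return {"region": region, "rows": 42}

with Flow("templating-example") as flow:
    fetch_report(region="eu-west-1")
```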
s
Perfect! Thanks a lot! targets are nice. 🙂
For 1. -- the possible bug in the UI: by default all tasks use a `LocalResult()` and write files like `prefect-result-2020-07-23t14-11-27-600908-00-00`. Now, when I want to restart a failed flow, I cannot do this.
j
Oh interesting and you’re using the Server UI?
s
But I see the pickled results in the filesystem. (one file for each task)
Yes, server mode with DaskExecutor, v0.12.5
j
Great, there are some changes coming soon so tagging @nicholas so he is aware of this
s
nice!
Thanks for the `target` RTFM! This will come in handy! It seems that `Result` persistence (Local or S3 in our case) without using `target` is not worth much, because basically nothing uses the content of the pickled results right now, right?
m
@Sven Teresniak - jumping in to answer your last question - please see this response from Chris White that details how Result is currently used … https://github.com/PrefectHQ/prefect/issues/2577#issuecomment-637903664
basically:
> `Result` will only be used by Prefect when running against a backend and a future retry is needed, necessitating recreating a task's inputs.
s
@Marwan Sarieddine sorry, English is not my mother tongue. I want to completely understand this… Does "future retry" mean that a failed downstream task with `max_retries>0` reads the `Result` of its successful upstream task? Or does "future retry" mean "re-run the failed flow run" (this is not working at the moment)? Or both?
m
@Sven Teresniak - same here (English is not my first language either 🙂). Good question… I think whenever a task is being run in a Retry state, the task inputs should be formed from the pickled Result output…
> Does "future retry" mean that a failed downstream task with `max_retries>0` reads the `Result` of its successful upstream task?
Yes, in this case I think the task is being run in a Retry state…
> Or does "future retry" mean "re-run the failed flow run" (this is not working at the moment)?
I don’t think re-running a flow counts, given that its tasks are not being run in a Retry state, but it would be great if @josh or someone from the Prefect team could confirm.
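A small hedged sketch of the first reading (a downstream retry), with invented task names and values: the point is just that a task configured with `max_retries>0`, running against a backend, has its inputs recreated from the upstream task's persisted `Result` when the retry fires:

```python
from datetime import timedelta
from prefect import task, Flow
from prefect.engine.results import LocalResult

@task
def produce():
    return list(range(10))

@task(max_retries=3, retry_delay=timedelta(minutes=1))
def consume(data):
    # Simulate a transient failure; when the retry runs later (possibly
    # in a fresh process), `data` is reloaded from produce's persisted
    # Result instead of being kept in memory.
    raise RuntimeError("simulated transient failure")

with Flow("retry-example", result=LocalResult(dir="/mnt/nfs/prefect-results")) as flow:
    consume(produce())
```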
s
I still don't understand why `target` and `result` are mutually exclusive. I understand that a `Result` is a persisted return value of a task. Okay. This is useful when re-running downstream tasks. Useful thing. From https://docs.prefect.io/core/PINs/PIN-16-Results-and-Targets.html#pin-16-results-and-targets I learned that `target` can (or was supposed to) cover non-returned results (from side effects like writes to a file or S3). But with `result` and `target` mutually exclusive, it is not possible to cache an expensive calculation without side effects, right? I would have to persist the outcome of the calculation as part of the task and then point `target` to that exact location?
Okay, https://docs.prefect.io/api/latest/core/task.html#task-2 is interesting. Regarding `target` the doc mentions "…at write time all formatting kwargs will be passed…". Does that mean the target will also be written (the returned result persisted) automatically if not present? The docs only mention reading, not writing.
After reading the code of `Task` the situation is much clearer now. When `target` is provided, the `result` location is set to the target. Now it makes sense: when result data is available under `target`, the cached data will be used; if result data is not available, the `Result` implementation will write the data returned from the task to that location for future use. This makes sense.
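A hedged sketch of that read-or-compute behavior (paths and names are assumptions): on the first run the return value is written to the templated target through the task's `Result`; on a later run, if the file already exists, the task uses the cached data instead of re-running the function body:

```python
from prefect import task, Flow
from prefect.engine.results import LocalResult

@task(
    target="{flow_name}/{task_name}-{today}.prefect",  # doubles as the Result location
    checkpoint=True,
    result=LocalResult(dir="/mnt/nfs/prefect-results"),
)
def expensive_calculation():
    # imagine something slow and side-effect free here
    return sum(i * i for i in range(10_000))

with Flow("target-cache-example") as flow:
    expensive_calculation()

# First run: computes and writes the pickle under the target location.
# Second run (same day): finds the target and loads the stored value
# without executing the function body again.
# (When running locally without a backend, checkpointing may need to be
# enabled, e.g. PREFECT__FLOWS__CHECKPOINTING=true.)
flow.run()
flow.run()
```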