# prefect-community
s
Hi. I use Prefect Server. I have a question regarding results. Checkpointing is active and I use a `LocalResult` as the flow's `result=mylocalresultinstance` to persist task results. The data is written to an NFS share which is available to the Dask worker and the agent (exact same mountpoint). With default settings (no `result` keyword in the flow definition) everything worked well. Now I use some templating to better organize (and later clean up) results -- roughly the setup sketched below. Questions:
1. The UI states that I cannot restart a task because "Warning: If this flow run does not have a result handler, restarting is unlikely to succeed". Is that text aware of the deprecated `result_handler` keyword and maybe checking the wrong setting? Does the UI need the NFS share (the result location) as well? Or any Prefect-related service besides the agent and the Dask worker?
2. Once my `LocalResult` is working, is it possible to access results from the UI? I know I can load results using the `Result` subclasses, but it would be easier for testing and debugging.
3. Is there an elegant way to get rid of old results? Deleting old flow versions (and their runs) does not remove results.
4. How do I find out which keyword parameters I can use in the format string of a `Result`'s `location`? Everything from `prefect.context` plus all keywords a `Task.run()` got?
5. Does Prefect use (persisted) results in any way? Let's assume 1. is a bug and gets fixed. Does a retry of a failed task run read its input from (maybe persisted) result objects? Something else?
6. The documentation does not say a word about uniqueness. If Prefect is using results (see 5.), then each persisted result must be unique, right? That is, overwriting a result by accident could lead to a complete mess?
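For context, a minimal sketch of the kind of setup described above, assuming hypothetical names and paths (the NFS mountpoint and the flow/task bodies are made up, not the poster's actual code):

```python
from prefect import task, Flow
from prefect.engine.results import LocalResult

# Assumed NFS mountpoint shared by the agent and the Dask workers
mylocalresultinstance = LocalResult(
    dir="/mnt/nfs/prefect-results",
    # templated location to organize (and later clean up) results
    location="{flow_name}/{task_name}-{today}.prefect",
)

@task
def extract():
    return [1, 2, 3]

@task
def transform(data):
    return [x * 2 for x in data]

# Flow-level result: with checkpointing active, every task's return
# value is persisted through this LocalResult
with Flow("etl-example", result=mylocalresultinstance) as flow:
    transform(extract())
```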
j
Allow me to attempt to answer each question in a bulleted manner 😄
1. Can I see a snippet of how you are setting the `Result` type on your flow/tasks? Not sure if I fully follow what’s happening here, because the backend is only aware of the existence of a result type in the metadata.
2. No, the results are not serialized and sent to the database to be viewed in the UI.
3. This is kind of up to your implementation here, since Prefect does not have access to the actual result data. Maybe some sort of auto-expiration or a process which cleans up results would be the way to go.
4. https://docs.prefect.io/api/latest/utilities/context.html#context and all kwargs passed to the task run are available in the formatting (see the sketch after this message).
5. and 6. For this you should look into targets https://docs.prefect.io/core/idioms/targets.html#using-result-targets-for-efficient-caching where you can effectively cache results on time/run names/etc. across runs or even different flows. This is also how you could enforce uniqueness if desired. Otherwise each run will simply store the result with a UUID unique to that run.
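To illustrate point 4 above, a hedged sketch (the `region` parameter, paths, and names are invented for the example) showing that both `prefect.context` keys and the keyword arguments passed to the task run can appear in the location template:

```python
from prefect import task, Flow
from prefect.engine.results import LocalResult

@task(
    result=LocalResult(
        dir="/mnt/nfs/prefect-results",
        # "{flow_name}", "{task_name}" and "{today}" come from prefect.context;
        # "{region}" is filled from the kwarg passed to the task at runtime
        location="{flow_name}/{task_name}/{region}-{today}.prefect",
    )
)
def fetch_report(region):
    return {"region": region, "rows": 42}

with Flow("templating-example") as flow:
    fetch_report(region="eu-west-1")
```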
s
Perfect! Thanks a lot! targets are nice. 🙂
For 1. -- the possible bug in the UI: by default all tasks use a `LocalResult()` and write files like `prefect-result-2020-07-23t14-11-27-600908-00-00`. Now, when I want to restart a failed flow, I cannot do this.
j
Oh interesting and you’re using the Server UI?
s
But I see the pickled results in the filesystem. (one file for each task)
Yes, server mode with DaskExecutor, v0.12.5
j
Great, there are some changes coming soon so tagging @nicholas so he is aware of this
s
nice!
Thanks for the `target` RTFM! This will come in handy! It seems that `Result` persistence (Local or S3 in our case) without using `target` is not worth much, because basically nothing uses the content of the pickled results right now, right?
m
@Sven Teresniak - jumping in to answer your last question - please see this response from Chris White that details how Result is currently used … https://github.com/PrefectHQ/prefect/issues/2577#issuecomment-637903664
basically:
> `Result` will only be used by Prefect when running against a backend and a future retry is needed, necessitating recreating a task's inputs.
s
@Marwan Sarieddine sorry, English is not my mother tongue. I want to completely understand this… Does "future retry" mean that a failed downstream task with `max_retries>0` reads the `Result` of its successful upstream task? Or does "future retry" mean "re-run the failed flow run" (this is not working at the moment)? Or both?
m
@Sven Teresniak - same here (English is not my first language either 🙂). Good question… I think whenever a task is being run in a Retry state, the task inputs should be formed from the pickled Result output…
> Does "future retry" mean that a failed downstream task with `max_retries>0` reads the `Result` of its successful upstream task?
Yes, in this case I think the task is being run in a Retry state…
> Or does "future retry" mean "re-run the failed flow run" (this is not working at the moment)?
I don’t think re-running a flow counts, given that its tasks are not being run in a Retry state, but it would be great if @josh or someone from the Prefect team could confirm.
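A small hedged sketch of the first reading (a downstream retry), with invented task names and values: the point is just that a task configured with `max_retries>0`, running against a backend, has its inputs recreated from the upstream task's persisted `Result` when the retry fires:

```python
from datetime import timedelta
from prefect import task, Flow
from prefect.engine.results import LocalResult

@task
def produce():
    return list(range(10))

@task(max_retries=3, retry_delay=timedelta(minutes=1))
def consume(data):
    # Simulate a transient failure; when the retry runs later (possibly
    # in a fresh process), `data` is reloaded from produce's persisted
    # Result instead of being kept in memory.
    raise RuntimeError("simulated transient failure")

with Flow("retry-example", result=LocalResult(dir="/mnt/nfs/prefect-results")) as flow:
    consume(produce())
```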
s
I still don't understand why `target` and `result` are mutually exclusive. I understand that a `Result` is a persisted return value of a task. Okay. This is useful when re-running downstream tasks. Useful thing. From https://docs.prefect.io/core/PINs/PIN-16-Results-and-Targets.html#pin-16-results-and-targets I learned that `target` can (or was supposed to) cover non-returned results (from side effects like writes to a file or S3). But with `result` and `target` mutually exclusive, it is not possible to cache an expensive calculation without side effects, right? I would have to persist the outcome of the calculation as part of the task and then point `target` to that exact location?
Okay, https://docs.prefect.io/api/latest/core/task.html#task-2 is interesting. Regarding `target` the doc mentions "…at write time all formatting kwargs will be passed…". Does that mean the target will also be written (the returned result persisted) automatically if not present? The docs only mention reading, not writing.
After reading the code of `Task` the situation is much clearer now. When `target` is provided, the `result` location is set to the target. Now it makes sense: when result data is available under `target`, the cached data will be used; if result data is not available, the `Result` implementation will write the data returned from the task to that location for future use. This makes sense.
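A hedged sketch of that read-or-compute behavior (paths and names are assumptions): on the first run the return value is written to the templated target through the task's `Result`; on a later run, if the file already exists, the task uses the cached data instead of re-running the function body:

```python
from prefect import task, Flow
from prefect.engine.results import LocalResult

@task(
    target="{flow_name}/{task_name}-{today}.prefect",  # doubles as the Result location
    checkpoint=True,
    result=LocalResult(dir="/mnt/nfs/prefect-results"),
)
def expensive_calculation():
    # imagine something slow and side-effect free here
    return sum(i * i for i in range(10_000))

with Flow("target-cache-example") as flow:
    expensive_calculation()

# First run: computes and writes the pickle under the target location.
# Second run (same day): finds the target and loads the stored value
# without executing the function body again.
# (When running locally without a backend, checkpointing may need to be
# enabled, e.g. PREFECT__FLOWS__CHECKPOINTING=true.)
flow.run()
flow.run()
```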