scott
07/25/2023, 3:32 AM
Deceivious
07/25/2023, 8:41 AM
scott
07/25/2023, 4:12 PM
run_deployment instead of a subflow, but that’s not as tidy of a solution
Deceivious
07/25/2023, 4:15 PM
scott
07/25/2023, 4:15 PM
Deceivious
07/28/2023, 9:54 AM
Will Raphaelson
07/28/2023, 3:15 PM
scott
07/28/2023, 4:10 PM
> wondering if you get where you need to go just by adding retries to the subflow itself? then the parent flow never fails at all and we limit the retry to the true flow run retry behavior on only the subflow
That seems problematic if the parent flow can never fail, yeah? I assume by adding retries you mean the retries param of @flow (https://docs.prefect.io/2.11.0/api-ref/prefect/flows/#prefect.flows.Flow)? But that’s not linked to hitting the Retry button in the UI, right?
Will Raphaelson
07/28/2023, 4:18 PM
scott
07/28/2023, 4:20 PM
> im still asking internally if we can get retries of a parent flow (regardless of their method of initiation) to Actually retry subflows instead of creating new ones. that seems like a correct behavior.
Yep, that’s what I’m looking for 🙏
Will Raphaelson
07/28/2023, 4:23 PM
scott
07/28/2023, 4:29 PM
my_subflow.fn(), which i think makes its tasks run as if they weren’t in a subflow, but at least in our code there is still separation. we’re going with this route for now.
2. Replace all subflows with run_deployment with timeout=None so we wait for it to finish - this route didn’t work because any failures in these deployments appear to not cascade up to the parent flow, so it’s not a useful approach. Sure, we can retry these deployments nested within the parent flow separately, but that’s not a great user experience to have to manage retries separately for each “subflow” (that’s really a separate deployment)
Will Raphaelson
07/28/2023, 4:29 PM
Deceivious
07/29/2023, 7:07 AM
Tom Klein
08/30/2023, 12:57 AM
run_deployment
for whatever reason, the parent flow died (who knows, maybe got evicted by k8s - although - there’s no indication for it in the agent logs…) - but then we lost all the progress info on the subflows.
The subflows themselves write their data to the DB, so - they don’t “return” anything and we don’t care about them other than to know if they succeeded or not.
They are also idempotent so there’s no HARM in them running more than once, but - it’s obviously a waste of time & resources.
Because of the issue described in this thread, once retried by Prefect, the parent flow tries to re-execute all subflows.
Oddly, the waterfall remains the same so we have long orange (Late) strands of the old subflows but when we click into them, they seem to show a new subflow with no logs, no previous runs, etc.
We are trying to understand if we should be using an idempotency key - and if so - what strategy should we choose so that this doesn’t interfere with other flow runs with different params? e.g. - does {parent-flow-run-id}-{subflow-index} make sense (assuming all the subflows are indexed from 0 to N)?
what would happen exactly when the parent flow is retried? does it just skip them, or can it “find” the old flow run and recognize that it was complete (for the sake of composing the final state - which is the union of all final states of all subflows)?
or - will the re-run just think to itself it has one less sub-flow?
Also:
1. are there any plans to fix this or change this behavior? @Will Raphaelson
2. @scott i’m not sure i understood why you suggested to add timeout=None? we currently wait for the subflows to finish just by virtue of them being (await) gathered with asyncio. am i missing something? Also, we had no problem seeing errors propagate up from subflows.. maybe it’s related to a change that was made in a recent version?
idempotency_key - and it does seem like older subflows (that were generated with run_deployment) are “safe” from re-execution once their idempotency_key is set correctly
i mean that in the sense that not only are they not re-executed,
but also it seems like their status is remembered.
HOWEVER, the flow was not actually “retried” per se - but rather it seems like it keeps getting evicted and re-run by the K8s infra. Prefect does seem to consider these to be new runs (in the run count)
so take that with a grain of salt
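
For reference, here is a minimal sketch of the {parent-flow-run-id}-{subflow-index} key scheme discussed in this thread. It assumes Prefect 2.x, where run_deployment accepts an idempotency_key argument and can be awaited from async code. The helper names (subflow_idempotency_key, launch_subflows) and the deployment name are hypothetical, not taken from anyone's actual code. The sketch also assumes that a retried parent keeps the same flow run ID, so each key maps back to the flow run that was already created.

```python
import asyncio
from typing import Any, Dict, List


def subflow_idempotency_key(parent_flow_run_id: str, subflow_index: int) -> str:
    """Build a key that is unique per parent run and per subflow position.

    Because the parent flow run ID is part of the key, other parent runs
    (for example with different params) get different keys and are unaffected.
    """
    return f"{parent_flow_run_id}-{subflow_index}"


async def launch_subflows(
    deployment_name: str,
    parent_flow_run_id: str,
    params_list: List[Dict[str, Any]],
):
    """Trigger one deployment run per parameter set and wait for all of them.

    Assumption: inside a flow, parent_flow_run_id could be read from
    prefect.runtime.flow_run.id; it is passed in here to keep the sketch simple.
    """
    # Imported lazily so the key helper works without Prefect installed.
    from prefect.deployments import run_deployment

    flow_runs = await asyncio.gather(
        *(
            run_deployment(
                name=deployment_name,
                parameters=params,
                idempotency_key=subflow_idempotency_key(parent_flow_run_id, i),
            )
            for i, params in enumerate(params_list)
        )
    )

    # Surface child failures in the parent explicitly instead of relying on
    # them cascading up on their own.
    failed = [
        fr.name
        for fr in flow_runs
        if fr.state is None or not fr.state.is_completed()
    ]
    if failed:
        raise RuntimeError(f"Subflow runs did not complete: {failed}")
    return flow_runs
```

From an async parent flow this would be called as something like await launch_subflows("my-flow/my-deployment", parent_id, params_list). Whether a re-run parent picks up the earlier child runs and their final states, rather than starting new ones, matches what was observed above but is worth verifying against your Prefect version.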