
John Horn

07/05/2023, 9:55 PM
Hi all, working in a GKE k8s environment. I have an issue where my pod goes down and the main deployment restarts. When it restarts, the deployment attempts to retrieve the past task results but can't, causing the common error:
```
21:16:34.430 | ERROR   | Flow run 'tricky-alligator' - Finished in state Failed('Flow run encountered an exception. MissingResult: The result was not persisted and is no longer available.\n')
```
My guess is that new pod or new start causes all persisted data to go away. Is there something I should be doing to address this? Thanks k8 gang

Nate

07/06/2023, 1:00 AM
hey @John Horn - where are you persisting your results? if you're not using GCS for this
> My guess is that new pod or new start causes all persisted data to go away.
that might be something you wanna check out - that way they can live according to lifecycle rules or something
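Such a lifecycle rule might look like the following sketch (the bucket name is hypothetical; note that GCS lifecycle conditions have day-level granularity, so they can't expire objects after minutes - that's what task caching is for):

```shell
# delete persisted results older than 1 day (bucket name is hypothetical)
cat > lifecycle.json <<'EOF'
{
  "rule": [
    {"action": {"type": "Delete"}, "condition": {"age": 1}}
  ]
}
EOF
gsutil lifecycle set lifecycle.json gs://my-results-bucket
```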

John Horn

07/06/2023, 1:01 AM
I started persisting now to GCS to see if that will help

Nate

07/06/2023, 1:01 AM
nice! that's what i would do too

John Horn

07/06/2023, 1:01 AM
One question I have is that when persisting a sub-flow result, I noticed you can't specify the serializer
so I'm assuming the unique inputs make the fingerprint for the subflow's persisted results
this gets tricky if, say, I have: subflow: task1(), task2()
I guess I should focus my persistence logic on task1 and task2, including how long to persist those tasks,
rather than embed GCS persist logic in the subflow
since the results should only be valid for, let's say, 30 min
I don't think the subflow gives me that level of control
but the tasks do
but the error I was getting was that it couldn't find the persisted result on the subflow
that was what was throwing me off
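The input-fingerprint idea above can be sketched in plain Python - this illustrates the caching-with-expiration concept only, not Prefect's actual internals, and all names here are made up:

```python
import hashlib
import json
import time

CACHE = {}        # fingerprint -> (timestamp, result)
TTL = 30 * 60     # results valid for, say, 30 minutes

def fingerprint(fn_name, args, kwargs):
    # hash the callable's name plus its inputs, so unique
    # inputs produce a unique cache key
    payload = json.dumps([fn_name, args, kwargs], sort_keys=True, default=str)
    return hashlib.sha256(payload.encode()).hexdigest()

def cached_call(fn, *args, **kwargs):
    key = fingerprint(fn.__name__, args, kwargs)
    hit = CACHE.get(key)
    if hit and time.time() - hit[0] < TTL:
        return hit[1]  # fresh cached result, skip re-running
    result = fn(*args, **kwargs)
    CACHE[key] = (time.time(), result)
    return result
```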

Nate

07/06/2023, 1:07 AM
> I noticed you can't specify the serializer
you should be able to, like this
```
In [5]: from prefect import flow

In [6]: @flow(persist_result=True)
   ...: def child():
   ...:     return "something important"
   ...:

In [7]: @flow
   ...: def parent():
   ...:     child.with_options(result_serializer=...)()
   ...:
```
are you using `persist_result=True` on the subflow? or i guess you don't need `with_options` here unless you wanna use different serializers in different cases - you could just do it in the decorator of `child` right away as well

John Horn

07/06/2023, 1:10 AM
If I try something like:
```
@flow(
    persist_result=True,
    cache_expiration=timedelta(minutes=15),
    result_storage=GCS(
        bucket_path='yo-bucket',
        project='foobar-project',
        service_account_info=os.environ['gcs-creds']
    )
)
def foobar_subflow(
```
then I get back:
```
TypeError: flow() got an unexpected keyword argument 'cache_expiration'
```
and in the wild when running this deployment the pod occasionally dies with:
```
prefect.exceptions.MissingResult: The result was not persisted and is no longer availabl
```
and the stack trace leads up to the subflow:
```
blah_output = foobar_subflow(
```
That said, I haven't tried persisting the subflow's result to GCS yet
I'd almost rather it not persist at all and just retry the subflow
since it is time sensitive and not expensive to run
but that subflow's parent flow should do persistence

Nate

07/06/2023, 1:15 AM
ahh, i see.
> That said haven't tried with the persist on the subflow yet to GCS
i would try that - otherwise there are 2 workarounds I could think of (but I anticipate the above working):
• GCS lifecycle rules
• deploy the subflow, wrap `run_deployment` in a task with `cache_expiration`, and call that from your parent

John Horn

07/06/2023, 1:17 AM
if I set the subflow to persist_result=False then if the pod goes down it should just re-run that specific subflow right?

Nate

07/06/2023, 1:17 AM
well if you're calling the subflow as a python function it doesn't get its own infra (pod) - it runs on the same pod as the parent
if you run a "subflow" via `run_deployment`, then that flow run gets its own infra
and in that case, if the entrypoint flow has retries, then yes it would re-run that specific flow

John Horn

07/06/2023, 1:19 AM
I guess in this example
```
# parent_flow is what the deployment runs
@flow
def parent_flow():

    @flow(persist_result=False)
    def sub_flow_1():
        task_1()
        task_2()

    @flow(persist_result=False)
    def sub_flow_2():
        task_1()
        task_2()

    sub_flow_1()
    sub_flow_2()
```
if the pod goes down mid-run I would hope that sub_flow_1 starts from scratch but sub_flow_2 can try to recover its results
assuming I rigged sub_flow_2 to persist and use GCS

Nate

07/06/2023, 1:23 AM
yes, I believe that is what would happen if you had `persist_result=True` on `sub_flow_2` and the parent had to retry for some reason - which would not be possible if you didn't have remote result storage, since if you had `persist_result=True` but "local" result storage, it would die along with the pod

John Horn

07/06/2023, 1:24 AM
gah now I'm doubting this is true because I already had persist_result=False on sub_flow_1 and was getting that persist error
oh wells thanks I'll think on this some more

Nate

07/06/2023, 1:25 AM
sure - feel free to bring back any roadblocks
šŸ‘ 1