
John Horn

07/05/2023, 9:55 PM
Hi all, working in a GKE k8s environment. I have an issue where my pod goes down and the main deployment restarts. When it restarts, the deployment attempts to retrieve the past task results but can't, causing the common error:
```
21:16:34.430 | ERROR   | Flow run 'tricky-alligator' - Finished in state Failed('Flow run encountered an exception. MissingResult: The result was not persisted and is no longer available.\n')
```
My guess is that new pod or new start causes all persisted data to go away. Is there something I should be doing to address this? Thanks k8 gang

Nate

07/06/2023, 1:00 AM
hey @John Horn - where are you persisting your results? if you're not using GCS for this
> My guess is that new pod or new start causes all persisted data to go away.
that might be something you wanna check out - that way they can live according to lifecycle rules or something
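Such a lifecycle rule might look like the following sketch (the bucket name is hypothetical; note that GCS lifecycle conditions have day-level granularity, so they can't expire objects after minutes - that's what task caching is for):

```shell
# delete persisted results older than 1 day (bucket name is hypothetical)
cat > lifecycle.json <<'EOF'
{
  "rule": [
    {"action": {"type": "Delete"}, "condition": {"age": 1}}
  ]
}
EOF
gsutil lifecycle set lifecycle.json gs://my-results-bucket
```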

John Horn

07/06/2023, 1:01 AM
I started persisting now to GCS to see if that will help

Nate

07/06/2023, 1:01 AM
nice! that's what i would do too

John Horn

07/06/2023, 1:01 AM
One question I have is that when persisting a sub-flow result, I noticed you can't specify the serializer
so I'm assuming the unique inputs make the fingerprint for the subflow's persisted results
this gets tricky if, say, I have: subflow: task1(), task2()
I guess I should focus my persistence logic on task1 and task2, including how long to persist those tasks,
rather than embed GCS persist logic in the subflow
since the results should only be valid for, let's say, 30 min
I don't think the subflow gives me that level of control
but the tasks do
but the error I was getting was that it couldn't find the persisted result on the subflow
that was what was throwing me off
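The input-fingerprint idea above can be sketched in plain Python - this illustrates the caching-with-expiration concept only, not Prefect's actual internals, and all names here are made up:

```python
import hashlib
import json
import time

CACHE = {}        # fingerprint -> (timestamp, result)
TTL = 30 * 60     # results valid for, say, 30 minutes

def fingerprint(fn_name, args, kwargs):
    # hash the callable's name plus its inputs, so unique
    # inputs produce a unique cache key
    payload = json.dumps([fn_name, args, kwargs], sort_keys=True, default=str)
    return hashlib.sha256(payload.encode()).hexdigest()

def cached_call(fn, *args, **kwargs):
    key = fingerprint(fn.__name__, args, kwargs)
    hit = CACHE.get(key)
    if hit and time.time() - hit[0] < TTL:
        return hit[1]  # fresh cached result, skip re-running
    result = fn(*args, **kwargs)
    CACHE[key] = (time.time(), result)
    return result
```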

Nate

07/06/2023, 1:07 AM
> I noticed you can't specify the serializer
you should be able to, like this
```
In [5]: from prefect import flow

In [6]: @flow(persist_result=True)
   ...: def child():
   ...:     return "something important"
   ...:

In [7]: @flow
   ...: def parent():
   ...:     child.with_options(result_serializer=...)()
   ...:
```
are you using `persist_result=True` on the subflow? or i guess you don't need `with_options` here unless you wanna use different serializers in different cases - you could just do it in the decorator of `child` right away as well

John Horn

07/06/2023, 1:10 AM
If I try something like:
```
@flow(
    persist_result=True,
    cache_expiration=timedelta(minutes=15),
    result_storage=GCS(
        bucket_path='yo-bucket',
        project='foobar-project',
        service_account_info=os.environ['gcs-creds']
    )
)
def foobar_subflow(
```
then I get back:
```
TypeError: flow() got an unexpected keyword argument 'cache_expiration'
```
and in the wild when running this deployment the pod occasionally dies with:
```
prefect.exceptions.MissingResult: The result was not persisted and is no longer availabl
```
and the stack trace leads up to the subflow:
```
blah_output = foobar_subflow(
```
That said, I haven't tried persisting the subflow's result to GCS yet
I'd almost rather it not persist at all and just retry the subflow
since it is time sensitive and not expensive to run
but that subflow's parent flow should do persistence

Nate

07/06/2023, 1:15 AM
ahh, i see.
> That said haven't tried with the persist on the subflow yet to GCS
i would try that - otherwise there are 2 workarounds I could think of (but I anticipate the above working):
• GCS lifecycle rules
• deploy the subflow, wrap `run_deployment` in a task with `cache_expiration`, and call that from your parent

John Horn

07/06/2023, 1:17 AM
if I set the subflow to persist_result=False then if the pod goes down it should just re-run that specific subflow right?

Nate

07/06/2023, 1:17 AM
well if you're calling the subflow as a python function it doesn't get its own infra (pod) - it runs on the same pod as the parent
if you run a "subflow" via `run_deployment`, then that flow run gets its own infra
and in that case, if the entrypoint flow has retries, then yes it would re-run that specific flow

John Horn

07/06/2023, 1:19 AM
I guess in this example
```
# parent_flow is what the deployment runs
@flow
def parent_flow():

    @flow(persist_result=False)
    def sub_flow_1():
        task_1()
        task_2()

    @flow(persist_result=False)
    def sub_flow_2():
        task_1()
        task_2()

    sub_flow_1()
    sub_flow_2()
```
if the pod goes down mid-run I would hope that sub_flow_1 starts from scratch but sub_flow_2 can try to recover its results
assuming I rigged sub_flow_2 to persist and use GCS

Nate

07/06/2023, 1:23 AM
yes, I believe that is what would happen if you had `persist_result=True` on `sub_flow_2` and the parent had to retry for some reason - which would not be possible if you didn't have remote result storage, since if you had `persist_result=True` but "local" result storage, it would die along with the pod

John Horn

07/06/2023, 1:24 AM
gah now I'm doubting this is true because I already had persist_result=False on sub_flow_1 and was getting that persist error
oh wells thanks I'll think on this some more

Nate

07/06/2023, 1:25 AM
sure - feel free to bring back any roadblocks
šŸ‘ 1