Jacques
06/23/2020, 3:51 PM
I have a task set up to retry via the max_retries parameter. I'm catching the error from boto and then logging the error immediately before doing a raise signals.FAIL() to trigger the retry mechanism. When the boto call fails (it does this once or twice a day - unpredictably) the error is caught, logs show the task is set to Retrying, and downstream tasks are set to Pending. All looks good until the flow is scheduled to run again; then I get a Python stack overflow as some object is being pickled (I think - seeing looped calls to bits like File "/var/lang/lib/python3.7/pickle.py", line 662 in save_reduce in the stack trace) directly after the Beginning Flow run message. I'm using DaskExecutor if that matters.

# Subclasses of ClientError's are dynamically generated and
# cannot be pickled unless they are attributes of a
# module. So at the very least return a ClientError back.
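
For context, a minimal sketch of the pattern described above, assuming the Prefect 0.x task API of the time; the S3 call, bucket/key names, and retry settings are placeholders rather than Jacques' actual code:

from datetime import timedelta

import boto3
from botocore.exceptions import ClientError

import prefect
from prefect import task
from prefect.engine import signals


@task(max_retries=3, retry_delay=timedelta(minutes=10))
def fetch_object(bucket: str, key: str) -> bytes:
    logger = prefect.context.get("logger")
    try:
        return boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
    except ClientError as exc:
        # Log immediately, then raise FAIL so Prefect's retry mechanism kicks in;
        # the task goes to Retrying and downstream tasks to Pending.
        logger.error("boto call failed: %s", exc)
        raise signals.FAIL(f"boto call failed: {exc}")
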
Jim Crist-Harif
06/23/2020, 3:54 PM

Jacques
06/23/2020, 3:54 PM

Jim Crist-Harif
06/23/2020, 3:55 PM

Jacques
06/23/2020, 4:03 PM

Jim Crist-Harif
06/23/2020, 4:06 PM
class MyBotoError(Exception):
    pass

def mytask(...):
    try:
        some_boto_thing()
    except SomeBotoError as exc:
        raise MyBotoError(str(exc))
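
Slotting that suggestion into the retrying task sketched earlier would look roughly like this (again a sketch assuming Prefect 0.x; with max_retries set, any exception raised in the task, including MyBotoError, still puts it into Retrying):

from datetime import timedelta

import boto3
from botocore.exceptions import ClientError

from prefect import task


class MyBotoError(Exception):
    """Module-level exception class, so it can be pickled by name."""


@task(max_retries=3, retry_delay=timedelta(minutes=10))
def fetch_object(bucket: str, key: str) -> bytes:
    try:
        return boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
    except ClientError as exc:
        # Re-raise as a plain, importable exception, keeping only the message
        # from the original (dynamically generated, unpicklable) boto error.
        raise MyBotoError(str(exc))
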
Jacques
06/23/2020, 6:13 PM

Jim Crist-Harif
06/23/2020, 6:15 PM
MyBotoError will then be serializable.
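
A small, self-contained illustration of that point using plain pickle; nothing here is Prefect- or boto-specific, the dynamically created class below just mimics how botocore generates ClientError subclasses at runtime:

import pickle


class MyBotoError(Exception):
    """Defined at module level, so pickle can look the class up by name."""


def make_dynamic_error_class():
    # Roughly what botocore's error factory does: build the class on the fly,
    # so it is not an attribute of any importable module.
    return type("SomeDynamicClientError", (Exception,), {})


print(pickle.loads(pickle.dumps(MyBotoError("throttled"))))  # round-trips fine

DynamicError = make_dynamic_error_class()
try:
    pickle.dumps(DynamicError("throttled"))
except pickle.PicklingError as err:
    print("cannot pickle dynamically generated class:", err)
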
Jacques
06/24/2020, 1:09 PM
I tried that (converting the error via str() into a RuntimeError), but it still fails in the same way. Is it possible Prefect is collecting all the exceptions and not just the most recent? This is fairly problematic as it causes a Python stack overflow, not just a failed flow run on retry. Is there anything else that I could try?
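
One way to narrow this down (a diagnostic sketch, not something suggested in the thread): check inside the task itself whether the exception you're about to raise, and the value the task returns, survive a pickle round trip, so serialization problems surface in your own logs before Prefect or Dask touch the object:

import pickle


def is_picklable(obj) -> bool:
    """Return True if obj survives a pickle round trip."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except Exception:
        return False


# Example use inside the except block from the sketches above, right before
# handing the new exception back to Prefect:
#     new_exc = RuntimeError(str(exc))
#     logger.warning("new_exc picklable? %s", is_picklable(new_exc))
#     raise new_exc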