Jacques
06/23/2020, 3:51 PM
I have a task set up to retry via the max_retries parameter. I'm catching the error from boto and then logging the error immediately before doing a raise signals.FAIL() to trigger the retry mechanism. When the boto call fails (it does this once or twice a day - unpredictably) the error is caught, logs show the task is set to Retrying, and downstream tasks are set to Pending. All looks good until the flow is scheduled to run again; then I get a Python stack overflow as some object is being pickled (I think - seeing looped calls to bits like File "/var/lang/lib/python3.7/pickle.py", line 662 in save_reduce in the stack trace) directly after the Beginning Flow run message. I'm using DaskExecutor if that matters.

# Subclasses of ClientError's are dynamically generated and
# cannot be pickled unless they are attributes of a
# module. So at the very least return a ClientError back.
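
For context, a minimal sketch of the pattern described above, assuming the Prefect 0.x task API of the time; the S3 call, bucket/key names, and retry settings are placeholders rather than Jacques' actual code:

from datetime import timedelta

import boto3
from botocore.exceptions import ClientError

import prefect
from prefect import task
from prefect.engine import signals


@task(max_retries=3, retry_delay=timedelta(minutes=10))
def fetch_object(bucket: str, key: str) -> bytes:
    logger = prefect.context.get("logger")
    try:
        return boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
    except ClientError as exc:
        # Log immediately, then raise FAIL so Prefect's retry mechanism kicks in;
        # the task goes to Retrying and downstream tasks to Pending.
        logger.error("boto call failed: %s", exc)
        raise signals.FAIL(f"boto call failed: {exc}")
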
Jim Crist-Harif
06/23/2020, 3:54 PM

Jacques
06/23/2020, 3:54 PM

Jim Crist-Harif
06/23/2020, 3:55 PM

Jacques
06/23/2020, 4:03 PM

Jim Crist-Harif
06/23/2020, 4:06 PM
class MyBotoError(Exception):
    pass

def mytask(...):
    try:
        some_boto_thing()
    except SomeBotoError as exc:
        raise MyBotoError(str(exc))
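
Slotting that suggestion into the retrying task sketched earlier would look roughly like this (again a sketch assuming Prefect 0.x; with max_retries set, any exception raised in the task, including MyBotoError, still puts it into Retrying):

from datetime import timedelta

import boto3
from botocore.exceptions import ClientError

from prefect import task


class MyBotoError(Exception):
    """Module-level exception class, so it can be pickled by name."""


@task(max_retries=3, retry_delay=timedelta(minutes=10))
def fetch_object(bucket: str, key: str) -> bytes:
    try:
        return boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
    except ClientError as exc:
        # Re-raise as a plain, importable exception, keeping only the message
        # from the original (dynamically generated, unpicklable) boto error.
        raise MyBotoError(str(exc))
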
Jacques
06/23/2020, 6:13 PM

Jim Crist-Harif
06/23/2020, 6:15 PM
MyBotoError will then be serializable.
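
A small, self-contained illustration of that point using plain pickle; nothing here is Prefect- or boto-specific, the dynamically created class below just mimics how botocore generates ClientError subclasses at runtime:

import pickle


class MyBotoError(Exception):
    """Defined at module level, so pickle can look the class up by name."""


def make_dynamic_error_class():
    # Roughly what botocore's error factory does: build the class on the fly,
    # so it is not an attribute of any importable module.
    return type("SomeDynamicClientError", (Exception,), {})


print(pickle.loads(pickle.dumps(MyBotoError("throttled"))))  # round-trips fine

DynamicError = make_dynamic_error_class()
try:
    pickle.dumps(DynamicError("throttled"))
except pickle.PicklingError as err:
    print("cannot pickle dynamically generated class:", err)
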
Jacques
06/24/2020, 1:09 PM
I tried that (converting the error via str() into a RuntimeError), but it still fails in the same way. Is it possible Prefect is collecting all the exceptions and not just the most recent? This is fairly problematic as it causes a Python stack overflow, not just a failed flow run on retry. Is there anything else that I could try?
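
One way to narrow this down (a diagnostic sketch, not something suggested in the thread): check inside the task itself whether the exception you're about to raise, and the value the task returns, survive a pickle round trip, so serialization problems surface in your own logs before Prefect or Dask touch the object:

import pickle


def is_picklable(obj) -> bool:
    """Return True if obj survives a pickle round trip."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except Exception:
        return False


# Example use inside the except block from the sketches above, right before
# handing the new exception back to Prefect:
#     new_exc = RuntimeError(str(exc))
#     logger.warning("new_exc picklable? %s", is_picklable(new_exc))
#     raise new_exc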