Hello, we're using Prefect 2 and have a question. When a flow run crashes because the EKS pod is too small, can the flow still run its `on_crashed` state change hook? I tested this in my local env (process worker): the run went to Crashed status and the `on_crashed` hook ran. However, in the prod env (Kubernetes worker), the flow crashed because the EKS pod's memory was too small, and the `on_crashed` hook didn't run at all. Could someone please help🙏?
local env:
Yufei Li
12/06/2024, 3:46 PM
prod env
Jake Kaplan
12/06/2024, 3:49 PM
Hey, the `on_crashed` hook executes as part of the Python process your flow run is executing in. So if the crash happens inside that process, the hook can run.
However, if the entire process dies, for example when it is killed for running out of memory, the hook never gets an opportunity to run.
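To make the in-process vs. killed-process distinction concrete, here is a minimal standard-library sketch (no Prefect involved). A child Python process uses a `finally` block as a stand-in for an in-process hook like `on_crashed`. It assumes a POSIX system, and it uses `SIGKILL` because that is the signal the Linux OOM killer sends when a container exceeds its memory limit. The `CHILD` script and the `hook_ran` helper are hypothetical names for illustration only.

```python
import os
import signal
import subprocess
import sys
import tempfile

# Child script: the finally block stands in for an in-process hook such as on_crashed.
CHILD = r"""
import sys, time
marker, mode = sys.argv[1], sys.argv[2]
try:
    print("ready", flush=True)
    if mode == "raise":
        raise RuntimeError("in-process failure the interpreter can unwind from")
    time.sleep(60)  # simulate a long-running flow; SIGKILL arrives here
finally:
    with open(marker, "w") as f:
        f.write("hook ran")
"""


def hook_ran(mode: str) -> bool:
    """Run the child in the given mode ("raise" or "kill") and report whether its hook ran."""
    with tempfile.TemporaryDirectory() as tmp:
        marker = os.path.join(tmp, "hook-ran")
        with subprocess.Popen(
            [sys.executable, "-c", CHILD, marker, mode],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,  # hide the child's traceback in "raise" mode
            text=True,
        ) as proc:
            proc.stdout.readline()  # block until the child is inside its try block
            if mode == "kill":
                # Same signal the kernel OOM killer sends; it cannot be caught or handled.
                proc.send_signal(signal.SIGKILL)
            proc.wait()
        return os.path.exists(marker)


if __name__ == "__main__":
    print("exception inside the process -> hook ran:", hook_ran("raise"))
    print("process SIGKILLed             -> hook ran:", hook_ran("kill"))
```

The exception case unwinds normally, so the `finally` block runs. `SIGKILL` ends the process immediately, so no Python code runs afterwards. That is why a hook living inside the flow-run process can't react to an OOM kill, and why something external, like an automation, is needed.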
I would look at setting up an automation https://docs.prefect.io/v3/automate/index#automate-overview, which should let you take external actions when a crashed state occurs.
Yufei Li
12/06/2024, 3:51 PM
Is that automation feature available in Prefect v2?
Jake Kaplan
12/06/2024, 4:04 PM
They are in 2.x, however it's an experimental feature. You'd have to set `PREFECT_EXPERIMENTAL_EVENTS=True` on your server.
Automations are fully supported in Cloud and in 3.x versions.
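A minimal sketch of a trigger for this case, assuming events and automations are enabled as described above. It is a custom event trigger that fires each time a flow run of one deployment emits a `prefect.flow-run.Crashed` event. It is written as a Python dict that mirrors the JSON custom-trigger format shown in the Prefect automations docs, so you can print it and paste it into the UI's custom trigger editor. `DEPLOYMENT_ID` is a hypothetical placeholder. Field names and availability in experimental 2.x may differ by version, so check them against the docs for the version you run.

```python
import json

# Hypothetical placeholder: replace with your deployment's real ID.
DEPLOYMENT_ID = "00000000-0000-0000-0000-000000000000"

# Fire once per flow run (for_each on the resource id) whenever a flow run
# related to this deployment emits a Crashed state-change event.
crashed_trigger = {
    "type": "event",
    "match": {"prefect.resource.id": "prefect.flow-run.*"},
    "match_related": {"prefect.resource.id": f"prefect.deployment.{DEPLOYMENT_ID}"},
    "expect": ["prefect.flow-run.Crashed"],
    "for_each": ["prefect.resource.id"],
    "posture": "Reactive",  # react to each matching event as it arrives
    "threshold": 1,
    "within": 0,
}

trigger_json = json.dumps(crashed_trigger, indent=2)
print(trigger_json)
```

The event is emitted server-side when the run is reported as Crashed, so this fires even when the pod was OOM-killed and the in-process hook never ran.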
Yufei Li
12/06/2024, 4:38 PM
I see👍 Are we able to update the deployment's job variables in the automation actions? In our setup, the memory limit is set in the deployment's job variables. If a run crashes from running out of memory, we'd like to update those job variables and then trigger a rerun. Is that something we can do with automation actions?
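The thread doesn't settle whether an automation action can edit job variables. The sizing step of this retry idea is independent of that, though. Here is a minimal sketch of a hypothetical helper, `bump_memory`, that takes a deployment's job variables and returns a copy with the memory limit multiplied and capped. It assumes the limit lives under a job variable named `memory` (hypothetical, since that depends on your custom Kubernetes base job template) and only handles integer Kubernetes quantities with binary suffixes (`Ki`, `Mi`, `Gi`, `Ti`).

```python
import re

# Binary suffixes used in Kubernetes memory quantities (a subset of the full format).
_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_quantity(q: str) -> int:
    """Convert a quantity like "2Gi" or "512Mi" to bytes."""
    m = re.fullmatch(r"(\d+)(Ki|Mi|Gi|Ti)?", q)
    if not m:
        raise ValueError(f"unsupported quantity: {q!r}")
    value, unit = m.groups()
    return int(value) * _UNITS.get(unit, 1)


def bump_memory(job_variables: dict, key: str = "memory", factor: int = 2, cap: str = "16Gi") -> dict:
    """Return a copy of job_variables with job_variables[key] multiplied by factor, capped at cap."""
    current = parse_quantity(job_variables[key])
    new_bytes = min(current * factor, parse_quantity(cap))
    # Express the result in Mi so it stays a valid integer quantity.
    return {**job_variables, key: f"{new_bytes // 2**20}Mi"}


if __name__ == "__main__":
    print(bump_memory({"memory": "2Gi", "image": "my-image"}))
```

The cap keeps repeated crash/retry cycles from requesting ever-larger pods. Applying the new values and starting the rerun would still need a separate mechanism.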