# prefect-community
h
Hello, we have a task with a timeout limit. The timeout has been working, but we had an incident today where the task ran well over its limit and didn't time out. We use a GKE cluster for the agent and for flow runs in Docker containers. Our Prefect version:
prefecthq/prefect:0.13.15-python3.8
The flow run: https://cloud.prefect.io/semios/flow-run/19fcc78a-9442-48ef-8228-a9c3db18e341 Please see the thread for more details.
The task we set the timeout on is a
DbtShellTask
, with the timeout set to 15 minutes. In the past, when the task ran longer than that, a
TimeoutError
exception was raised, which we could then handle accordingly.
However, today this task has been running for 1 hour 35 minutes and has not timed out.
What we know is that the k8s job pod for this run is still up and running, and the last log was at the
DbtShellTask
run. By inspecting the BigQuery logs, we could see that the task was running and then froze about 10 minutes into its execution.
The task run does not seem able to raise the timeout itself.
z
Hi! What kind of executor are you using? Some of our timeouts are best-effort and not guarantees.
h
Good question. I am not 100% sure about the executor. I think we are using a LocalExecutor. Below is what we do to push the flow to Prefect Cloud:
flow.environment = LocalEnvironment(labels=labels)
flow.storage = Docker(...)
flow.register(...)
z
That executor shouldn’t fail, but I’m not sure why it didn’t cancel the task. Since timeouts are implemented in-process where your task is running, we don’t get any logs of it on our end.
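To illustrate what "best-effort" and "in-process" can mean here, below is a minimal sketch of a thread-based timeout written with the standard library only. It is not Prefect's implementation; `run_with_timeout` and the `timeout_sketch` logger name are hypothetical names chosen for this example. The point it shows is that the caller can stop waiting and raise `TimeoutError`, but the work itself is not forcibly killed, and any diagnostics are logged only inside the process that runs the task.

```python
import logging
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Hypothetical logger name for this sketch (not Prefect's logger).
logger = logging.getLogger("timeout_sketch")


def run_with_timeout(fn, timeout, *args, **kwargs):
    """Run fn in a worker thread and raise TimeoutError if it overruns.

    Best-effort only: the worker thread is NOT killed on timeout, so a call
    that blocks (for example on a hung subprocess) keeps running.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, *args, **kwargs)
    try:
        # Wait at most `timeout` seconds for the result.
        return future.result(timeout=timeout)
    except FutureTimeout:
        # This message exists only in the process running the task.
        logger.debug("call exceeded %.2fs; worker thread left running", timeout)
        raise TimeoutError(f"call exceeded {timeout}s") from None
    finally:
        # Do not block on the (possibly still running) worker.
        pool.shutdown(wait=False)


def slow_task():
    # Stand-in for a long-running shell command.
    time.sleep(0.3)
    return "done"
```

Calling `run_with_timeout(slow_task, 0.05)` raises `TimeoutError` while `slow_task` keeps running in the background, which is one way a task can appear to be past its limit yet still alive.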
h
Yes, I just confirmed that we are using
LocalExecutor
and we don’t need
LocalDaskExecutor
or
DaskExecutor
because the flow is pretty lightweight and won’t benefit from
parallelism
.
As you said, the
timeouts are implemented in-process where your task is running
. Unfortunately, our k8s job pod didn’t print any more useful logs for this. Is there any logging you would suggest for troubleshooting this issue?
z
The relevant function is
prefect.utilities.executors.run_with_thread_timeout
— it uses the
prefect
logger with debug-level logs, so if you set
PREFECT__LOGGING__LEVEL=DEBUG
in your execution environment, you should be able to see the printouts.
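As a rough illustration of why the level matters, the sketch below uses only the standard `logging` module to show debug records being dropped at INFO and emitted at DEBUG. It does not use Prefect's own logging setup; `build_logger` and the `prefect_sketch` logger name are hypothetical, and reading `PREFECT__LOGGING__LEVEL` by hand here is only meant to mimic the effect of the setting.

```python
import io
import logging
import os


def build_logger(name="prefect_sketch"):
    """Build a logger whose level comes from PREFECT__LOGGING__LEVEL.

    Hypothetical helper for illustration; Prefect configures its own logger.
    """
    level_name = os.environ.get("PREFECT__LOGGING__LEVEL", "INFO").upper()
    stream = io.StringIO()  # capture output so it can be inspected
    logger = logging.getLogger(name)
    logger.handlers = [logging.StreamHandler(stream)]
    logger.setLevel(getattr(logging, level_name, logging.INFO))
    logger.propagate = False
    return logger, stream


# At INFO, debug-level messages are discarded.
os.environ["PREFECT__LOGGING__LEVEL"] = "INFO"
logger, out = build_logger()
logger.debug("timeout handler armed")
info_output = out.getvalue()

# At DEBUG, the same message is emitted.
os.environ["PREFECT__LOGGING__LEVEL"] = "DEBUG"
logger, out = build_logger()
logger.debug("timeout handler armed")
debug_output = out.getvalue()
```

In a real deployment the variable would need to be set in the environment of the flow-run container (the k8s job pod), not on the machine that registers the flow.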