Thread
#prefect-community

    Hui Zheng

    1 year ago
    Hello, we have a task with a timeout limit. The timeout limit has been working; however, we had an incident today in which the task ran well past its limit but did not time out. We use a GKE cluster for the agent, and flow runs execute in Docker containers. Our Prefect version:
    prefecthq/prefect:0.13.15-python3.8
    The flow run: https://cloud.prefect.io/semios/flow-run/19fcc78a-9442-48ef-8228-a9c3db18e341
    Please see the thread for more details.
    The task we set the timeout on is a DbtShellTask, and the timeout is 15 minutes. In the past, when the task ran longer than that, a TimeoutError exception was raised, which we could then handle accordingly.
    Today, however, this task has been running for 1 hour 35 minutes without timing out.
    What we know is that the k8s job pod for this run is still up and running, and the last log came from the DbtShellTask run. By inspecting the BigQuery logs, we could see that this task was running and then froze after 10 minutes of execution.
    The task run seems unable to raise the timeout.
    Michael Adkins

    1 year ago
    Hi! What kind of executor are you using? Some of our timeouts are best-effort and not guarantees.
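To make "best-effort" concrete, here is a minimal standard-library sketch of the general pattern of running work in a worker thread and giving up after a deadline. It is not Prefect's code: `run_with_timeout` and `stuck_task` are hypothetical names chosen for illustration. What it shows is that a thread-based timeout can raise TimeoutError in the caller, yet Python has no public way to kill the worker thread, so a hung call keeps running in the background.

```python
import threading


def run_with_timeout(fn, timeout):
    """Call fn() in a daemon thread and wait at most `timeout` seconds.

    Hypothetical helper for illustration only; it is not Prefect's
    run_with_thread_timeout.
    """
    outcome = {}

    def target():
        try:
            outcome["value"] = fn()
        except BaseException as exc:  # hand any error back to the caller
            outcome["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout)  # returns after `timeout` even if fn is still running
    if worker.is_alive():
        # The caller stops waiting, but nothing stops the worker thread:
        # CPython offers no public API to kill a running thread.
        raise TimeoutError(f"task did not finish within {timeout} seconds")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


release = threading.Event()
finished = threading.Event()


def stuck_task():
    # Hypothetical stand-in for a call that hangs, e.g. a stalled query.
    release.wait()
    finished.set()
    return "done"


try:
    run_with_timeout(stuck_task, timeout=0.2)
except TimeoutError as exc:
    print("caller saw:", exc)

# The caller has moved on, but the worker is still blocked in stuck_task.
print("task finished?", finished.is_set())  # False
release.set()  # only an external signal lets the hung call return
print("task finished?", finished.wait(1))  # True
```

The design choice to illustrate with a daemon thread is deliberate: it lets the process exit even if the blocked call never returns, which is exactly the situation where an in-process timeout cannot clean up after itself.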

    Hui Zheng

    1 year ago
    Good question. I am not 100% sure about the executor; I think we are using a LocalExecutor. Below is what we do to push the flow to Prefect Cloud:
    flow.environment = LocalEnvironment(labels=labels)
    flow.storage = Docker(...)
    flow.register(...)
    Michael Adkins

    1 year ago
    Timeouts shouldn't fail with that executor, but I'm not sure why it didn't cancel the task. Since timeouts are implemented in-process where your task is running, we don't get any logs of them on our end.

    Hui Zheng

    1 year ago
    Yes, I just confirmed that we are using LocalExecutor. We don't need LocalDaskExecutor or DaskExecutor because the flow is pretty lightweight and wouldn't benefit from parallelism.
    As you said, the timeouts are implemented in-process where the task is running. Unfortunately, our k8s job pod didn't print any more useful logs for this. Is there any logging you would suggest for troubleshooting this issue?
    Michael Adkins

    1 year ago
    The relevant function is prefect.utilities.executors.run_with_thread_timeout. It uses the prefect logger with debug-level logs, so if you set PREFECT__LOGGING__LEVEL=DEBUG in your execution environment, you should be able to see the printouts.