Thread
#prefect-community
    Samuel Hinton

    1 year ago
    Hi all, looking for some help debugging here. I have a task which finishes fine when run locally, and the logs indicate it finished fine when run through an agent, but the task itself is stuck in Running. It's a simple flow (it turns a tiny pandas dataframe into a JSON object), and I make a call to the logger directly before the return statement. I see my log statement saying it all worked, but the task state doesn't change. Is there a good way of debugging what on earth might be going on? The odd thing is that the task that never finishes is a common task (I download 4 things and process them all in the same way). 2 of them finish, every time. 2 of them get stuck in Running, without fail. EDIT: I seem to have found a potential bug in Prefect. These hang-ups occur when I have @task(timeout=20), but the tasks completed successfully with a plain @task decorator. Will update Prefect now and check to see if that helps.
    For context, here's the task:
    And here's the task stuck in Running:
    Michael Adkins

    1 year ago
    Hey @Samuel Hinton -- do you think you could share a minimal reproducible example? What executor are you using & what OS is your agent running on?
    Samuel Hinton

    1 year ago
    It's running on Ubuntu at the moment, with a local Dask executor, and currently on Prefect 0.14.5. Going through the changelog at https://docs.prefect.io/api/latest/changelog.html and I notice the TimeoutError is in 0.14.7
    I'll update now and try to make a minimal example if it's still being annoying
    Michael Adkins

    1 year ago
    Hmm I'm not sure that should affect you but being on the latest version will be helpful! https://github.com/PrefectHQ/prefect/issues/4091
    Samuel Hinton

    1 year ago
    @Michael Adkins here is a reproduction, now on Prefect 0.14.11. Some of these timed out correctly (though honestly I'm not sure why any of them should have timed out), but others (see image) didn't and ran forever, despite the timeout. Local Dask executor, launched on a Docker agent, Ubuntu operating system; the flows run perfectly if I remove the timeout. It runs instantly when I run it locally via flow = get_flow() and flow.run() (even with the timeouts).
    Michael Adkins

    1 year ago
    Thanks for the reproduction! Just wondering, how is this running on a docker agent without a flow.storage or flow.run_config?
    Samuel Hinton

    1 year ago
    Ah I left those details out - I have multiple flows and a manager.py that goes through them all, collects the flows, and assigns the run config, scheduler, bucket, etc:
    @Michael Adkins - overnight I scheduled the test flow to run every half hour. I registered two versions, one with and one without the timeouts. I had a few tasks that were stuck in Running in the morning (6 hours on that process task to return the JSON dataframe), but most of them succeeded. However, even for the flows that succeeded, those that had the timeout consistently took many times longer than those without. You can see two screenshots below (timeout vs no timeout, identical tasks). The task that just generates a dummy dataframe and returns it takes ~0.25 seconds without the timeout, and 2 seconds with the timeout. Do you know why this might be?
    Michael Adkins

    1 year ago
    With a timeout, there needs to be a supervising process to enforce the timeout. Generally this will increase task runtime a bit because of that overhead. The executor you are using combined with the system you are on determines what kind of supervising process we have to use, some of which have faster startup times and higher dependability.
    I'm not sure why some of your tasks are hanging, people use timeouts often without issue. I'll have to try to replicate your exact runtime environment (which is why I needed the run_config/executor details).
    Could you try switching your executor to use "processes" instead of "threads" ?
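One general reason thread-based timeouts tend to be less dependable than process-based ones is that Python cannot forcibly stop a running thread: the caller can stop waiting, but the work keeps going. The sketch below shows that behaviour using only the standard library. It is not Prefect's implementation, and slow_task and run_with_thread_timeout are hypothetical names made up for the illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def slow_task(seconds, finished):
    """Stand-in for a task body; records that it ran to completion."""
    time.sleep(seconds)
    finished.append(True)
    return "done"


def run_with_thread_timeout(fn, timeout, *args):
    """Run fn in a worker thread and return (result, timed_out)."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, *args)
    try:
        return future.result(timeout=timeout), False
    except FutureTimeout:
        # The caller stops waiting here, but the worker thread cannot be killed.
        return None, True
    finally:
        # Do not block on the worker; it is left to finish on its own.
        pool.shutdown(wait=False)


finished = []
result, timed_out = run_with_thread_timeout(slow_task, 0.05, 0.3, finished)
print(timed_out, finished)  # timeout is reported while the work is still in flight
time.sleep(0.5)
print(finished)             # the "timed out" work has completed anyway
```

A supervisor that runs the work in a separate process can terminate it outright, at the cost of process startup and of passing arguments and results between processes, which is consistent with the overhead described above.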
    Samuel Hinton

    1 year ago
    I’ll give it a shot when I'm back in the office for sure, will let you know if that changes things 🙂
    Michael Adkins

    1 year ago
    Unfortunately I could not reproduce this in a test (https://github.com/PrefectHQ/prefect/pull/4217) although it uses a shorter timeout. The code that's being used to call your_task.run() is https://github.com/PrefectHQ/prefect/blob/master/src/prefect/utilities/executors.py#L184 -- it may be useful to try testing your function in isolation to see what's going on
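Testing the function in isolation can be as simple as calling the task body directly, with no executor or timeout involved, and timing it. The sketch below uses only the standard library. records_to_json is a hypothetical stand-in for the real task body (which is not shown in the thread), and time_call is a made-up helper.

```python
import json
import time


def records_to_json(records):
    """Hypothetical stand-in for the task body: serialise tabular rows to JSON."""
    return json.dumps(records)


def time_call(fn, *args, repeats=5):
    """Call fn directly several times; return the last result and the best wall-clock time."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args)
        best = min(best, time.perf_counter() - start)
    return result, best


rows = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]
payload, seconds = time_call(records_to_json, rows)
print(f"{len(payload)} characters in {seconds:.6f}s")
```

If the direct call is consistently fast, the extra time and the hangs are more likely to come from how the task is being supervised than from the function itself.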
    Samuel Hinton

    1 year ago
    Just reporting back that since swapping to processes I've seen much better behaviour and execution times, so thanks a ton for the tip
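For anyone landing on this thread later, the swap to processes might look roughly like the fragment below on Prefect 0.14.x. It is a sketch rather than the actual manager.py from the thread: the flow name is a hypothetical placeholder, and it assumes the executor is set directly on the flow object.

```python
from prefect import Flow
from prefect.executors import LocalDaskExecutor

# Hypothetical placeholder flow; in the thread the real flows are collected by manager.py.
flow = Flow("timeout-test")

# LocalDaskExecutor defaults to scheduler="threads"; "processes" uses a pool of
# worker processes instead, which is the change suggested earlier in the thread.
flow.executor = LocalDaskExecutor(scheduler="processes")
```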
    Michael Adkins

    1 year ago
    Glad that worked! Timeouts are tricky.