Hi, I'm experiencing inconsistent periods of time ...
# ask-community
s
Hi, I'm experiencing intermittent periods where task runs fail, seemingly because the task is not able to set its own state. To be clear, my Prefect code is not making this API call; rather, it seems to be the Prefect Cloud backend. This error has been happening across multiple task runs: all running task runs fail at once, regardless of how far along they are. Example error below. My Prefect instance has been running our deployments for many months now with minimal issues, so this sudden error is surprising considering I haven't changed any config. Is there something I can do to fix this, or is this a Prefect Cloud error that is out of my control? Any help is appreciated. Thank you!
Flow run: execution-manager/Invoker: FLOW_RUN_NAME
State: `Failed`
Timestamp: 2024-10-02 00:15:22.325332+00:00
Flow run URL: <https://app.prefect.cloud/account/ACCOUNT_ID/workspace/WORKSPACE_ID/flow-runs/flow-run/FLOW_RUN_ID>
State message: Flow run encountered an exception. PrefectHTTPStatusError: Client error '400 Bad Request' for url '<https://api.prefect.cloud/api/accounts/ACCOUNT_ID/workspaces/WORKSPACE_ID/task_runs/TASK_RUN_ID/set_state>'
Response: {'detail': 'There was an error parsing the body'}
For more information check: <https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/400>
n
hi @Sam Lawler! hmm are you able to share the version of the SDK you're using?
s
Hey Nate, 2.15.0
n
does there happen to be any more stack trace you could share? or is what's above the meat of it?
m
i’m experiencing almost the same thing, just started this evening on a flow that’s been running for ages with no problem.
n
hey @Mike Larsson thanks - can you also share the version and any trace you have on hand?
m
Crash detected! Execution was interrupted by an unexpected exception: PrefectHTTPStatusError: Client error '400 Bad Request' for url '<https://api.prefect.cloud/api/accounts/a57203ca-4b0d-4bf2-9dd5-5ce43f44808e/workspaces/4c7df271-b1bd-4079-aba2-843b06d433a6/task_runs/>'


Response: {'detail': 'There was an error parsing the body'}
For more information check: <https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/400>
shows up in the flow run logs
we’re on prefect 2.20.3
n
thank you both! will raise this to the team and get back here
s
Thanks Nate! Appreciate the quick response
Hey @Nate, attached is the entire Python stack trace if needed
n
thank you!
m
this stacktrace coming out of the ecs task logs seems relevant
n
hey @Sam Lawler and @Mike Larsson ! are you still experiencing these errors on flow runs? we are monitoring a fix for this that we’ve released
m
let me check
i think it’s looking good! previously it would fail very quickly, now it’s chugging along.
n
good to hear - thank you for checking!
s
Same here, I haven't had any errors in the last 2 hours
n
great - thank you!
s
Thank you Nate & Team!
s
Hi @Nate, we've been having the same errors since yesterday evening too. We're on version 2.20.2. Our workaround is limiting task concurrency, but the pipelines are still failing intermittently.
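For readers wondering what "limiting task concurrency" can look like in general, here is a minimal sketch in plain Python. It is not Prefect's own concurrency-limit feature and not what the poster actually ran; `run_limited` and `max_concurrency` are hypothetical names used only for illustration. It caps how many calls are in flight at once, which reduces the number of simultaneous API requests.

```python
from concurrent.futures import ThreadPoolExecutor


def run_limited(func, items, max_concurrency=2):
    """Apply func to each item with at most max_concurrency calls in flight.

    Hypothetical helper for illustration; results come back in input order.
    """
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # pool.map schedules every item but only max_concurrency run at a time.
        return list(pool.map(func, items))
```

As the message above notes, a cap like this may lower the failure rate but cannot remove failures that originate on the server side.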
m
i haven’t seen more issues with flow runs, but we have an agent that periodically fails now. it started happening at about the same time as the other problem, and it failed again overnight. i’m not sure if the underlying cause is the same but the timing is suspicious. stacktrace attached.
n
hi @Mike Larsson - is this something you're still seeing?
m
it last crashed about 5 hours ago
s
I've had some issues similar to my original post, but this time they only happened between 10:23am and 10:39am on 2024/10/02 (Chicago time). Like before, this is not an API call that I am making within my own code. Here's an example of the error I got this time: (the timestamp here is AEST)
Flow run: DEPLOYMENT_NAME/FLOW_RUN_NAME
State: `Crashed`
Timestamp: 2024-10-02 15:39:39.245447+00:00
Flow run URL: <https://app.prefect.cloud/account/ACCOUNT_ID/workspace/WORKSPACE_ID/flow-runs/flow-run/FLOW_RUN_ID>
State message: Execution was interrupted by an unexpected exception: PrefectHTTPStatusError: Server error '503 Service Unavailable' for url '<https://api.prefect.cloud/api/accounts/ACCOUNT_ID/workspaces/WORKSPACE_ID/task_runs/TASK_RUN>'
Response: {'details': 'API request timed out'}
For more information check: <https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/503>
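The two errors in this thread differ in kind: a 503 is usually transient, while a 400 normally means the request itself was rejected and repeating it unchanged will not help. A minimal sketch of that distinction follows, assuming a generic client. `HTTPStatusError`, `call_with_retry`, and the `RETRYABLE_STATUS` set are hypothetical stand-ins, not Prefect's API or its actual retry policy.

```python
import random
import time

# Status codes treated as transient in this sketch (an assumption, not Prefect's list).
RETRYABLE_STATUS = {429, 502, 503, 504}


class HTTPStatusError(Exception):
    """Hypothetical stand-in for an error such as PrefectHTTPStatusError."""

    def __init__(self, status_code, detail=""):
        super().__init__(f"{status_code}: {detail}")
        self.status_code = status_code


def call_with_retry(func, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except HTTPStatusError as exc:
            # Re-raise non-transient errors (e.g. 400) and the final failed attempt.
            if exc.status_code not in RETRYABLE_STATUS or attempt == max_attempts:
                raise
            # Delay doubles each attempt; a little jitter avoids synchronized retries.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
```

In this thread the 400s turned out to be caused on the server side, so client-side retries like this would only have been relevant to the 503 `API request timed out` case.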
m
we briefly had a few of those too, the `API request timed out` error, but they haven’t occurred again since.
hi @Nate, 2 follow up questions if you’re able to answer. can you share any details of the problem and/or fix that caused the issues above? and secondly, were the agent crashes likely part of the same issue? we haven’t had any since but i’m curious if they were related or just weird timing.
n
hi @Mike Larsson
• there was a middleware issue related to status codes that affected the folks that commented in this thread
• the agent crashes were likely related to this issue, but I can't say that I know exactly how
feel free to DM if you need additional clarification!