Joshua Grant

04/20/2023, 5:10 PM
@Marvin are there any reasons why prefect hosted on AWS would have an agent submit the flow request twice, causing a crash?

Marvin

04/20/2023, 5:12 PM
I apologize for the confusion earlier. It seems that there are no existing GitHub issues related to your problem. If you're still experiencing the issue, I recommend opening a new issue on the Prefect GitHub repository to seek assistance from the Prefect community and developers. Make sure to provide detailed information about your setup, agent configuration, and the issue you're facing to help them better understand and address your problem.

Nate

04/20/2023, 5:13 PM
@Marvin can you reconsider the question, focus on the relationship between the agent and the ECS infrastructure block
dang, looks like we're hitting some network timeouts from OpenAI right now
@Joshua Grant can you share your crash logs?

Joshua Grant

04/20/2023, 5:19 PM
no worries. I think it's related to RDS
Essentially the agent completes submission of the flow, the flow does something like this:
Apr 19 21:44:54.333
prefect-v2-agent
01:44:54.333 | INFO | prefect.agent - Submitting flow run '9d15021f-e169-41c7-b826-3af39c6b9ef0'

Apr 19 21:44:54.315
prefect-v2-agent
01:44:54.314 | INFO | prefect.agent - Submitting flow run '9d15021f-e169-41c7-b826-3af39c6b9ef0'
It submits the same flow run twice at slightly different times. This has only happened once. Both copies of the flow run complain:
01:46:00.764 | INFO    | prefect.engine - Engine execution of flow run '9d15021f-e169-41c7-b826-3af39c6b9ef0' aborted by orchestrator: This run cannot transition to the RUNNING state from the RUNNING state.
Then finally dies with both running flows complaining:
01:47:25.236 | INFO    | prefect.engine - Engine execution of flow run '9d15021f-e169-41c7-b826-3af39c6b9ef0' aborted by orchestrator: This run has already terminated.

Nate

04/20/2023, 5:25 PM
hmm, who's responsible for submitting these flow runs? is it the scheduler / an automation / api call?

Joshua Grant

04/20/2023, 5:25 PM
API call.

Zanie

04/20/2023, 5:25 PM
Those timestamps for submission are really close to each other. Are you only running one agent there?

Joshua Grant

04/20/2023, 5:27 PM
Every flow has its own agent and work-queue.

Zanie

04/20/2023, 5:28 PM
Interesting. Does this happen often?

Joshua Grant

04/20/2023, 5:29 PM
No, this has only happened in our beta environment. It never happened in dev or alpha, but it is a concern preventing promotion to prod. This is the only occurrence.

Zanie

04/20/2023, 5:29 PM
Hm very weird
Here’s a draft poking at a possible bug? It’d be super weird though https://github.com/PrefectHQ/prefect/pull/9280
I’m a bit confused by those logs being out of order (the submitting ones)

Joshua Grant

04/20/2023, 5:32 PM
sorry, copied and pasted from DataDog, its default is most recent on top

Zanie

04/20/2023, 5:40 PM
Ahh okay 👍

Joshua Grant

04/20/2023, 8:02 PM
ha, the DevOps team had replicas on the agents, I believe that was causing the issue

Zanie

04/20/2023, 9:39 PM
Sounds about right 😄 we still try to guard against race conditions there but within 2ms is pretty tough
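
For anyone landing here later, the race described above can be sketched roughly as follows. This is a toy model, not Prefect's implementation: `ToyOrchestrator`, `propose_running`, `agent_replica`, and `simulate` are hypothetical names made up for illustration. Two agent replicas polling the same work queue both try to move the same flow run to RUNNING at nearly the same moment, and a guard that checks and writes the state in one atomic step lets only one of them through.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class ToyOrchestrator:
    """Hypothetical stand-in for an orchestration API (not Prefect's code)."""

    states: dict = field(default_factory=dict)
    _lock: threading.Lock = field(default_factory=threading.Lock)

    def propose_running(self, run_id: str) -> bool:
        # The check and the write happen under one lock, so only the
        # first proposal for a given run can succeed.
        with self._lock:
            if self.states.get(run_id, "PENDING") != "PENDING":
                return False  # e.g. "cannot transition to RUNNING from RUNNING"
            self.states[run_id] = "RUNNING"
            return True


def agent_replica(name, orchestrator, run_id, barrier, results):
    # The barrier releases both replicas at (nearly) the same moment,
    # mimicking two agents that picked up the same run from one queue.
    barrier.wait()
    results[name] = orchestrator.propose_running(run_id)


def simulate(run_id="example-run"):
    orchestrator = ToyOrchestrator()
    barrier = threading.Barrier(2)
    results = {}
    threads = [
        threading.Thread(
            target=agent_replica,
            args=(f"replica-{i}", orchestrator, run_id, barrier, results),
        )
        for i in range(2)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, orchestrator.states[run_id]


if __name__ == "__main__":
    outcome, final_state = simulate()
    print(outcome, final_state)
```

In this sketch exactly one replica gets `True` and the other is rejected. In a real deployment the check runs across a network and a database rather than under an in-process lock, which is why submissions that land only milliseconds apart are hard to guard against. Running a single agent per work queue avoids the duplicate submission in the first place.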