# prefect-community
c
Hello. We just upgraded our server/agents from 2.6.4 to 2.6.6. We are seeing connection resets while waiting on run_deployment. Thoughts? Details in the thread.
logs from flow run.txt
On the server side we don't see anything in the logs, just 200s from the API. We are seeing this issue pretty reliably. We run the agents and flow runs on ECS Fargate, and the server on EC2. We're going to roll back to see if it helps.
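One possible stopgap while the root cause is tracked down (not something confirmed in this thread) is to retry the call when the connection is reset. A minimal sketch, assuming the failure surfaces as a `ConnectionResetError`; the actual exception type depends on the HTTP client and is not shown here, so it is left as a parameter. `call_with_retries` is a hypothetical helper, not part of Prefect.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionResetError,),
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exceptions with exponential backoff.

    The exception types in retry_on are an assumption; adjust them to
    whatever the flow run logs actually show.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            # Re-raise once the last allowed attempt has failed.
            if attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

From synchronous code this might wrap the call as `call_with_retries(lambda: run_deployment(name="my-flow/my-deployment"))`, where the deployment name is a placeholder. Retrying only hides the symptom, so it would not replace rolling back or finding the underlying cause.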
b
Hi Carlo, thank you for raising this. I'll reach out to the team to see what could be happening here. Would you mind sharing a few example IDs of the affected flow runs? Also, could you share a minimal reproducible example and describe how you created the deployment?
c
rough_flow_def.py
We are self-hosted, so I'm not sure the IDs will be helpful. We saw this both for scheduled deployments and for ones that were manually kicked off from the UI. @Bianca Hoch, do you think these are related? I saw your comment in the other thread: https://github.com/PrefectHQ/prefect/issues/7424 The agent is running in an ECS task and the server is running on an EC2 instance, so they are on two different machines. That said, both processes are containerized.
b
Thank you for providing the MRE! The issue I mentioned in the other thread could potentially be related. I'll raise it with the team to check whether my hunch is correct.
c
I should add that when we rolled back to 2.6.4, the issue seemed to disappear. Thanks for your help.
Hi @Bianca Hoch, any update? Thanks for helping us track this down. This is currently blocking us from upgrading to the latest version.
b
Hi Carlo, apologies for the delay, we're still investigating on our end. Just to clarify, the only adjustment you made on your end is bumping down to 2.6.4?
c
Correct.
For our server, we use the shared images, e.g. prefecthq/prefect:2.6.4-python3.9
We also keep our agent versions in line with the server, but we just pip install the dependencies into our own custom image.
m
Hey @Carlo, we're tracking related issues here: https://github.com/PrefectHQ/prefect/issues/7512 If you have some time, it would help if you could report the behavior you were seeing there.
c
Updated.
m
Thanks 🙏