# prefect-cloud
I have a brand new ECS push work pool, and I need help with it. About 50% of the time, my flow runs fail to run. Whenever the infrastructure doesn't provision properly, they fail with the error below. The failure happens within a few seconds of the job starting, whereas a successful run waits patiently for the Fargate serverless task to reach a running state. It's almost as if the push event isn't being handled correctly and the job fails because nothing is running under the ECS task. Any help or troubleshooting would be appreciated.
```
Failed due to a FileNotFoundError when trying to retrieve the flow from deployment. The file 'status/status_domain_verify.py' could not be found.
```
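The telling symptom in the post is timing: failed runs die within seconds, while good runs wait for Fargate to start the task. If you export each run's start and end times (for example from the Prefect UI), a rough sketch like this can bucket them so the fast failures stand out. The `RunTiming` record, the `classify_runs` helper, and the 30-second cutoff are hypothetical choices for illustration, not Prefect APIs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


# Hypothetical record of one flow run's lifetime; field names are illustrative.
@dataclass
class RunTiming:
    run_id: str
    started: datetime
    ended: datetime
    failed: bool


# Assumed cutoff: a failure faster than this almost certainly never waited for Fargate.
FAST_FAILURE_THRESHOLD = timedelta(seconds=30)


def classify_runs(
    runs: List[RunTiming], threshold: timedelta = FAST_FAILURE_THRESHOLD
) -> Dict[str, List[str]]:
    """Bucket run IDs into 'ok', 'fast_failure', and 'slow_failure'."""
    buckets: Dict[str, List[str]] = {"ok": [], "fast_failure": [], "slow_failure": []}
    for run in runs:
        if not run.failed:
            buckets["ok"].append(run.run_id)
        elif run.ended - run.started < threshold:
            # Failed before the infrastructure could plausibly have come up
            buckets["fast_failure"].append(run.run_id)
        else:
            buckets["slow_failure"].append(run.run_id)
    return buckets


if __name__ == "__main__":
    t0 = datetime(2024, 8, 29, 12, 0, 0)
    runs = [
        RunTiming("a", t0, t0 + timedelta(seconds=4), failed=True),
        RunTiming("b", t0, t0 + timedelta(minutes=3), failed=False),
        RunTiming("c", t0, t0 + timedelta(minutes=5), failed=True),
    ]
    print(classify_runs(runs))
    # {'ok': ['b'], 'fast_failure': ['a'], 'slow_failure': ['c']}
```

If nearly all failures land in the fast bucket, that matches the pattern described here: the failing runs never got as far as waiting on a Fargate task.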
Some things about my setup:

- Prefect 3.0.x
- I used these docs to get the infrastructure provisioned.
- I have two flows deployed to the same container image/tag. That seems to work fine: they spin up separate ECS tasks when they run concurrently (when they don't fail).
- I've included the deployment code below.
- I added retries at the flow level, but they never kicked in, since the flow code was never found and executed.
```python
from prefect.docker import DockerImage

# verify_domain_status is the @flow defined or imported in this module (not shown in the post)

if __name__ == "__main__":
    verify_domain_status.deploy(
        name="verify-domain-status-ecs-deployment",
        work_pool_name="fxdmz-ecs-pool",
        image=DockerImage(
            # ECR repository and tag the image is built and pushed to
            name="635441544388.dkr.ecr.us-east-1.amazonaws.com/prefect-flows:latest",
            platform="linux/amd64",
        ),
    )
```
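The error says `status/status_domain_verify.py` could not be found, so one way to tell a broken image apart from a run landing on the wrong machine is to confirm the file exists inside the image. A minimal sketch, assuming you run it from a shell inside a container started from the same ECR image; the script and its `missing_paths` helper are hypothetical.

```python
from pathlib import Path
from typing import List

# Path taken from the FileNotFoundError in the post; adjust if your entrypoint differs.
EXPECTED_FILES = ["status/status_domain_verify.py"]


def missing_paths(root: Path, relative_paths: List[str]) -> List[str]:
    """Return the relative paths that do not exist as regular files under root."""
    return [p for p in relative_paths if not (root / p).is_file()]


if __name__ == "__main__":
    # Assumes the current directory is the one the flow entrypoint path is
    # relative to (typically the image's working directory).
    missing = missing_paths(Path.cwd(), EXPECTED_FILES)
    if missing:
        print(f"Missing from this image: {missing}")
    else:
        print("All expected flow files are present.")
```

If the file is present in the image yet runs still fail this way, the run is probably being executed somewhere that doesn't have the code, rather than inside the ECS task.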
Awesome, I think this was it: my old agent pool also had a `default` work queue, and machines from that pool were picking up the runs! I'm going to run these overnight and see if that fixes it. https://prefect-community.slack.com/archives/CM28LL405/p1724923197732489?thread_ts=1724873769.239079&cid=CM28LL405
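If stale agents or workers are the culprit, a first step is an inventory of every work pool and its queues, so any pool other than `fxdmz-ecs-pool` that might still have something polling it stands out. A minimal sketch, assuming Prefect's async client (`get_client`, `read_work_pools`, `read_work_queues`) and a configured API URL and key; `fetch_inventory` and `format_inventory` are hypothetical helpers. A new work pool typically gets its own `default` queue, so a `default` queue by itself isn't suspicious; the listing only tells you which pools to check for running agents or workers.

```python
from typing import Dict, List, Tuple

# Maps pool name -> (pool type, sorted queue names)
Inventory = Dict[str, Tuple[str, List[str]]]


def format_inventory(inventory: Inventory, target_pool: str) -> List[str]:
    """One line per pool other than target_pool, sorted by pool name."""
    return [
        f"{name} [{pool_type}]: {', '.join(queues) or '(no queues)'}"
        for name, (pool_type, queues) in sorted(inventory.items())
        if name != target_pool
    ]


async def fetch_inventory() -> Inventory:
    # Imported lazily so format_inventory can be used without Prefect installed.
    from prefect.client.orchestration import get_client

    inventory: Inventory = {}
    async with get_client() as client:
        for pool in await client.read_work_pools():
            queues = await client.read_work_queues(work_pool_name=pool.name)
            inventory[pool.name] = (pool.type, sorted(q.name for q in queues))
    return inventory
```

With Prefect configured, `print("\n".join(format_inventory(asyncio.run(fetch_inventory()), "fxdmz-ecs-pool")))` (after `import asyncio`) prints one line per other pool; an old agent pool in that list is a good place to look for machines still picking up runs.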
@Luis Cebrián @Nate ☝️