Hi Community! I have pretty odd situation with AWS...
# ask-community
i
Hi Community! I have a pretty odd situation with AWS and Prefect. I'm using a Prefect serverless work pool. The infra was provisioned via:
prefect work-pool create --type ecs:push --provision-infra my-ecs-pool
and I'm trying to submit a simple flow with a custom Docker image (I need several Python packages, but that is not the issue here). Here is the code:
from prefect import flow, task
from prefect_shell import ShellOperation
from prefect.docker import DockerImage

@flow(log_prints=True)
def my_flow(name: str = "world"):
    print(f"Hello {name}! I'm a flow running in a ECS task!")
    install_result = test_cli()
    print(f"Install result {install_result}")


@task
def test_cli():
    return ShellOperation(
        commands=[
            "echo 'Hello!'",
        ],
    ).run()


if __name__ == "__main__":
    my_flow.deploy(
        name="my-deployment-1",
        work_pool_name="aws-ecs-pool",
        image=DockerImage(
            name="prefect-flows:latest",
            platform="linux/amd64",
            dockerfile="Dockerfile",
        ),
    )
The issue I have: most of the time I get an error during flow execution:
Flow run could not be submitted to infrastructure: TaskFailedToStart - CannotPullContainerError: The task cannot pull 629775924176.dkr.ecr.us-east-1.amazonaws.com/prefect-flows:latest from the registry 629775924176.dkr.ecr.us-east-1.amazonaws.com/prefect-flows:latest. There is a connection issue between the task and the registry. Check your task network configuration. : failed to copy: httpReadSeeker: failed open: failed to do request: Get 629775924176.dkr.ecr.us-east-1.amazonaws.com/prefect-flows:latest: dial tcp 54.231.224.2:443: i/o timeout
Initially I assumed that the Prefect CLI had not installed every component when provisioning the infra; however, from time to time the flow does get executed. Could you advise how I can get stable Prefect job execution on serverless AWS infrastructure?
j
hey! It sounds like the ECS task cannot connect to ECR, which could be because of an unexpected configuration on the VPC? When you used the --provision-infra command, did you use all the defaults to spin up everything fresh, or did you maybe pass another existing VPC?
i
@Jake Kaplan I used all the defaults. I would agree with you about an issue with the VPC, but then I cannot explain the runs where everything goes well. Approximately 1 out of 4 runs succeeds. Not sure... Any advice on how to configure the VPC?
j
The 1 in 4 working is very odd to me. I am not super familiar with the provisioning code but I believe it sets up 3 subnets across 3 availability zones. https://github.com/PrefectHQ/prefect/blob/ec443b99f1a6c86dd9640b3fd88f5b8b891dd756/src/prefect/infrastructure/provisioners/ecs.py#L776-L793 Could you check whether the jobs that fail/succeed are reliably in the same subnet/availability zone? Alternatively I would also consider tearing everything down and recreating it to see if the problem persists. Maybe it's an issue with how the CIDR blocks are chosen.
The other thought is whether you have any network throttling or resource constraints already set up that could be causing it.
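One way to run the subnet/availability-zone check suggested above is to group recently stopped ECS tasks by subnet, availability zone, and outcome. This is a minimal sketch, not part of Prefect: it assumes boto3 is installed with working AWS credentials, the helper names are hypothetical, and the cluster name in the usage comment is a placeholder. ECS only keeps stopped tasks visible for a limited time, so it is best run shortly after a few flow runs.

```python
from collections import Counter


def subnet_of(task: dict) -> str:
    """Return the subnet ID recorded on the task's network interface attachment."""
    for attachment in task.get("attachments", []):
        for detail in attachment.get("details", []):
            if detail.get("name") == "subnetId":
                return detail.get("value", "unknown")
    return "unknown"


def summarize(tasks: list) -> Counter:
    """Count tasks per (subnet, availability zone, outcome)."""
    counts = Counter()
    for task in tasks:
        failed = task.get("stopCode") == "TaskFailedToStart"
        outcome = "failed_to_start" if failed else "started"
        key = (subnet_of(task), task.get("availabilityZone", "unknown"), outcome)
        counts[key] += 1
    return counts


def fetch_stopped_tasks(cluster: str) -> list:
    """Fetch recently stopped tasks (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without it

    ecs = boto3.client("ecs")
    arns = ecs.list_tasks(cluster=cluster, desiredStatus="STOPPED")["taskArns"]
    if not arns:
        return []
    return ecs.describe_tasks(cluster=cluster, tasks=arns)["tasks"]


# Usage (the cluster name is a placeholder, use the one your work pool points at):
# for key, n in sorted(summarize(fetch_stopped_tasks("my-cluster")).items()):
#     print(key, n)
```

If failures and successes show up in the same subnet, the subnet choice alone is unlikely to be the cause.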
i
I checked the code and compared it with my existing setup: there is a check for whether a default VPC exists. I have one, so Prefect skips creating a separate VPC and just reuses the existing one. I also do not see any subnets created by Prefect. I could try to recreate the whole infra. Is there a CLI command to destroy the resources created by Prefect, or should I go over them manually? I have done several runs, and each job picks up a random subnet. The successful runs happen on a subnet with a 0.0.0.0/0 rule, and I believe that is the reason. I do not want to amend the default subnets (since I have other infrastructure running). It looks like I need to create a separate VPC and subnet, or somehow force Prefect to use only one subnet from the default VPC. How should I proceed from here?
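To test the 0.0.0.0/0 theory across all subnets at once, the route tables of the VPC can be inspected. The sketch below is an illustration with hypothetical helper names, assuming boto3 and AWS credentials: it picks the route table that applies to each subnet (an explicit association, otherwise the VPC's main table) and reports where the default route points.

```python
from typing import Optional


def route_table_for(subnet_id: str, route_tables: list) -> Optional[dict]:
    """Return the table explicitly associated with the subnet, else the main table.

    `route_tables` must all belong to the same VPC.
    """
    main = None
    for table in route_tables:
        for assoc in table.get("Associations", []):
            if assoc.get("SubnetId") == subnet_id:
                return table
            if assoc.get("Main"):
                main = table
    return main


def default_route_target(table: Optional[dict]) -> str:
    """Return the target of the 0.0.0.0/0 route, or "none" if there is no such route."""
    if table is None:
        return "none"
    for route in table.get("Routes", []):
        if route.get("DestinationCidrBlock") == "0.0.0.0/0":
            return route.get("GatewayId") or route.get("NatGatewayId") or "other"
    return "none"


def report(vpc_id: str) -> dict:
    """Map each subnet in the VPC to its default route target (needs boto3)."""
    import boto3  # imported lazily so the helpers above work without it

    ec2 = boto3.client("ec2")
    vpc_filter = [{"Name": "vpc-id", "Values": [vpc_id]}]
    tables = ec2.describe_route_tables(Filters=vpc_filter)["RouteTables"]
    subnets = ec2.describe_subnets(Filters=vpc_filter)["Subnets"]
    return {
        s["SubnetId"]: default_route_target(route_table_for(s["SubnetId"], tables))
        for s in subnets
    }
```

A target starting with `igw-` means the subnet routes to an internet gateway; `none` means tasks in that subnet have no default route out of the VPC.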
j
On your work pool there is a setting called network_configuration where you should be able to specify a single subnet to use: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-ecs-service-awsvpcconfiguration.html
alternatively you should be able to spin up a separate vpc using something like this:
import asyncio
from prefect.infrastructure.provisioners.ecs import VpcResource

async def main():
    def mock_advance():
        pass
    
    base_job_template = {
        "variables": {
            "properties": {
                "vpc_id": {
                    "default": "",
                },
            },
        }
    }
    
    await VpcResource(
        vpc_name=...,
        ecs_security_group_name=...,
    ).provision(base_job_template, mock_advance)
    
    print(base_job_template)

if __name__ == '__main__':
    asyncio.run(main())
and that should spin up a new VPC and spit out the vpc id
you'd have to add it manually to your work pool though
i
"On your work pool there is a setting called network_configuration where you should be able to specify a single subnet to use"
How should I apply that setting? Via the Prefect CLI?
j
easiest is to go to your work pool in the UI, edit, scroll down to Network Configuration and fill out there
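For reference, here is a rough sketch of what a Network Configuration value pinning a single subnet might look like. It assumes the value is passed through to the ECS RunTask API as its awsvpcConfiguration, which uses lower camel case keys rather than the capitalized CloudFormation property names in the link above. The helper name and the subnet and security group IDs are placeholders, not values from this thread.

```python
import json


def build_network_configuration(subnet_id: str, security_group_id: str,
                                public_ip: bool = True) -> dict:
    """Build an awsvpcConfiguration-shaped dict that pins one subnet."""
    return {
        "subnets": [subnet_id],
        "securityGroups": [security_group_id],
        # In a public subnet without a NAT gateway, a Fargate task generally
        # needs a public IP to reach ECR.
        "assignPublicIp": "ENABLED" if public_ip else "DISABLED",
    }


# Placeholder IDs; replace with a subnet and security group from your VPC.
config = build_network_configuration("subnet-0123456789abcdef0", "sg-0123456789abcdef0")
print(json.dumps(config, indent=2))
```

The printed JSON is what would be pasted into the Network Configuration field.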
i
Thank you! I managed to apply those settings and pin a single subnet... However, it failed on the very next run with that subnet. So I have 1 failure and 1 success with the same subnet. My assumption about subnets looks incorrect.
j
hmm. it definitely seems like it's something with the existing VPC but hard to say what. Can you try running the script I posted earlier to create a new VPC? I pulled it out of the CLI command, I believe that should work. Otherwise you may need to try to create a new VPC manually and attach that
i
Should I just execute that script as a regular Python script?
j
yes you should be able to, would need prefect installed and aws creds set as you probably have already (it's just a piece of --provision-infra)
i
Will try to do it later! Thank you!
Not sure, but it looks like it cannot be run on its own. I get an error:
from prefect.infrastructure.provisioners import (
ImportError: cannot import name '_provisioners' from partially initialized module 'prefect.infrastructure.provisioners' (most likely due to a circular import) (/.../rb-prefect-master/venv/lib64/python3.10/site-packages/prefect/infrastructure/provisioners/__init__.py)
@Jake Kaplan any advice on how to work around it?
j
Darn, it's really not meant to be used standalone. I would probably recommend one of:
• Creating a new VPC manually and trying that.
• Re-running the --provision-infra CLI command, but first making sure your current VPC is not marked as IsDefault so it will create a new one.
• Trying to figure out what in your existing VPC is leading to network failures, if not the subnet issue.
i
@Jake Kaplan I appreciate your help here! I'm a little confused by that one: since you cannot "unmark" a default VPC, there is no way to go this route without deleting the default VPC, which could be critical to my infra. I wonder why Prefect is designed to use the default VPC when it cannot deal with it correctly (well, in my case for sure). Is there a step-by-step guide to creating a VPC manually for Prefect?
j
So Prefect does not have any special requirements for how the VPC is set up, that's totally up to you and your networking requirements. You generally need to specify a VPC in order to run ECS tasks. The default VPC that AWS provides is compatible with ECS, but it's possible modifications were made or you have additional configuration? Using the default VPC is preferred because there is less that needs to be created from scratch. But Prefect will make a best-effort attempt to create one to reduce friction. Prefect itself doesn't offer a step-by-step guide to creating a VPC, but you could follow the individual steps here or look at the AWS docs for it. I'm not 100% sure what the problem with the default VPC is, it just seems network related 😄
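As a rough illustration of the manual route, the sketch below creates a small VPC with one public subnet using boto3's EC2 client. It is not Prefect's provisioning logic: the function name, CIDR blocks, and resource name are assumptions chosen for the example, and the resulting IDs would still need to be added to the work pool by hand.

```python
def create_public_vpc(ec2, vpc_cidr: str = "10.42.0.0/16",
                      subnet_cidr: str = "10.42.0.0/24",
                      name: str = "prefect-ecs") -> dict:
    """Create a VPC with one internet-routable subnet and a security group.

    `ec2` is a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    vpc_id = ec2.create_vpc(CidrBlock=vpc_cidr)["Vpc"]["VpcId"]

    # An internet gateway gives the VPC a path to ECR's public endpoints.
    igw_id = ec2.create_internet_gateway()["InternetGateway"]["InternetGatewayId"]
    ec2.attach_internet_gateway(InternetGatewayId=igw_id, VpcId=vpc_id)

    subnet_id = ec2.create_subnet(VpcId=vpc_id, CidrBlock=subnet_cidr)["Subnet"]["SubnetId"]

    # Route all non-local traffic from the subnet through the gateway.
    table_id = ec2.create_route_table(VpcId=vpc_id)["RouteTable"]["RouteTableId"]
    ec2.create_route(RouteTableId=table_id, DestinationCidrBlock="0.0.0.0/0",
                     GatewayId=igw_id)
    ec2.associate_route_table(RouteTableId=table_id, SubnetId=subnet_id)

    # A new security group allows all outbound traffic by default,
    # which is enough for pulling images.
    sg_id = ec2.create_security_group(
        GroupName=f"{name}-sg",
        Description="Security group for Prefect ECS tasks",
        VpcId=vpc_id,
    )["GroupId"]

    return {"vpc_id": vpc_id, "subnet_id": subnet_id, "security_group_id": sg_id}


# Usage (creates real, billable AWS resources):
# import boto3
# print(create_public_vpc(boto3.client("ec2")))
```

With a public subnet like this, tasks would typically also need a public IP assigned in the work pool's network configuration in order to reach ECR.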