Daniel Ross
03/05/2022, 8:50 PM
requests.exceptions.ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=4200): Max retries exceeded with url: /graphql (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f1911c36d90>: Failed to establish a new connection: [Errno 111] Connection refused'))
If I look at the task itself, I can see that the environment variable for PREFECT__CLOUD__API is set to http://127.0.0.1:4200. So this seems like the problem.
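A "Connection refused" on 127.0.0.1:4200 means nothing is listening on that port from the point of view of the process making the request. A minimal sketch for checking reachability from inside the flow container, assuming a plain TCP connect is enough to tell the two addresses apart; `can_connect` is a hypothetical helper, not part of Prefect:

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

Calling `can_connect("127.0.0.1", 4200)` and `can_connect("my.ip.goes.here", 4200)` from inside the task would show which of the two addresses the container can actually reach.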
The host it's trying to connect to is clearly wrong (since the server itself is running on an EC2 instance). So I've adjusted my ~/.prefect/config.toml to look like this:
host_ip = "my.ip.goes.here"
host_port = "4200"
host = "http://${server.host_ip}"
port = "4200"
endpoint = "${server.host}:${server.port}"
[server.ui]
apollo_url = "http://my.ip.goes.here:4200/graphql"
[cloud]
api = "${${backend}.endpoint}"
endpoint = "https://api.prefect.io"
graphql = "${cloud.api}/graphql"
No luck.
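The `${...}` references in this config chain together: `cloud.api` points at the endpoint of whichever backend is active. A rough sketch of how that chain resolves, assuming `backend = "server"` (an assumption, it is not shown in the snippet above) and using a simplified resolver that is not Prefect's actual config loader:

```python
import re

# Flattened, hypothetical view of the config values for illustration.
config = {
    "backend": "server",  # assumption: not shown in the snippet above
    "server.host_ip": "my.ip.goes.here",
    "server.host": "http://${server.host_ip}",
    "server.port": "4200",
    "server.endpoint": "${server.host}:${server.port}",
    "cloud.api": "${${backend}.endpoint}",
    "cloud.graphql": "${cloud.api}/graphql",
}

# Matches an innermost ${...} reference (one with no nested reference inside).
INNERMOST = re.compile(r"\$\{([^${}]+)\}")


def resolve(value: str, cfg: dict) -> str:
    """Substitute innermost ${key} references until none are left."""
    while True:
        new_value = INNERMOST.sub(lambda m: cfg[m.group(1)], value)
        if new_value == value:
            return value
        value = new_value


print(resolve(config["cloud.graphql"], config))
# -> http://my.ip.goes.here:4200/graphql
```

With `backend` set to `"cloud"` instead, the same lookup would go through `cloud.endpoint`, so the backend value decides which address the flow ends up calling.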
So I added the PREFECT__CLOUD__API definition to my environment variables in the container definition. Still no luck. However, when I look at the task definition, I can see the correct (or at least intended) PREFECT__CLOUD__API environment variable there. But the variable in the task is still set to http://127.0.0.1:4200, and the problem persists!
I am pretty stuck on this, and hoping that someone here has a line of sight to the solution. (This all worked without much configuration previously ... which now seems weird.)
All help appreciated!
Kevin Kho
prefect server start with the --expose flag? Check the note here on the change in Prefect 0.15.5
Daniel Ross
03/05/2022, 9:02 PM
prefect server start -d --postgres-url "postgresql:goes/here" --expose
Kevin Kho
config.toml, you are saying the Flow can’t communicate with the right API, right? You should just need
[server]
endpoint = "YOUR_MACHINES_PUBLIC_IP:4200/graphql"
http://YOUR_MACHINES_PUBLIC_IP:8080?
Daniel Ross
03/05/2022, 9:12 PM
Kevin Kho
Daniel Ross
03/05/2022, 9:14 PM
backend = "server"
[server]
endpoint = "my.ip.goes.here:4200/graphql"
[server.ui]
apollo_url = "http://my.ip.goes.here:4200/graphql"
[cloud]
api = "${${backend}.endpoint}"
endpoint = "https://api.prefect.io"
Still no luck. Restarted the server, but did not re-register the flow.
Kevin Kho
[server]
[server.ui]
apollo_url = "http://YOUR_MACHINES_PUBLIC_IP:4200/graphql"
This blog has the info. I think you don’t need the cloud config since you are on Prefect Server anyway?
Daniel Ross
03/05/2022, 9:43 PM
Kevin Kho
Daniel Ross
03/05/2022, 9:57 PM
    "networkMode": "awsvpc",
    "cpu": "1024",
    "memory": "2048",
    "containerDefinitions": [
        {
            "name": "flow",
            "environment": [
                {
                    "name": "PREFECT__CLOUD__API",
                    "value": "http://my.ip.goes.here:4200"
                },
            ],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "my-log-group",
                    "awslogs-region": "my-region",
                    "awslogs-create-group": "true",
                    "awslogs-stream-prefix": "flow_name"
                }
            }
        }
    ]
},
run_task_kwargs = {
    "networkConfiguration": {
        "awsvpcConfiguration": {
            "subnets": [
                subnet_a
            ],
            "securityGroups": [
                sg_a
            ],
            "assignPublicIp": "ENABLED"
        }
    }
}
The environment variable is a new addition introduced to try to resolve the problem.
I'll have to cobble together a quick test run and add a local agent to test the local run.
Kevin Kho
PREFECT__BACKEND: "server" also though? But yeah, I think it looks like that is not being set. I think you need to define it in the Flow container.
PREFECT__BACKEND: "server"
PREFECT__SERVER__ENDPOINT: <YOUR-IP>:4200
?
Daniel Ross
03/05/2022, 10:12 PM
Kevin Kho
Daniel Ross
03/05/2022, 10:35 PM
PREFECT__USER_CONFIG_PATH='/opt/prefect/config.toml'
This isn't something I set.
Anna Geller
Does the config file take precedence over environment variables?
No, env variables take precedence over config.toml.
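A simplified sketch of that precedence, assuming the convention that a `PREFECT__SECTION__KEY` variable maps onto `section.key` in the config. `apply_env_overrides` is a hypothetical helper for illustration, not Prefect's actual loader:

```python
import copy
from typing import Mapping


def apply_env_overrides(file_config: dict, environ: Mapping[str, str],
                        prefix: str = "PREFECT") -> dict:
    """Return a copy of file_config with PREFIX__A__B variables applied on top."""
    merged = copy.deepcopy(file_config)
    marker = prefix + "__"  # note the double underscore
    for name, value in environ.items():
        if not name.startswith(marker):
            continue  # anything without the exact prefix is ignored here
        path = name[len(marker):].lower().split("__")
        node = merged
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value  # the environment value wins over the file value
    return merged


file_config = {"cloud": {"api": "http://127.0.0.1:4200"}}
environ = {"PREFECT__CLOUD__API": "http://some_ip:4200/graphql"}
merged = apply_env_overrides(file_config, environ)
```

In this sketch a variable whose name has only a single underscore after `PREFECT` does not match the prefix and is skipped, so the separators are worth double-checking when an override seems to have no effect.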
I think your env variables on the ECS task definition should be (/graphql was missing + the server backend env variable):
"containerDefinitions": [
    {
        "name": "flow",
        "environment": [
            {
                "name": "PREFECT__CLOUD__API",
                "value": "http://some_ip:4200/graphql"
            },
            {
                "name": "PREFECT__BACKEND",
                "value": "server"
            },
        ],
I can also see that there is no "image" in your containerDefinitions. If you don't set it explicitly, it defaults to the latest version, which is currently 1.0.0. The problem is that your flow runs should use a Prefect version that is <= the Prefect version of your Server. Otherwise, you may hit API endpoints that don't exist on your server or that have changed.
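That version rule can be checked mechanically. A small sketch, assuming versions follow an X.Y.Z pattern and that the image tag embeds the Prefect version as in `0.15.3-python3.8`; both helper names are made up for illustration, and the server version used in the comment below is only an example value:

```python
import re
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract the first X.Y.Z triple from a version string or image tag."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no X.Y.Z version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def flow_version_is_compatible(flow_image_tag: str, server_version: str) -> bool:
    """True if the flow's Prefect version is <= the server's Prefect version."""
    return parse_version(flow_image_tag) <= parse_version(server_version)


# Example values only: a pinned tag against a 0.15.5 server.
ok = flow_version_is_compatible("0.15.3-python3.8", "0.15.5")
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as "0.9.0" sorting after "0.15.0".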
Can you try something like this (full task definition example):
{
    "family": "prefectFlow",
    "requiresCompatibilities": [
        "FARGATE"
    ],
    "networkMode": "awsvpc",
    "cpu": "512",
    "memory": "1024",
    "taskRoleArn": "arn:aws:iam::123456789:role/prefectTaskRole",
    "executionRoleArn": "arn:aws:iam::123456789:role/prefectECSAgentTaskExecutionRole",
    "containerDefinitions": [
        {
            "name": "flow",
            "image": "prefecthq/prefect:0.15.3-python3.8",
            "essential": true,
            "environment": [
                {
                    "name": "AWS_RETRY_MODE",
                    "value": "adaptive"
                },
                {
                    "name": "AWS_MAX_ATTEMPTS",
                    "value": "10"
                },
                {
                    "name": "PREFECT__CLOUD__AGENT__AUTH_TOKEN",
                    "value": ""
                },
                {
                    "name": "PREFECT__CLOUD__API",
                    "value": "http://some_ip:4200/graphql"
                },
                {
                    "name": "PREFECT__BACKEND",
                    "value": "server"
                }
            ],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "/ecs/prefectFlow",
                    "awslogs-region": "us-east-1",
                    "awslogs-stream-prefix": "ecs",
                    "awslogs-create-group": "true"
                }
            }
        }
    ]
}
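The points raised about the task definition (an explicit image, the server backend variable, and an API URL ending in /graphql) can be turned into a quick pre-registration check. A sketch under those assumptions; `check_flow_container` is a hypothetical helper that only encodes the advice in this thread, not an official validation rule:

```python
from typing import List


def check_flow_container(task_definition: dict,
                         container_name: str = "flow") -> List[str]:
    """Return a list of problems found in the named container (empty if none)."""
    containers = task_definition.get("containerDefinitions", [])
    flow = next((c for c in containers if c.get("name") == container_name), None)
    if flow is None:
        return [f"no container named {container_name!r}"]

    problems = []
    if not flow.get("image"):
        problems.append("no image set")
    # ECS expects environment as a list of {"name": ..., "value": ...} entries.
    env = {item["name"]: item["value"] for item in flow.get("environment", [])}
    if env.get("PREFECT__BACKEND") != "server":
        problems.append("PREFECT__BACKEND is not 'server'")
    if not env.get("PREFECT__CLOUD__API", "").endswith("/graphql"):
        problems.append("PREFECT__CLOUD__API does not end with /graphql")
    return problems
```

Run against a definition that only sets `PREFECT__CLOUD__API` to `http://my.ip.goes.here:4200`, this sketch would report all three problems. The full example above would pass with an empty list.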
Daniel Ross
03/06/2022, 12:18 AM
Docker(
    registry_url="awsacctid.dkr.ecr.region.amazonaws.com",
    base_image=base_image,
    files=image_files,  # the files that should be copied to the image
    image_name="prefect-flow-storage",  # using this as image name to allow only one ECR repo to be made
    image_tag=f"{slugify(flow_name)}-{idempotency_key}",  # using this as the image tag to ensure efficient storage of flows, only one stored image per flow version
    env_vars={
        # append top level directory to PYTHONPATH
        "PYTHONPATH": "$PYTHONPATH:/",
        "PREFECT__CLOUD__API": "http://172.31.73.239:4200",
        "PREFECT__SERVER__ENDPOINT": "http://172.31.73.239:4200",
    },
)
The CLOUD__API and SERVER__ENDPOINT variables are new from this troubleshooting.
As for the run config, this is what it looks like:
ECSRun(
    task_definition={
        "networkMode": "awsvpc",
        "cpu": "1024",
        "memory": "2048",
        "containerDefinitions": [
            {
                "name": "flow",
                "environment": [
                    {
                        "name": "PREFECT__BACKEND",
                        "value": "server"
                    },
                    {
                        "name": "PREFECT__CLOUD__API",
                        "value": "http://my-ip-here:4200/graphql"
                    },
                ],
                "logConfiguration": {
                    "logDriver": "awslogs",
                    "options": {
                        "awslogs-group": "my log group",
                        "awslogs-region": "my region",
                        "awslogs-create-group": "true",
                        "awslogs-stream-prefix": flow_name
                    }
                }
            }
        ]
    },
    run_task_kwargs={
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": [
                    subnet_a
                ],
                "securityGroups": [
                    security_group_b
                ],
                "assignPublicIp": "ENABLED"
            }
        }
    }
)
Again, the environment variables have been added throughout this troubleshooting.
flow.executor = DaskExecutor(
    cluster_class=fargate_cluster,
    adapt_kwargs={"minimum": 1, "maximum": 10})
Anna Geller
Docker(
    registry_url="awsacctid.dkr.ecr.region.amazonaws.com",
    base_image="prefecthq/prefect:0.15.3-python3.8",
    files=image_files,  # the files that should be copied to the image
    image_name="prefect-flow-storage",  # using this as image name to allow only one ECR repo to be made
    image_tag=f"{slugify(flow_name)}-{idempotency_key}",  # using this as the image tag to ensure efficient storage of flows, only one stored image per flow version
    env_vars={
        # append top level directory to PYTHONPATH
        "PYTHONPATH": "$PYTHONPATH:/",
        "PREFECT__CLOUD__API": "http://172.31.73.239:4200/graphql",
        "PREFECT__SERVER__ENDPOINT": "http://172.31.73.239:4200/graphql",
    },
)
ECSRun:
ECSRun(
    image=f"{AWS_ACCOUNT_ID}.dkr.ecr.us-east-1.amazonaws.com/{your_image_name}:{your_image_tag}",  # from your Docker storage definition
    task_definition={
        "networkMode": "awsvpc",
        "cpu": "1024",
        "memory": "2048",
        "containerDefinitions": [
            {
                "name": "flow",
                "environment": [
                    {
                        "name": "PREFECT__BACKEND",
                        "value": "server"
                    },
                    {
                        "name": "PREFECT__CLOUD__API",
                        "value": "http://my-ip-here:4200/graphql"
                    },
                ],
                "logConfiguration": {
                    "logDriver": "awslogs",
                    "options": {
                        "awslogs-group": "my log group",
                        "awslogs-region": "my region",
                        "awslogs-create-group": "true",
                        "awslogs-stream-prefix": flow_name
                    }
                }
            }
        ]
    },
    run_task_kwargs={
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": [
                    subnet_a
                ],
                "securityGroups": [
                    security_group_b
                ],
                "assignPublicIp": "ENABLED"
            }
        }
    }
)
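The image given to ECSRun has to name the same image that the Docker storage pushed. A sketch of how that full name can be assembled from the storage arguments, assuming the usual `registry/name:tag` layout; `image_uri` is a hypothetical helper and the tag in the example is a placeholder:

```python
def image_uri(registry_url: str, image_name: str, image_tag: str) -> str:
    """Join registry, repository name, and tag into a full image reference."""
    # Strip stray slashes so "registry/" and "/name" do not produce "//".
    return f"{registry_url.rstrip('/')}/{image_name.strip('/')}:{image_tag}"


uri = image_uri(
    "awsacctid.dkr.ecr.region.amazonaws.com",  # registry_url from the storage
    "prefect-flow-storage",                    # image_name from the storage
    "my-flow-tag",                             # placeholder for the computed tag
)
```

Building the string once and passing the same value to both the storage and the run config would keep the two from drifting apart.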
If you need some examples to help debug it, check out some flows with the name “ecs” in it here. I also had a blog post showing docker agent setup as ECS service - if nothing else works, perhaps you can try deploying a new agent to the same subnet as your Server instance and deploy it as ECS Service.
Daniel Ross
03/06/2022, 1:19 AM
Kevin Kho
Anna Geller
Daniel Ross
03/07/2022, 4:18 PM