Hi guys, I am currently following <this guide> on ...
# ask-community
k
Hi guys, I am currently following this guide on deployments using ECS and I am struggling. I am using a self-managed Prefect server on an EC2 instance. When I first started my worker, it was fine and able to communicate with my server and all the other components. Then I had to go and debug other things. Now, a few days later, that is sorted, but I can't seem to start another worker. My work queue is in "Not Ready" status, presumably because there is a problem with the worker. I get this error message:
(prefect-2) ubuntu@ip-172-31-13-253:~/Docker/eti-data-pipeline-worflows$ prefect worker start --pool my-ecs-pool
Discovered type 'ecs' for work pool 'my-ecs-pool'.
Worker 'ECSWorker 05116438-a5b0-42d3-a699-1a23bed3ba35' started!

Failed the last 3 attempts. Please check your environment and configuration.
Examples of recent errors:

Traceback (most recent call last):
  File "/home/ubuntu/.conda/envs/prefect-2/lib/python3.10/site-packages/prefect/utilities/services.py", line 64, in critical_service_loop
    await workload()
  File "/home/ubuntu/.conda/envs/prefect-2/lib/python3.10/site-packages/prefect/workers/base.py", line 760, in get_and_submit_flow_runs
    runs_response = await self._get_scheduled_flow_runs()
  File "/home/ubuntu/.conda/envs/prefect-2/lib/python3.10/site-packages/prefect/workers/base.py", line 917, in _get_scheduled_flow_runs
    await self.client.get_scheduled_flow_runs_for_work_pool(
  File "/home/ubuntu/.conda/envs/prefect-2/lib/python3.10/site-packages/prefect/client/orchestration/_work_pools/client.py", line 586, in get_scheduled_flow_runs_for_work_pool
    response = await self.request(
  File "/home/ubuntu/.conda/envs/prefect-2/lib/python3.10/site-packages/prefect/client/orchestration/base.py", line 53, in request
    return await self._client.send(request)
  File "/home/ubuntu/.conda/envs/prefect-2/lib/python3.10/site-packages/prefect/client/base.py", line 354, in send
    response.raise_for_status()
  File "/home/ubuntu/.conda/envs/prefect-2/lib/python3.10/site-packages/prefect/client/base.py", line 162, in raise_for_status
    raise PrefectHTTPStatusError.from_httpx_error(exc) from exc.__cause__
prefect.exceptions.PrefectHTTPStatusError: Server error '500 Internal Server Error' for url '<http://127.0.0.1:4200/api/work_pools/my-ecs-pool/get_scheduled_flow_runs>'
Response: {'exception_message': 'Internal Server Error'}
For more information check: <https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/500>
I presume this has something to do with my worker not being able to access the API, but my
task-definition.json
already has my Prefect API user key configured. I'm at a loss as to what step I've missed. Perhaps my
PREFECT_API_URL
is incorrect?
j
Hey, do you have access to the Prefect server logs? The 500 error is unexpected, but the server-side logs should show the full error for what happens when the worker tries to hit
<http://127.0.0.1:4200/api/work_pools/my-ecs-pool/get_scheduled_flow_runs>
k
I am getting this:
{"detail":"Method Not Allowed"}
But the way I accessed that is I changed
127.0.0.1
to the public DNS of the EC2 instance I am running from
j
Oh sorry, my apologies! That URL is what your worker is hitting when it gets the 500 status code error. (The endpoint only accepts POST, so it makes sense that you would get "Method Not Allowed" if you visited it in the browser, as that is a GET by default.) If you go to the server logs where your API is running, you should be able to see errors explaining the root cause. A 500 always means some unexpected server-side problem.
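To illustrate the GET vs. POST point, here is a minimal sketch using only Python's standard library. It builds the two request objects without sending them. The empty JSON body is an assumption for illustration, not the exact payload the worker sends.

```python
import json
import urllib.request

# URL taken from the worker's error message above.
URL = "http://127.0.0.1:4200/api/work_pools/my-ecs-pool/get_scheduled_flow_runs"


def build_requests(url: str):
    """Return a (browser-style GET, worker-style POST) pair of requests."""
    # Visiting the URL in a browser issues a plain GET, which this route
    # answers with "Method Not Allowed".
    get_req = urllib.request.Request(url)

    # The worker issues a POST with a JSON body. The empty body here is
    # an assumption for illustration only.
    post_req = urllib.request.Request(
        url,
        data=json.dumps({}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    return get_req, post_req


get_req, post_req = build_requests(URL)
print(get_req.get_method(), post_req.get_method())
```

Nothing is sent over the network here; passing `post_req` to `urllib.request.urlopen` would reproduce the worker-style call against a reachable server.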
k
It turns out I needed to set
PREFECT_API_URL
to
<ec2-ipv4-public-dns>:4200/api
and now the work queue is ping-able
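As a rough sketch of the value described above, the snippet below composes the URL and sets it for the current Python process. The hostname is a hypothetical placeholder, and the `http` scheme is an assumption, since the message above does not state one.

```python
import os


def build_api_url(host: str, port: int = 4200, scheme: str = "http") -> str:
    """Compose a Prefect API URL of the form <scheme>://<host>:<port>/api."""
    return f"{scheme}://{host}:{port}/api"


# Hypothetical placeholder; substitute the instance's public IPv4 DNS name.
EC2_PUBLIC_DNS = "ec2-host.example.com"

api_url = build_api_url(EC2_PUBLIC_DNS)

# Setting the variable here affects only this process and anything it
# starts; it does not persist to the shell or to a Prefect profile.
os.environ["PREFECT_API_URL"] = api_url
print(api_url)
```

To persist the setting in the active Prefect profile instead, `prefect config set PREFECT_API_URL=<value>` can be run from the shell.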
The problem is that I am now getting this when running my sample flow:
Reported flow run '8fb8e07d-25ec-4616-b13f-67ec929a1bd6' as crashed: Flow run could not be submitted to infrastructure:
AccessDeniedException('An error occurred (AccessDeniedException) when calling the RegisterTaskDefinition operation: User: arn:aws:sts::306326271806:assumed-role/eti-dataworkflow-ecs-task-executioner/1287dcb7d9a2476483178719de9b0cf5 is not authorized to perform: ecs:RegisterTaskDefinition on resource: arn:aws:ecs:ap-southeast-1:306326271806:task-definition/prefect_my-ecs-pool_24570927-b2c5-4aa2-bb78-63924aeb6c95:* because no identity-based policy allows the ecs:RegisterTaskDefinition action')
This is an AWS permissions issue, right?
j
Yes, that's correct! You'll need to add the
RegisterTaskDefinition
permission to the role
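For reference, a minimal sketch of an identity-based policy document that allows the action named in the error. The role name is copied from the assumed-role ARN in the error message; the wildcard `Resource` is an assumption that can be narrowed to specific task definition ARNs. The worker will likely need further ECS actions to run tasks; only the one from the error is shown.

```python
import json

# Role name taken from the assumed-role ARN in the error message above.
ROLE_NAME = "eti-dataworkflow-ecs-task-executioner"


def build_policy(actions, resource="*"):
    """Build an IAM policy document allowing the given actions.

    The default wildcard resource is an assumption for illustration.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": resource,
            }
        ],
    }


policy = build_policy(["ecs:RegisterTaskDefinition"])
policy_json = json.dumps(policy, indent=2)
print(f"Policy to attach to role {ROLE_NAME}:")
print(policy_json)
```

The printed JSON can be attached to the role as an inline policy through the IAM console or the AWS CLI.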
k
By the way, @Jake Kaplan, I tried using the push action to push to ECR during deployment but I can't seem to authenticate correctly. Is there a step I am missing, apart from authenticating via AWS CLI?