# ask-community
j
It looks like the terraform is using an older 4.x AWS provider. Any plans to upgrade to 5.x? https://github.com/PrefectHQ/prefect-recipes/blob/main/devops/infrastructure-as-code/aws/tf-prefect2-ecs-agent/main.tf
d
Hey Jason, we can definitely look into this 👍 Would you mind opening an issue on the repo?
j
Are you guys actively maintaining these repos?
d
As far as I am aware, we are.
But looking closer at this specific example, you may be interested in our terraform provider: https://github.com/PrefectHQ/terraform-provider-prefect
j
Is that what we should use to spin up resources for prefect?
cc @Zach Munro
d
We have a few different options depending on your preference (we also have a Helm chart, for example)
But for Terraform, yes, the provider is our recommended way
j
Ok, do you have an example of how we'd use the provider to create a work pool?
e
hey @jason - each of these repos serves a different use case, depending on what you need:
• you can use the prefect-recipes repository (which you originally linked to) as example implementations for running our older agents in certain non-k8s container environments, like ECS, using Terraform as the configuration
• if you're looking to spin up workers in k8s, you could use our prefect-helm charts, which offer charts for workers, agents, and a prefect-server (workers/agents only run your flows, but require a Prefect API to connect to)
• if you're looking to create Prefect Cloud objects, like workspaces and work pools, use the terraform provider - note that this project is still in active development and we're still adding Prefect object support
  ◦ for ex., work pools can be configured with the prefect_work_pool terraform resource, sketched just below
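A minimal sketch of that provider usage, assuming a Prefect Cloud API key plus account and workspace IDs are on hand; the provider source and argument names shown here are assumptions to double-check against the provider docs for the release you install:

terraform {
  required_providers {
    prefect = {
      source = "PrefectHQ/prefect"
    }
  }
}

// Assumed provider configuration: Prefect Cloud credentials passed in as variables.
provider "prefect" {
  api_key      = var.prefect_api_key
  account_id   = var.prefect_account_id
  workspace_id = var.prefect_workspace_id
}

// Hypothetical ECS-typed work pool that a Fargate worker will poll for flow runs.
resource "prefect_work_pool" "ecs" {
  name   = "ecs-work-pool"
  type   = "ecs"
  paused = false
}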
j
We want to use fargate (no k8s) running a worker pool in our aws account
e
got it. i would definitely take a look at our worker Fargate recipe, which should give you a good example of the necessary TF / AWS resources (e.g. IAM, execution policy, ECS cluster/service). you may need to tweak it a bit for your specific requirements, but our recipes are created from working examples that we've set up with our users/customers: https://github.com/PrefectHQ/prefect-recipes/tree/main/devops/infrastructure-as-code/aws/tf-prefect2-ecs-worker
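For orientation, the recipe boils down to a handful of AWS resources along these lines (a trimmed, illustrative sketch rather than the recipe verbatim; names are placeholders):

// ECS cluster that hosts the long-running Prefect worker service.
resource "aws_ecs_cluster" "prefect_worker_cluster" {
  name = "prefect-worker"
}

// Trust policy so ECS tasks can assume the execution role.
data "aws_iam_policy_document" "ecs_assume_role" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = ["ecs-tasks.amazonaws.com"]
    }
  }
}

// Execution role lets Fargate pull the container image and write logs.
resource "aws_iam_role" "prefect_worker_execution_role" {
  name               = "prefect-worker-execution-role"
  assume_role_policy = data.aws_iam_policy_document.ecs_assume_role.json
}

resource "aws_iam_role_policy_attachment" "execution_role_policy" {
  role       = aws_iam_role.prefect_worker_execution_role.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}

The worker itself is then an aws_ecs_task_definition plus an aws_ecs_service running on that cluster.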
j
Ok, so just use the terraform template. We were on that path already will give it a shot. cc @Zach Munro
👍 1
Made some progress but now running into this when the ECS task tries to run a deployment: https://github.com/PrefectHQ/prefect/issues/11637 @Edward Park
e
hmm. do you have the full output?
j
Failed to submit flow run 'a8411c17-0f0e-40c4-b0e8-062dc7b02b1c' to infrastructure.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/prefect/workers/base.py", line 904, in _submit_run_and_capture_errors
    result = await self.run(
  File "/usr/local/lib/python3.10/site-packages/prefect_aws/workers/ecs_worker.py", line 639, in run
    ) = await run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/site-packages/prefect/utilities/asyncutils.py", line 136, in run_sync_in_worker_thread
    return await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/site-packages/prefect_aws/workers/ecs_worker.py", line 755, in _create_task_and_wait_for_start
    self._wait_for_task_start(
  File "/usr/local/lib/python3.10/site-packages/prefect_aws/workers/ecs_worker.py", line 1033, in _wait_for_task_start
    raise type(code, (RuntimeError,), {})(reason)
prefect_aws.workers.ecs_worker.TaskFailedToStart: CannotPullContainerError: pull image manifest has been retried 5 time(s): failed to resolve ref docker.io/prefecthq/prefect:2.18.3-python3.10: failed to do request: Head "https://registry-1.docker.io/v2/prefecthq/prefect/manifests/2.18.3-python3.10": dial tcp 54.227.20.253:443: i/o timeout
02:59:44 PM | prefect.flow_runs.worker | INFO | Completed submission of flow run 'a8411c17-0f0e-40c4-b0e8-062dc7b02b1c'
we're getting a timeout actually
It looks like the task is being spun up with a different network configuration than the other worker 🤔
e
hmm. this is trying to talk to the public dockerhub registry, so i'm suspecting a networking issue
yeah. do you know if your task is being spun up in the correct subnet?
j
Doesn't look like it
We're going to try putting them in the public subnet to see if that helps. Curiously, the other worker was in the public subnet already
👍 1
Going to pick this up tomorrow. Changing the subnet didn't help
Do we need to manually override the network configuration in the work pool config?
e
are you using the aws_ecs_service TF resource? if so, you may need to set assign_public_ip to true: https://github.com/PrefectHQ/prefect-recipes/blob/main/devops/infrastructure-as-code/aws/tf-prefect2-ecs-worker/ecs.tf#L69
assign_public_ip = true
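For context, that flag sits in the service's network_configuration block; a Fargate task needs either a public IP or a NAT-routed private subnet to reach Docker Hub and pull the image. Roughly, mirroring the recipe line linked above (values are illustrative):

resource "aws_ecs_service" "prefect_worker_service" {
  // ... other arguments as in the recipe ...
  network_configuration {
    subnets          = var.worker_subnets
    security_groups  = [aws_security_group.prefect_worker.id]
    // Without this (or a NAT gateway on a private subnet), image pulls from
    // docker.io time out, as in the traceback above.
    assign_public_ip = true
  }
}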
j
Will have a look tomorrow. The listener worker / poller spins up fine and has connectivity; the tasks it creates do not.
e
ok, keep us posted
j
We're using prefect_ecs_worker. It appears to have the public IP assignment... Let me make sure our service is actually using the latest task definition.
This is our aws_ecs_service config, which appears to be correct:
resource "aws_ecs_service" "prefect_worker_service" {
name = "prefect-worker-${var.name}"
cluster = aws_ecs_cluster.prefect_worker_cluster.id
desired_count = var.worker_desired_count
launch_type = "FARGATE"
// Public IP required for pulling secrets and images
// https://aws.amazon.com/premiumsupport/knowledge-center/ecs-unable-to-pull-secrets/
network_configuration {
security_groups = [aws_security_group.prefect_worker.id]
assign_public_ip = true
subnets = var.worker_subnets
}
task_definition = aws_ecs_task_definition.prefect_worker_task_definition.arn
}
It appears the task that gets deployed for the flow run uses a different task definition (prefect_default_<guid>) than the worker's. I can't find any reference to this in our terraform - what creates the new task definition?
e
could you share your ECS work pool’s configuration?
the work pool config defines the flow/task run’s ECS definition (separate from the worker’s task definition)
you can think of the work pool config == the job’s task definition
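One way to pin the flow run's networking is to override defaults in the ECS work pool's base job template; building on the work pool sketch earlier, roughly as below. The base_job_template argument and variable names like vpc_id and cluster are assumptions here - the worker's actual default template can be dumped with prefect work-pool get-default-base-job-template --type ecs on a recent Prefect version and edited from there.

// Hypothetical override so flow-run tasks land in the intended VPC and cluster.
resource "prefect_work_pool" "ecs" {
  name = "ecs-work-pool"
  type = "ecs"

  base_job_template = jsonencode({
    // A real template also carries the job_configuration section from the
    // worker's default template; trimmed here for brevity.
    variables = {
      properties = {
        vpc_id = {
          type    = "string"
          default = var.vpc_id
        }
        cluster = {
          type    = "string"
          default = aws_ecs_cluster.prefect_worker_cluster.arn
        }
      }
    }
  })
}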
j
We got it going, but I'll say the indirection between the worker's task definition and the work pool config is a little confusing. Took a minute to understand what was going on there.