Hi Everyone we seem to be experiencing an unexpected issue w Prefect Community #ask-community

Danielle Dalton

09/07/2021, 2:41 PM

Hi Everyone, we seem to be experiencing an unexpected issue with one of our ECS agents. When kicking off a flow, we get the error message `Unable to locate credentials`. This happens right after the flow is scheduled and submitted (before any of the tasks can even begin to kick off). We don't think it is related to boto because our EC2 instance is able to read from AWS buckets (confirmed by SSHing into the instance) via an IAM policy attached to the EC2 instance. Unfortunately, running in debug mode didn't produce any more verbose output or a stack trace. We did have to reboot this instance recently after it became unresponsive because an errant task consumed too much memory and stalled out. Has anyone experienced this before?

Kevin Kho

09/07/2021, 2:43 PM

Hey @Danielle Dalton, weird that it stopped working if it was working before. What flow storage are you using? S3? ECR? GitHub?

Danielle Dalton

09/07/2021, 2:44 PM

We use S3 as our flow storage.

Kevin Kho

09/07/2021, 2:46 PM

You had this working without supplying credentials before just by using the EC2 role?

Danielle Dalton

09/07/2021, 2:48 PM

Yes, we created an IAM instance profile that grants access to our S3 resources and then assigned that instance profile to the EC2 instance.

Kevin Kho

09/07/2021, 2:53 PM

I don’t have much advice here. It seems right. Unfortunately, ECS is a massive pain when things don’t start cuz there are no logs anywhere. The only thing I can suggest you try is explicitly passing AWS credentials to see if that works. Will ask another team member if there are better ideas.

Danielle Dalton

09/07/2021, 2:56 PM

Not a problem - thanks for the suggestion Kevin. When we figure this out, we'll share our answer here for others in the future.

Kevin Kho

09/07/2021, 2:57 PM

Do you have a `task_role_arn` on the ECS agent?

Danielle Dalton

09/07/2021, 3:03 PM

~~Yes we do.~~ No, we don't have a task role arn on the agent because we only use the agent for launching pre-existing ECS tasks.

Danielle Dalton

09/07/2021, 3:07 PM

For additional context, our ECS Agent provides only the necessary network configuration. We don't give it the other things it would need to create a new ECS task. We create our task in Terraform, separately from the registration and build of Prefect tasks.

Sean Talia

09/07/2021, 3:54 PM

Hey @Kevin Kho, we "fixed" this by restarting the systemd services that keep our Prefect agents up and running. I'm wondering if this was related to the fact that we had to reboot our EC2 instance: perhaps, through some race condition, the systemd service brought the ECSAgent back up before the instance was able to authenticate to AWS via its instance profile.

Sean Talia

09/07/2021, 3:56 PM

I think there's some circumstantial evidence pointing to that being the case, especially since the ECSAgent sets up its connection/session on initialization, and doesn't create a separate boto session when deploying a flow with an ECS run config

Sean Talia

09/07/2021, 3:57 PM

This stuff is a little out of my depth, but it makes sense to me that something along those lines is going on after having done some debugging and reading through the source code for the `ECSAgent` and the other Prefect AWS utilities.
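
If the race described above is the cause, one way to guard against it is to block agent startup until credentials actually resolve. The following is a minimal sketch under that assumption, not something taken from Prefect: `wait_for_credentials` and `boto3_credential_source` are hypothetical helper names, and the retry count and delay are arbitrary. The only boto3 call used is `boto3.Session().get_credentials()`, which returns `None` when no provider yields credentials.

```python
import time
from typing import Callable, Optional


def boto3_credential_source() -> Optional[str]:
    """Return the name of the provider boto3 resolved credentials from, or None."""
    # Imported lazily so the retry helper below has no hard dependency on boto3.
    import boto3

    credentials = boto3.Session().get_credentials()
    # `method` names the provider that supplied the credentials
    # (environment variables, shared config file, instance role, ...).
    return credentials.method if credentials is not None else None


def wait_for_credentials(
    probe: Callable[[], Optional[str]] = boto3_credential_source,
    attempts: int = 10,
    delay_seconds: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Retry `probe` until it reports a credential source, then return it.

    Raises RuntimeError if no credentials are found within `attempts` tries.
    """
    for attempt in range(1, attempts + 1):
        source = probe()
        if source:
            return source
        if attempt < attempts:
            # Give the instance metadata service time to become reachable.
            sleep(delay_seconds)
    raise RuntimeError(f"No AWS credentials found after {attempts} attempts")
```

Calling `wait_for_credentials()` in the script that the systemd unit runs, just before the agent is started, would make the service fail (and be restarted by systemd, if so configured) instead of bringing up an agent whose session has no credentials. Printing the returned source also shows whether the instance profile, rather than some leftover environment variable, is what boto3 picked up.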

Kevin Kho

09/07/2021, 5:13 PM

Gotcha, so we normally recommend running the Agent as an ECS service instead of on EC2. I’ll keep this in mind though in case someone runs into the same issue. That race condition sounds weird, but I totally believe you.

Sean Talia

09/07/2021, 6:35 PM

interesting okay, I can see that a lot of this documentation was published a little while after we got our ECSAgents up and running in production; I'll take a look at this!
