# ask-community
d
Hi Everyone, we seem to be experiencing an unexpected issue with one of our ECS agents. When kicking off a flow, we get the error message `Unable to locate credentials`. This happens right after the flow is scheduled and submitted (before any of the tasks can even begin to kick off). We don't think it is related to boto, because our EC2 instance is able to read from our AWS buckets (confirmed by SSH'ing into the instance) via an IAM policy attached to the EC2 instance. Unfortunately, running in debug mode didn't produce any more verbose output or a stack trace. We did have to reboot this instance recently after it became unresponsive: an errant task we had running consumed too much memory and stalled out. Has anyone experienced this before?
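Reading from S3 over SSH shows the policy works, but it doesn't show whether the instance profile was visible at the moment the agent started. One way to check that specific point is to ask the EC2 instance metadata service (IMDS) directly. This is a minimal sketch using only the standard library, assuming IMDSv2 is reachable from the host; `instance_profile_role` is a hypothetical helper name, not part of Prefect or boto3.

```python
import http.client
import urllib.request
from typing import Optional

# Link-local address of the EC2 instance metadata service (IMDS).
IMDS_BASE = "http://169.254.169.254/latest"


def instance_profile_role(timeout: float = 1.0) -> Optional[str]:
    """Return the IAM role name exposed by IMDS, or None if it can't be read.

    Hypothetical helper; assumes IMDSv2. It PUTs for a short-lived session
    token, then GETs the security-credentials listing with that token.
    """
    try:
        token_req = urllib.request.Request(
            f"{IMDS_BASE}/api/token",
            method="PUT",
            headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
        )
        with urllib.request.urlopen(token_req, timeout=timeout) as resp:
            token = resp.read().decode()

        role_req = urllib.request.Request(
            f"{IMDS_BASE}/meta-data/iam/security-credentials/",
            headers={"X-aws-ec2-metadata-token": token},
        )
        with urllib.request.urlopen(role_req, timeout=timeout) as resp:
            body = resp.read().decode().strip()
    except (OSError, ValueError, http.client.HTTPException):
        # URLError/HTTPError, refused connections and socket timeouts are
        # OSError subclasses; ValueError covers undecodable or bad headers.
        return None
    # The listing holds one role name per line; take the first.
    return body.splitlines()[0] if body else None
```

Run early in the boot sequence (for example from the same systemd unit as the agent), a `None` that later turns into a role name would point at a timing problem rather than a missing policy.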
k
Hey @Danielle Dalton, weird that it stopped working if it was working before. What flow storage are you using? S3? ECR? GitHub?
d
We use S3 as our flow storage.
k
You had this working without supplying credentials before just by using the EC2 role?
d
Yes, we created an IAM instance profile that grants access to our S3 resources, and then we assigned that instance profile to the EC2 instance.
k
I don’t have much advice here. It seems right. Unfortunately, ECS is a massive pain when things don’t start cuz there are no logs anywhere. The only thing I can suggest you try is explicitly passing AWS credentials to see if that works. Will ask another team member if there are better ideas.
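One low-effort way to pass credentials explicitly, without changing any agent code, is through the standard AWS environment variables, which boto3's default credential chain checks before falling back to the EC2 instance profile. A minimal sketch; `env_with_explicit_credentials` is a hypothetical helper, and the key values in any real use would come from your own secret store.

```python
from typing import Dict, Mapping, Optional


def env_with_explicit_credentials(
    base_env: Mapping[str, str],
    access_key_id: str,
    secret_access_key: str,
    session_token: Optional[str] = None,
    region: Optional[str] = None,
) -> Dict[str, str]:
    """Return a copy of base_env with the standard AWS credential variables set.

    Hypothetical helper. boto3 reads these variables before it tries the
    EC2 instance profile, so a process started with this environment does
    not rely on the instance profile for credentials.
    """
    env = dict(base_env)  # never mutate the caller's mapping
    env["AWS_ACCESS_KEY_ID"] = access_key_id
    env["AWS_SECRET_ACCESS_KEY"] = secret_access_key
    if session_token is not None:
        env["AWS_SESSION_TOKEN"] = session_token
    else:
        # Don't pair the new keys with a stale token inherited from base_env.
        env.pop("AWS_SESSION_TOKEN", None)
    if region is not None:
        env["AWS_DEFAULT_REGION"] = region
    return env
```

Usage would look like `subprocess.run(["prefect", "agent", "ecs", "start"], env=env_with_explicit_credentials(os.environ, key_id, secret), check=True)`, assuming the Prefect 1.x CLI is what starts the agent. Setting the same variables through `Environment=` lines in the systemd unit would be an equivalent test.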
d
Not a problem - thanks for the suggestion Kevin. When we figure this out, we'll share our answer here for others in the future.
k
Do you have a `task_role_arn` on the ECS agent?
d
Yes we do. No, we don't have a `task_role_arn` on the agent, because we only use the agent to launch pre-existing ECS tasks.
For additional context, our ECS agent provides the necessary network configuration. We don't provide it with everything else it would need to create a new ECS task. We create our ECS task in Terraform, separately from the registration and build of our Prefect tasks.
s
Hey @Kevin Kho, we "fixed" this by restarting the systemd services that keep our Prefect agents up and running. I'm wondering if this was related to the fact that we had to reboot our EC2 instance: perhaps, through some race condition, the systemd service brought the ECSAgent back up before the instance was able to authenticate to AWS via its instance profile.
I think there's some circumstantial evidence pointing to that, especially since the ECSAgent sets up its connection/session on initialization and doesn't create a separate boto session when deploying a flow with an ECS run config.
This stuff is a little out of my depth, but after doing some debugging and reading through the source code for the `ECSAgent` and the other Prefect AWS utilities, it makes sense to me that something along those lines is going on.
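The suspected race can be illustrated with a toy model: an agent that resolves credentials once in its constructor keeps whatever it saw at boot, while one that resolves them on every deploy recovers once the instance profile becomes visible. This is not Prefect's or botocore's code; `EagerAgent`, `LazyAgent`, and `wait_for_credentials` are hypothetical names, and the resolver stands in for a real lookup such as `boto3.Session().get_credentials()`, which returns `None` when nothing is found. `wait_for_credentials` sketches one possible guard to run before the agent is constructed.

```python
import time
from typing import Callable, Optional

# Hypothetical stand-in for a real credential lookup; None means "not found".
Resolver = Callable[[], Optional[str]]


class EagerAgent:
    """Toy agent (hypothetical): resolves credentials once, in __init__."""

    def __init__(self, resolve: Resolver) -> None:
        self.credentials = resolve()  # captured once and never refreshed

    def deploy(self) -> str:
        if self.credentials is None:
            raise RuntimeError("Unable to locate credentials")
        return "deployed"


class LazyAgent:
    """Toy agent (hypothetical): resolves credentials again on every deploy."""

    def __init__(self, resolve: Resolver) -> None:
        self.resolve = resolve

    def deploy(self) -> str:
        if self.resolve() is None:
            raise RuntimeError("Unable to locate credentials")
        return "deployed"


def wait_for_credentials(
    resolve: Resolver,
    timeout: float = 60.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll `resolve` until it returns a value or `timeout` seconds pass.

    Hypothetical guard; `sleep` and `clock` are injectable for testing.
    """
    deadline = clock() + timeout
    while True:
        creds = resolve()
        if creds is not None:
            return creds
        if clock() >= deadline:
            raise TimeoutError(f"no AWS credentials after {timeout:.0f}s")
        sleep(interval)
```

In practice a guard like this could run as a systemd `ExecStartPre=` step, or the unit could declare `Wants=network-online.target` and `After=network-online.target`; whether either actually closes this particular race is an untested assumption.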
k
Gotcha, so we normally recommend using an ECS Service instead of EC2 for spinning up the agent. I'll keep this in mind, though, in case someone runs into the same issue. That race condition sounds weird, but I totally believe you.
s
Interesting, okay. I can see that a lot of this documentation was published a little while after we got our ECSAgents up and running in production; I'll take a look at this!