# prefect-community
b
Hi all! We have an issue with the Prefect Docker agent. We're using Azure and ACR with a system-managed identity assigned to the VM. To log in to ACR, we run the following every 3 hours via a systemd timer:
az acr login
But it seems that the Prefect Docker agent only reads the token at startup and keeps it in memory. Unless we restart the agent, it can't pull the Docker images for our flows after 3 hours (once the ACR token has expired).
a
Are you running the Docker agent and your flow registration process on the same Azure VM (asking in case you are using Docker storage)? Can you share your flow storage and run config? Based on these docs, the managed identity assigned to this VM should solve the issue. If not, something went wrong when assigning those permissions. To figure out which process is at fault, can you try pulling your ACR image from that VM without logging in via the CLI (az acr login)? If the identity is set, you shouldn't need that extra CLI step, and you should be able to do:
docker pull yourCustomACRimage
b
Yes, we're running the agent on the same VM, and we've verified that a manual docker pull works. It seems that Prefect uses the Python Docker API client, which doesn't refresh its in-memory credentials.
I.e., the Prefect Docker agent only reads the token during startup. If we restart the Docker agent, it's able to pull images.
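A minimal sketch of the workaround this points to, assuming the agent pulls through the Docker SDK for Python (docker-py). As far as I know, docker-py's low-level `APIClient` loads the auth entries from `~/.docker/config.json` when the client is created, and `APIClient.reload_config()` re-reads them. `pull_with_fresh_credentials` is a hypothetical helper, not a Prefect API; it takes the client as a parameter so it can be exercised with a stand-in object.

```python
from typing import Any


def pull_with_fresh_credentials(client: Any, repository: str, tag: str = "latest") -> Any:
    """Re-read the Docker config file, then pull an image.

    `client` is assumed to be a docker.DockerClient (e.g. docker.from_env()).
    This is a hypothetical helper, not part of Prefect.
    """
    # docker-py caches registry auth entries when the client is created;
    # reload_config() re-reads ~/.docker/config.json so a token written later
    # by `az acr login` or `docker login` is used for this pull.
    client.api.reload_config()
    # ImageCollection.pull resolves credentials for the registry in `repository`
    return client.images.pull(repository, tag=tag)
```

On the VM this would be called as `pull_with_fresh_credentials(docker.from_env(), "myregistry.azurecr.io/flows")`, where the repository name is a placeholder. Reloading before every pull only re-reads a local config file, which is why the sketch does it unconditionally instead of tracking token age.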
a
I'm no Azure expert, but when you use managed identity (aka IAM role?), you shouldn't need to authenticate with a token every 3 hours. Do you happen to have Azure support?
b
Yes, it's exactly like an IAM role attached to EC2, and the token does have a 3-hour lifetime. We refresh it every 3 hours with a separate script, so you're able to run docker pull without any authentication on each VM, but the Prefect agent doesn't pick up the refreshed token.
You can probably reproduce it in any environment (it's not specific to Azure):
1. Start the Prefect Docker agent.
2. Log in to a private registry after the agent has started.
3. Try to run a flow that uses the private registry as storage.
4. It should fail.
a
If this worked like an IAM role attached to EC2, it wouldn't matter when you started the agent: the permission is set for the machine, so you shouldn't have to refresh the token every 3 hours. If the permissions are set properly for the VM, the IAM role should be all you need. If you look at this, you are currently using option 1 (an individual identity), while for the IAM-style approach it seems you should use the service principal option. Can you try this tutorial? I'll probably open an issue and see if some Azure pro can chime in and help.
@Marvin open "Docker agent on Azure cannot pull images from ACR on a VM with a system-managed identity"
b
*after 3 hours
b
It's not a direct 1:1 behavior mapping with IAM. Azure MSI allows you to authenticate to the registry without any credentials, but the session token must be renewed every 3 hours.
a
Thanks! But I really believe this is designed more for a developer machine than for production application access. Can you try this tutorial?
b
The thing is, we renew it within this time frame with a separate script that runs az acr login, so you're able to run docker pull at any given time, but the Prefect Docker agent only reads the token when it starts. After 3 hours, the agent's copy of the token stops working because it has expired.
Yes, we're using it exactly like that, with an SP (service principal).
Unfortunately, ACR tokens are always limited to 3 hours.
a
Are you sure? I'm no Azure expert, but it seems this is valid for 1 year. Azure must have some way to run actual applications with ACR images 😅
b
So you need to run either docker login or az acr login before pulling from or pushing to the ACR.
SP password != Docker token.
These are different entities: when you log in to ACR, you use the SP password, and the result of the login is a Docker token.
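To see the difference between the SP password and the Docker token on a given VM, here is a small diagnostic sketch. It assumes that `az acr login` writes an ACR token (a JWT) inline into the `auths` section of `~/.docker/config.json` rather than into a credential helper, and reads that token's `exp` claim. `docker_token_expiry` and the registry name are hypothetical; a static SP or admin password is not a JWT, so the function returns `None` for it.

```python
import base64
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

DEFAULT_CONFIG = Path.home() / ".docker" / "config.json"


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url-encoded without padding; restore it first
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def docker_token_expiry(registry: str, config_path: Path = DEFAULT_CONFIG) -> Optional[datetime]:
    """Return the `exp` time of a JWT stored for `registry`, or None if there isn't one."""
    config = json.loads(config_path.read_text())
    auth = config.get("auths", {}).get(registry, {}).get("auth")
    if not auth:
        # No inline credentials, e.g. they are held by a credential helper
        return None
    try:
        # The "auth" field is base64("username:password")
        _user, _, password = base64.b64decode(auth).decode().partition(":")
        parts = password.split(".")
        if len(parts) != 3:
            return None  # not a JWT, e.g. a static SP or admin password
        claims = json.loads(_b64url_decode(parts[1]))
    except ValueError:
        return None  # the password only looked like a JWT
    if not isinstance(claims, dict) or claims.get("exp") is None:
        return None
    return datetime.fromtimestamp(claims["exp"], tz=timezone.utc)
```

Running `print(docker_token_expiry("myregistry.azurecr.io"))` right after `az acr login` would, under these assumptions, show when the token that a long-running process cached stops being usable.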
a
I see. We have a partnership with Azure, so maybe they can help us more here. Keep us posted if you find a solution in the meantime. I opened an issue, and that's all I can do for now.
b
The solution might be using the static admin registry credentials: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-authentication?tabs=azure-cli#admin-account but we'd like to avoid that, to be honest.
a
@Bogdan Bliznyuk, I’ve got an update for you! Sorry it took some time but now we’ve got a fix to all the issues we discussed. The Azure marketplace Docker agent is not yet updated but for now, we’ve published a full walkthrough on how you can set up a new VM on Azure and spin up a Docker agent in a robust way, including setting up long-lived permissions for the agent to pull images from ACR and ensuring that the agent starts automatically on VM start (if you ever want to reboot the VM or shut it down for the night). Check this out and LMK if something is unclear or doesn’t work for you: https://discourse.prefect.io/t/how-to-spin-up-a-docker-agent-on-azure-vm-a-full-walkthrough/407
b
Hey! Thank you very much, I really appreciate your follow-up! Unfortunately, this guide won't work (it's exactly what we're doing). We were able to resolve the issue by restarting the agent every 3 hours. Each time we run docker login with the same credentials (a service principal), it creates a new Docker token. The Docker token != the static service principal credentials; you obtain it as a result of the authentication. If you run the above setup for more than 3 hours without restarting the Prefect agent, you will have exactly the same problem with an expired Docker token.
a
Trust me, this will work, I can promise you that! I know exactly what you’re saying because I saw the same thing when logging in to ACR with az acr login, but this guide shows how you can generate long-lived credentials that persist, and the way to do it is to use docker login with service principal credentials:
docker login prefectcommunity.azurecr.io -u $USER_NAME -p $PASSWORD
I can 100% guarantee that if you follow this on a new VM, you won’t have to restart your agent every 3 hours, and you won’t have to log in to ACR again every 3 hours. In fact, I used the same approach on my local machine, and I haven’t logged in for 3 days and can still push new images to ACR. The likely issue you may run into is related to the Azure CLI version, which I covered in this section (https://discourse.prefect.io/t/how-to-spin-up-a-docker-agent-on-azure-vm-a-full-walkthrough/407#troubleshooting-tips-8). It only works if you use Azure CLI version 2.25.0 or later. Unfortunately, Azure doesn’t make this process easy, and in my opinion they don’t document it well enough, but I’m 100% positive that this approach generates long-lived permissions that persist. I had this agent running for 2 days and didn’t have to log in a single time after setting it up the first time. And again, I really understand your frustration, because I went through all the same pain as you did 😄
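Since the approach depends on Azure CLI 2.25.0 or later, a quick pre-flight check on the VM can rule that out. This sketch assumes `az version --output json` prints a JSON object with an `"azure-cli"` key and that the version is a plain dotted number; the function names are hypothetical.

```python
import json
import subprocess
from typing import Tuple

# Minimum version mentioned in the walkthrough's troubleshooting section
MIN_AZ_CLI = (2, 25, 0)


def parse_version(text: str) -> Tuple[int, ...]:
    # Assumes a plain dotted numeric version such as "2.30.0"
    return tuple(int(part) for part in text.split("."))


def az_cli_is_recent_enough(version: str, minimum: Tuple[int, ...] = MIN_AZ_CLI) -> bool:
    # Compare integer tuples so "2.9.0" sorts below "2.25.0";
    # a plain string comparison would get this wrong
    return parse_version(version) >= minimum


def installed_az_cli_version() -> str:
    # Assumed output shape: {"azure-cli": "2.30.0", "azure-cli-core": ..., ...}
    result = subprocess.run(
        ["az", "version", "--output", "json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)["azure-cli"]
```

On the VM, `az_cli_is_recent_enough(installed_az_cli_version())` returning `False` would point at the CLI-version issue rather than at the agent.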
b
Ah, I missed the point about using docker login instead of az acr login. This is actually a good point and should work. We'll try it out and let you know. Thank you very much!!!
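Along the lines of the docker login approach with service principal credentials, one further option (my own suggestion, not part of the walkthrough, and not a Prefect setting) is to hand the SP credentials to the Docker SDK on every pull, so a long-running process never depends on a cached token. docker-py's `images.pull` accepts an `auth_config` dict with `username` and `password`. The sketch reuses the `USER_NAME`/`PASSWORD` environment variables from the docker login command, presumably the SP's app ID and secret; the helper names and repository are hypothetical.

```python
import os
from typing import Any, Dict, Mapping


def sp_auth_config(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    # Same variables as in the docker login command: the service principal's
    # app ID (USER_NAME) and secret (PASSWORD)
    return {"username": env["USER_NAME"], "password": env["PASSWORD"]}


def pull_with_sp_credentials(client: Any, repository: str, tag: str = "latest",
                             env: Mapping[str, str] = os.environ) -> Any:
    # Pass credentials explicitly on each pull instead of relying on whatever
    # docker-py cached from ~/.docker/config.json when the client was created
    return client.images.pull(repository, tag=tag, auth_config=sp_auth_config(env))
```

With a real client this would look like `pull_with_sp_credentials(docker.from_env(), "prefectcommunity.azurecr.io/flows")`. The trade-off is that the long-lived SP secret has to be available to the process, which is also what the docker login approach stores on disk.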