    Thomas Hoeck

    2 years ago
    Hi all! Is there a way to limit which repositories the Docker Agent is allowed to pull from? As I see it, if someone got access to your Prefect account, they could schedule your Docker Agent to run any image they like. This would have some pretty big security implications, as you have probably provided your Docker Agent with secrets and it is probably running on your on-prem network. As I see it, this gives the Prefect team (in theory) the ability to run code on all on-prem networks and to extract the secrets set on the Docker Agent through env vars.

    Sven Teresniak

    2 years ago
    For security reasons we never give container citizens any permissions to access k8s or Docker. I use a Prefect-Dask cluster managed by k8s. Works great so far.

    Thomas Hoeck

    2 years ago
    But which agent are you using? The k8s agent?
    Does anyone from the Prefect team want to weigh in on this? @Jeremiah

    josh

    2 years ago
    Hi @Thomas Hoeck, I’m not sure I fully understand your question - of course any piece of infrastructure you run must be appropriately configured for your networks, or it will have unlimited access. Your cloud provider, Docker registry, or even your Docker daemon itself all have ways to ensure trusted communication with your infrastructure. The Prefect Agent itself respects those channels by using whatever authentication is provided to it. As an example, the standard Docker daemon has Docker Content Trust (DCT), which can be used to “sign” your daemon and images with a key pair so only approved images can be pulled: https://docs.docker.com/engine/security/trust/content_trust/

    Thomas Hoeck

    2 years ago
    Sorry if it is unclear. 🙂 Let's say I start a Docker agent (with some environment variables set) and connect it to Prefect Cloud. If someone gained access to my Prefect Cloud account, they could register a flow of their liking that uses a Docker image they have created, which would run their code on my machine. However, if the Docker Agent were only allowed to pull from my private Docker repo, I could limit what code could be run on the agent.

    josh

    2 years ago
    Configuration of Docker authentication is handled through your cloud provider, Docker registry, or Docker daemon. The Prefect Agent respects the authentication of whichever Docker API it is connected to. 🙂

    Thomas Hoeck

    2 years ago
    As far as I know, you can't set Docker to only use a specific repository. Hence the blocking would have to happen when the Prefect Docker Agent is about to run the container. Let's say I create a Docker repository with a malicious image. Then, if I get access to your Prefect Cloud account, I could register a flow and say that it should use the malicious image. Your Prefect Docker Agent would then pull it and run it.

    Jeremiah

    2 years ago
    @Thomas Hoeck I think you’re disregarding the link Josh sent, which shows how Docker allows you to limit the images any agent is allowed to pull by restricting the Docker Daemon it is connected to - I am sure this is just one of a variety of mechanisms available to you. On the Prefect side, I’m sure the team would accept a PR that would allow the Docker agent to validate the image against a pre-approved list of registries or tags, as this would make the experience slightly easier for users that don’t know how or don’t have permission to modify their runtime environments.
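    A minimal sketch of the kind of check Jeremiah describes, validating an image against a pre-approved list of registries before it is pulled. This is not Prefect's API: `registry_of`, `is_image_allowed`, and the registry `registry.example.com` are hypothetical names used only for illustration. It assumes Docker's usual convention that the first path component is a registry host only when it contains a `.` or `:` or is `localhost`, and that short names default to `docker.io`.

```python
from typing import Iterable

# Assumption: short names such as "ubuntu" or "user/image" resolve to Docker Hub.
DEFAULT_REGISTRY = "docker.io"


def registry_of(image: str) -> str:
    """Return the registry host part of a Docker image reference."""
    first, sep, _rest = image.partition("/")
    # The first component is a registry host only if a "/" follows it and it
    # looks like a hostname (has "." or ":") or is exactly "localhost".
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first.lower()
    return DEFAULT_REGISTRY


def is_image_allowed(image: str, allowed_registries: Iterable[str]) -> bool:
    """Hypothetical agent-side check: allow only images from approved registries."""
    allowed = {registry.lower() for registry in allowed_registries}
    return registry_of(image) in allowed


if __name__ == "__main__":
    approved = ["registry.example.com"]  # hypothetical private registry
    for ref in ("registry.example.com/team/flow:1.0", "attacker/evil:latest"):
        print(ref, is_image_allowed(ref, approved))
```

    An agent would run such a check before pulling and refuse to start the flow run when it returns `False`. The sketch does not normalize aliases such as `index.docker.io`, so a real implementation would need to decide how to treat them.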

    Thomas Hoeck

    2 years ago
    @Jeremiah It might just be me - but I can't see anywhere in the link that Josh sent how I can achieve this. A person with bad intent could just sign their image. I might give it a go with the PR instead 🙂 Thank you for the answer.

    josh

    2 years ago
    A person with bad intent can’t sign the image because you sign it with your own public/private key pair (similar to how you authenticate with git, ssh, etc.). The whole page is about how to sign your images and daemon to approve specific images for pull.
    Image consumers can enable DCT to ensure that images they use were signed. If a consumer enables DCT, they can only pull, run, or build with trusted images. Enabling DCT is a bit like applying a “filter” to your registry. Consumers “see” only signed image tags and the less desirable, unsigned image tags are “invisible” to them.

    Thomas Hoeck

    2 years ago
    If it is with an explicit content hash it doesn't seem to matter. 🤔
    For example, with DCT enabled a `docker pull someimage:latest` only succeeds if `someimage:latest` is signed. However, an operation with an explicit content hash always succeeds as long as the hash exists
    @Sven Teresniak what kind of storage are you using then?
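    Given the digest caveat quoted above, any agent-side check would probably want to notice when an image reference is pinned by content hash, since a tag-signing check alone would not cover it. A small sketch, assuming the common `name@sha256:<64 hex chars>` form; `is_digest_pinned` is a hypothetical helper, not part of Docker or Prefect.

```python
import re

# Assumption: digests use the sha256 algorithm with a 64-character hex value.
_DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_digest_pinned(image: str) -> bool:
    """Return True if the image reference ends in an explicit sha256 digest."""
    return _DIGEST_RE.search(image) is not None
```

    A policy could then, for example, reject digest-pinned references unless the digest itself appears on a pre-approved list.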

    Sven Teresniak

    2 years ago
    @Thomas Hoeck LocalStorage on NFS (EFS)

    Thomas Hoeck

    2 years ago
    @Sven Teresniak And then you just mount this on all the nodes in the Dask cluster?

    Sven Teresniak

    2 years ago
    Correct. The agent has access to the NFS share as well. The Prefect home (by default `~/.prefect`) is also part of the NFS share.

    Thomas Hoeck

    2 years ago
    @Sven Teresniak Thanks 🙂 So is it a local agent running in a Docker image deployed to your k8s cluster?

    Sven Teresniak

    2 years ago
    Exactly. Every component is a container. At the moment everything is part of a single k8s pod because we are still evaluating Prefect. Later we will put the Dask worker in a ReplicaSet, Postgres in a StatefulSet, and so on…

    Thomas Hoeck

    2 years ago
    @Sven Teresniak How about auxiliary scripts, such as self-developed task classes? Do you build those into your Dask and agent image? Or do you simply store them on the NFS share as well? Sorry for all the questions, but I was hoping to just use the k8s agent, and I can't find a satisfying way to limit its access.

    Sven Teresniak

    2 years ago
    Yes, I put all dependencies into the worker's image. The Dask worker and the agent share the same image. Other components do not need deps like pyspark, pandas, presto etc. With this kind of static setup we only have one problem: the Dask worker runs continuously. That is, putting a flow's (shared) boilerplate into external scripts is tricky. I have to restart the Dask worker by hand to make changed Python imports take effect.
    When your boilerplate is stable, this is not a real problem though
    And I don't have an elegant (flow) deployment workflow yet. At the moment I just scp everything to the agent and register the new flow version. The next step will be automatic deployment from git...
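    The stale-import problem Sven mentions follows from how Python caches modules: once a long-running process has imported a module, later imports are served from `sys.modules` and do not re-read the file. The sketch below reproduces this with only the standard library. The module name `shared_boilerplate` is hypothetical, and the demo runs in a single process rather than on an actual Dask worker.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_stale_import():
    """Show that an edited module is only picked up after an explicit reload."""
    old_flag = sys.dont_write_bytecode
    sys.dont_write_bytecode = True  # keep the demo free of cached .pyc files
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "shared_boilerplate.py"  # hypothetical helper module
        source.write_text("VERSION = 1\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            module = importlib.import_module("shared_boilerplate")
            first = module.VERSION

            # Edit the file on disk, as a redeploy via scp would.
            source.write_text("VERSION = 2  # edited on disk\n")

            # A second import returns the cached module object from sys.modules.
            cached = importlib.import_module("shared_boilerplate").VERSION

            # Only a reload (or a process restart) re-executes the source file.
            reloaded = importlib.reload(module).VERSION
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("shared_boilerplate", None)
            sys.dont_write_bytecode = old_flag
    return first, cached, reloaded


if __name__ == "__main__":
    print(demo_stale_import())  # prints (1, 1, 2)
```

    Restarting the worker process has the same effect as the reload here, but for every module at once, which is why a manual restart works. If the workers are started under a nanny process, `Client.restart()` from `dask.distributed` may be a way to trigger that restart from code; whether it applies depends on how the workers are launched.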

    Thomas Hoeck

    2 years ago
    Okay, cool, thanks 🙂 Yeah, I have been trying out the k8s agent with Docker storage, which provides a very smooth way of solving this, but I think I will try out your method.