Martin T

    10 months ago
    Hi! Trying to restrict the memory usage of each flow by passing host_config to our DockerRun config:
    flow.storage = Docker(
            registry_url=...,
            image_name=...,
            files={...},
            env_vars={...},
            python_dependencies=[...]
            )
    
    client = docker.APIClient()
    host_config = client.create_host_config(mem_limit=12345,
                                            mem_reservation=1234
                                            )
    flow.run_config = DockerRun(host_config=host_config)
    
    flow.executor = LocalDaskExecutor()
    When registered to Cloud, this seems to be ok, since starting a Run shows the following default in Host Config:
    {
      "Memory": 12345,
      "MemoryReservation": 1234
    }
    However, this seems to have no effect on the newly created flow containers. docker stats shows that MEM USAGE keeps growing and that LIMIT equals the total server memory.
    docker inspect <CONTAINER> | grep '"Memory["R]'
    gives
    "Memory": 0,
                "MemoryReservation": 0,
    What are we missing here?
Anna Geller

    10 months ago
    you can leverage the host_config on the run configuration to specify that on a per-flow basis, but it's supposed to be a dictionary:
    from prefect.run_configs import DockerRun
    
    run_config = DockerRun(host_config=dict(mem_limit=12345))
    more info:
    • https://docs.prefect.io/api/latest/run_configs.html#dockerrun
    • https://docker-py.readthedocs.io/en/stable/api.html#docker.api.container.ContainerApiMixin.create_host_config
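    For completeness, a minimal end-to-end sketch of this setup (assuming Prefect 1.x; the registry, image name, and limit values are placeholders):

    from prefect import Flow, task
    from prefect.run_configs import DockerRun
    from prefect.storage import Docker

    @task
    def say_hello():
        print("hello")

    with Flow("memory-limited-flow") as flow:
        say_hello()

    flow.storage = Docker(
        registry_url="registry.example.com",  # placeholder registry
        image_name="memory-limited-flow",     # placeholder image name
    )
    # host_config keys follow docker-py's create_host_config keyword arguments;
    # they are passed through to the container's HostConfig.
    flow.run_config = DockerRun(host_config=dict(mem_limit="128m", mem_reservation="64m"))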
Martin T

    10 months ago
    Yes, I tried exactly that first, but it also doesn't work.
    It is unclear from the docs whether mem_limit or MemoryLimit is the correct format in the host_config.
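    For what it's worth, docker-py takes the snake_case keyword arguments (mem_limit) and emits the Docker API's CamelCase HostConfig keys (Memory), which matches the UI output shown in the first message; a quick local check, assuming docker-py is installed:

    import docker

    client = docker.APIClient()
    # create_host_config builds the HostConfig dict without contacting the daemon.
    hc = client.create_host_config(mem_limit=12345, mem_reservation=1234)
    print(hc)  # e.g. {'NetworkMode': 'default', 'Memory': 12345, 'MemoryReservation': 1234}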
Anna Geller

    10 months ago
    afaik, mem_limit should be used
    can you share your DockerRun with host_config as dict? which executor do you use - do you use Dask by any chance?
Martin T

    10 months ago
    Yes, flow.executor = LocalDaskExecutor() as listed above
Anna Geller

    10 months ago
    sorry, my bad, I overlooked that. Is this your run_config?
    run_config = DockerRun(host_config=dict(mem_limit=12345))
    I was asking about executor, because you can limit the resources by specifying fewer workers e.g.
    flow.executor = LocalDaskExecutor(scheduler="processes", num_workers=2)
    Does it make sense to set limits on the executor rather than flow? Are you setting this because your flow is running out of memory?
Martin T

    10 months ago
    Yes, I've just tested that run_config again, still no luck. I've also tried LocalExecutor() to ensure this is not a Dask issue.
    We are starting many different flows (some are created dynamically) and want to avoid OOM on the server, while still using as much capacity as possible. Therefore mem_limit and mem_reservation are suited to our needs. (num_workers (for Dask) and/or Cloud concurrency limits for Flows/Tasks won't help us, because the number of tasks is not proportional to the workloads of individual runs.)
Anna Geller

    10 months ago
    I see, thx for sharing background on this. Did you try setting mem_limit as a string, e.g. host_config=dict(mem_limit="128m")?
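    For reference, docker-py accepts mem_limit either as an integer number of bytes or as a string with a unit suffix, parsed in binary units; a quick check, assuming docker-py is installed:

    import docker

    client = docker.APIClient()
    # Both calls should yield Memory=134217728 ("128m" = 128 * 1024 * 1024):
    print(client.create_host_config(mem_limit="128m"))
    print(client.create_host_config(mem_limit=128 * 1024 * 1024))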
Martin T

    10 months ago
    Yes 🙂
    Also, I can experiment with the values directly from the UI when submitting runs: (screenshot not preserved)
Anna Geller

    10 months ago
    thx, I will look into that and try to reproduce
Martin T

    10 months ago
    Great, appreciate it.
Kevin Kho

    10 months ago
    I’m working on reproducing. What is your output of pip show docker? I just noticed mem_reservation was added in 4.3.1. mem_limit should work though. But the version will help me be sure.
Martin T

    10 months ago
    pip show docker gives 5.0.3 locally, where I build and deploy... (?)
Kevin Kho

    10 months ago
    Ok, I took a stab at this and I can't replicate. I have a Flow here, which I registered and ran with the Docker agent. I then ran inspect on the running container and saw that it was set:
    "Memory": 1073741824,
                "MemoryReservation": 12345678,
    Could you try registering my Flow, running it, and running the inspect?
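    The linked Flow isn't preserved here; judging from the inspect output, its run config presumably contained something like the following (hypothetical; "1g" parses to 1073741824 bytes):

    from prefect.run_configs import DockerRun

    # Hypothetical reconstruction matching the Memory/MemoryReservation values above.
    run_config = DockerRun(host_config=dict(mem_limit="1g", mem_reservation=12345678))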
Martin T

    10 months ago
    After registering the flow and starting it from the Cloud UI, I log in to the server where the agent is running. I use docker stats to find the new autogenerated container name (all other containers have specific names). Then
    docker inspect <CONTAINER> | grep '"Memory["R]'
    gives:
    "Memory": 0,
                "MemoryReservation": 0,
    Here is the docker package version you mentioned (if relevant), from the agent container:
    [server]$ docker exec -it prefect-docker-agent-xyz /bin/bash
    root@970eb27929c5:/# pip show docker
    Name: docker
    Version: 4.4.4
    and the flow container:
    [server]$ docker exec -it brave_bardeen /bin/bash
    root@5f8a37ed4886:/# pip show docker
    Name: docker
    Version: 5.0.2
Kevin Kho

    10 months ago
    What is the OS of your server? And what is the output of docker version?
    Are you on Cloud or Server? And are you running the Docker agent in a Docker container?
Martin T

    10 months ago
    We use Prefect Cloud with on-prem CentOS servers running the agent. The agent is running in a docker container (managed by docker swarm).
    [server]$ cat /etc/redhat-release
    CentOS Linux release 7.9.2009 (Core)
    
    [server]$ docker version
    Client: Docker Engine - Community
     Version:           20.10.7
     API version:       1.41
     Go version:        go1.13.15
     Git commit:        f0df350
     Built:             Wed Jun  2 11:58:10 2021
     OS/Arch:           linux/amd64
     Context:           default
     Experimental:      true
    
    Server: Docker Engine - Community
     Engine:
      Version:          20.10.7
      API version:      1.41 (minimum version 1.12)
      Go version:       go1.13.15
      Git commit:       b0f5bc3
      Built:            Wed Jun  2 11:56:35 2021
      OS/Arch:          linux/amd64
      Experimental:     false
     containerd:
      Version:          1.4.6
      GitCommit:        d71fcd7d8303cbf684402823e425e9dd2e99285d
     runc:
      Version:          1.0.0-rc95
      GitCommit:        b9ee9c6314599f1b4a7f497e1f1f856fe433d3b7
     docker-init:
      Version:          0.19.0
      GitCommit:        de40ad0
    $ docker inspect prefect-docker-agent-for-cloud_prefect-agent.xxx.yyy
    ...
            "Args": [
                "-g",
                "--",
                "entrypoint.sh",
                "sh",
                "-c",
                "prefect agent docker start --no-docker-interface --log-level DEBUG --token xxx -n ${PREFECT_NAME} --network prefect-docker-agent-for-cloud-network --volume /var/run/docker.sock:/var/run/docker.sock:ro --volume /root/.docker/:/root/.docker/:ro"
            ],
    ...
Anna Geller

    10 months ago
    @Martin T thanks for sharing. I see what is happening. In general, the Docker agent is supposed to run as a local process (not in a Docker container), and this local process is a layer between the Prefect backend and the Docker daemon. The agent polls the API for new flow runs, and if there are new flow runs scheduled, it creates them and deploys them as Docker containers on the same machine as the agent.
    What is happening in your environment is that, since the Docker agent is running within a container itself (rather than as a local process), your flow runs end up deployed as containers, but not as individual containers whose memory limits you could control; instead they run within the agent container. You effectively have a single agent container spinning up new containers within itself (Docker-in-Docker), which causes the issues with setting memory limits that you encountered.
    When it comes to Docker swarm, Prefect currently doesn't have an agent that would manage containers in a Docker swarm. So what you could try instead is starting your Docker agent as a local process:
    prefect agent docker start --label YOUR_LABEL --log-level DEBUG --key YOUR_API_KEY -n ${PREFECT_NAME} --network prefect-docker-agent-for-cloud-network --volume /var/run/docker.sock:/var/run/docker.sock:ro --volume /root/.docker/:/root/.docker/:ro
    If you want more environment isolation for this process, you can run it within a virtual environment. I also noticed that:
    1. You use an API token rather than an API key. API tokens are deprecated, so an API key would be preferable.
    2. The --no-docker-interface flag is also deprecated (I believe likely because of exactly this Docker-in-Docker problem you're facing).
    3. You have no labels assigned to the agent. Usually you add a label to the agent and then add the same label to your flow, so that flows are matched with the agent for deployment. I included a label in the local-process command above.
    The CLI docs provide more information on all of that: https://docs.prefect.io/api/latest/cli/agent.html#docker-start
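    As a side note on point 3, the flow side needs the matching label on its run config; a minimal sketch (YOUR_LABEL is a placeholder and must match the agent's --label):

    from prefect.run_configs import DockerRun

    flow.run_config = DockerRun(
        host_config=dict(mem_limit="128m", mem_reservation="64m"),
        labels=["YOUR_LABEL"],  # must match the label passed to the agent
    )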
Martin T

    10 months ago
    @Anna Geller @Kevin Kho thanks for your swift support. Will talk to infra about how we host the agents. Closing for now.
    Out of curiosity, does the Kubernetes agent also have to run as a local process, or is it allowed to run containerized? (swarm -> k8s is on our roadmap)
Anna Geller

    10 months ago
    Great question! Since the Kubernetes API is different from the Docker API, the Kubernetes agent works a bit differently. It typically runs within an individual pod; from there it polls the Prefect API for new flow runs, and if there are any new scheduled flow runs, it deploys them as Kubernetes jobs. Usually each Kubernetes job (corresponding to a specific flow run) runs within an individual pod.
    In general, since everything in Kubernetes is an object from the API perspective, there is usually no such risk of Docker-in-Docker as with the Docker agent. Resource allocation is a bit easier too; e.g. on KubernetesRun you can directly set resource limits for a flow:
    • cpu_limit, cpu_request
    • memory_limit, memory_request
    The above applies only if you deploy the Kubernetes agent in-cluster; if you prefer, you can run the Kubernetes agent as a local process, similarly to the Docker agent: https://docs.prefect.io/orchestration/agents/kubernetes.html#agent-configuration
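    For the in-cluster case, a minimal sketch of those per-flow limits (assuming Prefect 1.x; the values are illustrative):

    from prefect.run_configs import KubernetesRun

    # Kubernetes-style quantities: "500m" = half a CPU core, "Mi" = mebibytes.
    flow.run_config = KubernetesRun(
        cpu_request="500m",
        cpu_limit="1",
        memory_request="256Mi",
        memory_limit="512Mi",
    )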