# ask-community
m
Hi! Trying to restrict the memory usage of each flow by passing `host_config` to our `DockerRun` config:
Copy code
flow.storage = Docker(
        registry_url=...,
        image_name=...,
        files={...},
        env_vars={...},
        python_dependencies=[...]
        )

client = docker.APIClient()
host_config = client.create_host_config(mem_limit=12345,
                                        mem_reservation=1234
                                        )
flow.run_config = DockerRun(host_config=host_config)

flow.executor = LocalDaskExecutor()
When registered to Cloud, this seems to be ok, since starting a Run shows the following default in Host Config:
Copy code
{
  "Memory": 12345,
  "MemoryReservation": 1234
}
However, this seems to have no effect on the newly created flow containers. `docker stats` shows that MEM USAGE keeps growing and that LIMIT equals the total server memory.
docker inspect <CONTAINER> | grep \"Memory[\"R]
gives
Copy code
"Memory": 0,
            "MemoryReservation": 0,
What are we missing here?
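For reference, here is a quick way to pull the effective limits out of the `docker inspect` output programmatically instead of grepping. This is our own minimal sketch over the JSON that `docker inspect <CONTAINER>` prints (the `effective_limits` helper is not part of any SDK):

```python
import json

def effective_limits(inspect_json: str) -> dict:
    """Extract memory settings from `docker inspect <CONTAINER>` output.

    `docker inspect` prints a JSON array with one object per container;
    the limits live under HostConfig. A value of 0 means "no limit set".
    """
    container = json.loads(inspect_json)[0]
    host_config = container["HostConfig"]
    return {
        "Memory": host_config.get("Memory", 0),
        "MemoryReservation": host_config.get("MemoryReservation", 0),
    }

# Stubbed inspect payload showing unset limits (the symptom above):
sample = '[{"HostConfig": {"Memory": 0, "MemoryReservation": 0}}]'
print(effective_limits(sample))  # {'Memory': 0, 'MemoryReservation': 0}
```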
a
you can leverage the host_config on the run configuration to specify that on a per flow basis, but it’s supposed to be a dictionary:
Copy code
from prefect.run_configs import DockerRun

run_config = DockerRun(host_config=dict(mem_limit=12345))
more info:
• https://docs.prefect.io/api/latest/run_configs.html#dockerrun
• https://docker-py.readthedocs.io/en/stable/api.html#docker.api.container.ContainerApiMixin.create_host_config
m
Yes, I tried exactly that first, but it also doesn't work.
It is unclear from the docs whether `mem_limit` or `MemoryLimit` is the correct key in the host_config.
a
afaik, `mem_limit` should be used
can you share your DockerRun with host_config as dict? which executor do you use - do you use Dask by any chance?
m
Yes
flow.executor = LocalDaskExecutor()
as listed above
a
sorry, my bad, I omitted that. Is this your run_config?
Copy code
run_config = DockerRun(host_config=dict(mem_limit=12345))
I was asking about executor, because you can limit the resources by specifying fewer workers e.g.
Copy code
flow.executor = LocalDaskExecutor(scheduler="processes", num_workers=2)
Does it make sense to set limits on the executor rather than flow? Are you setting this because your flow is running out of memory?
m
Yes, I've just tested that run_config again, still no luck. I've also tried `LocalExecutor()` to ensure this is not a Dask issue.
We are starting many different flows (some are created dynamically) and want to avoid OOM on the server while still using as much capacity as possible. Therefore `mem_limit` and `mem_reservation` are suited to our needs. (`num_workers` (for Dask) and/or Cloud concurrency limits for Flows/Tasks won't help us because the number of tasks is not proportional to the workloads of individual runs.)
a
I see, thx for sharing the background on this. Did you try setting `mem_limit` as a string, e.g. `host_config=dict(mem_limit="128m")`?
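(For context: docker-py accepts memory values either as an int of bytes or as a string with a unit suffix. Below is a rough, self-contained sketch of that unit conversion, assuming the usual b/k/m/g binary suffixes — this is our own illustration, not docker-py's actual code:)

```python
def parse_mem(value):
    """Convert a docker-style memory value ("128m", "2g", 12345) to bytes.

    Rough illustration of the conversion docker-py applies to mem_limit;
    suffixes are interpreted as binary units (1k = 1024 bytes).
    """
    if isinstance(value, int):
        return value
    units = {"b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    suffix = value[-1].lower()
    if suffix in units:
        return int(value[:-1]) * units[suffix]
    return int(value)

print(parse_mem("128m"))  # 134217728
```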
m
Yes 🙂
Also, I can experiment with the values directly from the UI when submitting runs:
a
thx, I will look into that and try to reproduce
m
Great, appreciate it.
k
I’m working on reproducing. What is your output of `pip show docker`? I just noticed `mem_reservation` was added in 4.3.1. `mem_limit` should work though. But the version will help me be sure.
m
`pip show docker` gives 5.0.3 locally, where I build and deploy... (?)
k
Ok I took a stab at this and I can’t replicate. I have a Flow here and registered and ran it with the Docker Agent. I then did inspect on the running container.
And saw that it was set:
Copy code
"Memory": 1073741824,
            "MemoryReservation": 12345678,
Could you try registering my Flow and running it and running the inspect?
m
After registering the flow and starting it from the Cloud UI, I log in to the server where the agent is running. I use `docker stats` to find the new autogenerated container name (all the other containers have specific names). Then
docker inspect <CONTAINER> | grep
gives:
Copy code
"Memory": 0,
            "MemoryReservation": 0,
Here is the server docker version you mentioned (if relevant):
Copy code
[server]$ docker exec -it prefect-docker-agent-xyz /bin/bash
root@970eb27929c5:/# pip show docker
Name: docker
Version: 4.4.4
and the flow container:
Copy code
[server]$ docker exec -it brave_bardeen /bin/bash
root@5f8a37ed4886:/# pip show docker
Name: docker
Version: 5.0.2
k
What is the OS of your server? And what is the output of
docker version
?
Are you on Cloud or Server? and are you running the Docker agent in a Docker container?
m
We use Prefect Cloud with on-prem CentOS servers running the agent. The agent is running in a docker container (managed by docker swarm).
Copy code
[server]$ cat /etc/redhat-release
CentOS Linux release 7.9.2009 (Core)

[server]$ docker version
Client: Docker Engine - Community
 Version:           20.10.7
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        f0df350
 Built:             Wed Jun  2 11:58:10 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.7
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       b0f5bc3
  Built:            Wed Jun  2 11:56:35 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.6
  GitCommit:        d71fcd7d8303cbf684402823e425e9dd2e99285d
 runc:
  Version:          1.0.0-rc95
  GitCommit:        b9ee9c6314599f1b4a7f497e1f1f856fe433d3b7
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
Copy code
$ docker inspect prefect-docker-agent-for-cloud_prefect-agent.xxx.yyy
...
        "Args": [
            "-g",
            "--",
            "entrypoint.sh",
            "sh",
            "-c",
            "prefect agent docker start --no-docker-interface --log-level DEBUG --token xxx -n ${PREFECT_NAME} --network prefect-docker-agent-for-cloud-network --volume /var/run/docker.sock:/var/run/docker.sock:ro --volume /root/.docker/:/root/.docker/:ro"
        ],
...
a
@Martin T thanks for sharing. I see what is happening.

In general, the Docker agent is supposed to run in a local process (not in a Docker container); this local process is a layer between the Prefect backend and the Docker daemon. The agent polls the API for new flow runs, and if there are any scheduled to run, it deploys them as Docker containers on the same machine as the agent.

What is happening in your environment is that, since the Docker agent is running within a container itself (rather than a local process), your flow runs end up deployed as containers - but not as individual containers whose memory limits you could control; instead they run within the agent container. You effectively have a single agent container spinning up new containers within itself (Docker-in-Docker), which causes the issues with setting memory limits that you encountered.

When it comes to Docker swarm, Prefect currently doesn’t have any agent that would manage containers in a Docker swarm. So what you could try instead is starting your Docker agent in a local process:
Copy code
prefect agent docker start --label YOUR_LABEL --log-level DEBUG --key YOUR_API_KEY -n ${PREFECT_NAME} --network prefect-docker-agent-for-cloud-network --volume /var/run/docker.sock:/var/run/docker.sock:ro --volume /root/.docker/:/root/.docker/:ro
If you want more environment isolation for this process, you can run it within a virtual environment. I also noticed that:
1. You use an API token rather than an API key - API tokens are deprecated, so an API key would be preferable
2. The --no-docker-interface flag is also deprecated (I believe it’s likely because of exactly this problem that you’re facing with Docker-in-Docker)
3. You have no labels assigned to the agent. Usually, you add a label to the agent, and then add the same label to your flow so that you can match the flow with the agent for deployment - I included a label in the above command that you can start in a local process.
The CLI docs provide more information on all of that: https://docs.prefect.io/api/latest/cli/agent.html#docker-start
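As a side note, a quick way to verify where the agent process itself is running is to check for Docker's /.dockerenv marker file. This is a common heuristic, not a guarantee (other container runtimes behave differently), and the helper below is our own sketch:

```python
import os

def likely_in_container() -> bool:
    """Heuristic check: Docker creates /.dockerenv inside its containers.

    Useful for confirming whether the agent is a local process or is
    itself running inside a container (the Docker-in-Docker situation).
    """
    if os.path.exists("/.dockerenv"):
        return True
    # Fallback: cgroup paths often mention docker/containerd in containers
    try:
        with open("/proc/1/cgroup") as f:
            return any("docker" in line or "containerd" in line for line in f)
    except OSError:
        return False

print(likely_in_container())
```

Running this from inside the agent's environment (e.g. via `docker exec`) should print True; from a plain local process on the host it should print False.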
m
@Anna Geller @Kevin Kho thanks for your swift support. Will talk to infra on how we host the agents. Closing for now.
Out of curiosity, does the Kubernetes agent also have to run as a local process, or is it allowed to run containerized? (swarm -> k8s is on our roadmap)
a
Great question! Since the Kubernetes API is different from the Docker API, the Kubernetes agent works a bit differently. It typically runs within an individual pod; from there it polls the Prefect API for new flow runs, and if there are any scheduled, it deploys them as Kubernetes jobs. Usually each Kubernetes job (corresponding to a specific flow run) runs within an individual pod.

In general, since everything in Kubernetes is an object from the API perspective, there is usually no such Docker-in-Docker risk as with the Docker agent. And resource allocation is a bit easier too, e.g. on KubernetesRun, you can directly set your resource limits for a flow:
• cpu_limit, cpu_request
• memory_limit, memory_request
The above applies only if you deploy the Kubernetes agent in-cluster. If you want, you could run the Kubernetes agent as a local process, similarly to the Docker agent: https://docs.prefect.io/orchestration/agents/kubernetes.html#agent-configuration
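Illustratively, those KubernetesRun parameters end up as a standard requests/limits `resources` block on the job's container spec. A minimal sketch of that mapping, assuming standard Kubernetes quantity strings like "512Mi" (the `k8s_resources` helper name is ours):

```python
def k8s_resources(memory_request=None, memory_limit=None,
                  cpu_request=None, cpu_limit=None):
    """Build the `resources` section a Kubernetes job container would get.

    Mirrors how KubernetesRun's cpu/memory parameters map onto the
    standard requests/limits spec; None values are simply omitted.
    """
    requests = {"memory": memory_request, "cpu": cpu_request}
    limits = {"memory": memory_limit, "cpu": cpu_limit}
    return {
        "requests": {k: v for k, v in requests.items() if v is not None},
        "limits": {k: v for k, v in limits.items() if v is not None},
    }

print(k8s_resources(memory_request="512Mi", memory_limit="1Gi"))
# {'requests': {'memory': '512Mi'}, 'limits': {'memory': '1Gi'}}
```

Unlike the Docker host_config case above, Kubernetes enforces these per pod, so each flow run gets its own limit regardless of where the agent itself runs.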