# prefect-community

Wellington Braga

03/16/2023, 11:23 AM
Yesterday I left about 1000 pipelines running every 1h with a concurrency limit of 250, and when I went back to see the result, I realized the process had consumed almost 100 GB of RAM. I was very confused, because none of the pipelines stores values locally; they just query an external database and send the data to another API. But even without storing anything in cache, it consumed nearly all of my machine's cache memory. I imagine it must be Prefect's own SQLite database consuming so many resources. Is there a way to limit this or make Prefect stop using cache memory?

Christopher Boyd

03/16/2023, 12:36 PM
You can use cgroups, but at that rate, it might just be easier to use a Docker agent.
Also, is it specifically Prefect that's using the cache? Is it SQLite? Is it the server process, or the flow runs? The image alone doesn't really indicate what is responsible, just that it happened. I'd be interested to know what your environment looks like: where the server is running, where the agent is running, what kind of execution it is, how many flow runs are running, and profiling the usage to see WHICH process is responsible for the cache increase. If it's truly SQLite, then I believe `PRAGMA page_size` and `PRAGMA cache_size` can affect that behavior.
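For example, a minimal sketch of checking those PRAGMAs with the sqlite3 CLI (assuming the default Prefect 2 database path `~/.prefect/prefect.db`; adjust if `PREFECT_HOME` points elsewhere):
```
# Show the current page size (bytes) and cache size for the database
sqlite3 ~/.prefect/prefect.db "PRAGMA page_size; PRAGMA cache_size;"

# cache_size is per-connection, so this only affects the current session;
# a negative value caps the cache in kibibytes (here, ~20 MB)
sqlite3 ~/.prefect/prefect.db "PRAGMA cache_size = -20000;"
```
Note that `cache_size` resets on every new connection, so to change it for Prefect itself it would have to be set on Prefect's own database connections, not from a one-off shell.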

Wellington Braga

03/16/2023, 3:29 PM
@Christopher Boyd, I just guessed that it could be the Prefect database (SQLite), but in fact I don't know what is consuming these resources. I reserved a specific machine to run the Prefect container; it runs only this container, nothing more.

Running Prefect in a Docker container:
```
OS: Debian 11
```
Server:
```
OS: Linux CentOS 8
RAM: 128 GB
CPUs: 64
Threads/core: 2
```
Flows:
```
Schedule: every 1h
Scheduled flows: 1300
Task concurrency: 1
Flow concurrency (pool): 250
```
@Prefect @Zanie can you help me?

Christopher Boyd

03/16/2023, 6:32 PM
Without knowing what's actually using the memory, it's hard to evaluate. For starters, you can sort processes by memory consumption with `ps -o pid,user,%mem,command ax | sort -b -k3 -r`. It's also still not clear whether you are running both the server and the flow runs simultaneously on this same system. Lastly, is there an actual issue with the memory being consumed? Is it causing a problem? Linux natively uses whatever memory is available to it and will actively swap out as it needs to.
Additionally, is it the host that is full on memory, or the Docker container? If it's the container that's using all the available host memory, then you can set Docker limits.
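For example, a minimal sketch of capping a container's memory (the container name `prefect-server` is a placeholder; substitute your own):
```
# Start a container with a hard 8 GB memory cap (and no extra swap)
docker run -d --name prefect-server --memory=8g --memory-swap=8g <image>

# Or apply the same limit to an already-running container
docker update --memory=8g --memory-swap=8g prefect-server
```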

Wellington Braga

03/16/2023, 6:37 PM
I know it's possible to limit the container's consumption, but it would be preferable to understand what is consuming so many resources first.

Christopher Boyd

03/16/2023, 6:39 PM
I would start with `top` and `ps`.
Notably, the prefect.db size you listed there is in bytes, which amounts to ~77 MB.
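For example (assuming the default Prefect 2 database path; `top -o %MEM` works with the procps-ng top on Linux):
```
# Confirm the on-disk size of the Prefect database in human-readable units
ls -lh ~/.prefect/prefect.db

# Interactive process view sorted by resident memory usage
top -o %MEM
```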

Wellington Braga

03/16/2023, 8:58 PM
1 process per flow