Hey prefectians
I have yet another question regarding memory usage. Not sure if it's a Prefect problem, a Dask problem, or just my ignorance.
Here is the most useless flow humanity has ever created:
    from prefect import task, Flow
    from prefect.executors import DaskExecutor

    @task
    def donothing(x):
        pass

    with Flow('useless') as flow:
        lst = list(range(4000))
        donothing.map(lst)

    flow.executor = DaskExecutor('tcp://localhost:8786')
The worker is started with:
    dask-worker --nthreads=50
The thing is, the worker quickly eats up a lot of memory with each mapped task run, up to a gigabyte by the end of the flow, and that memory is not released when the flow finishes. The project I'm working on involves running up to ~100,000 IO-heavy tasks, so seeing this I'm a little worried that Prefect might not be the right tool for the job. But maybe I'm doing something wrong?
Kevin Kho
04/14/2021, 2:01 PM
Hi @tash lai, this seems related to the garbage collection question you had before. I think the issue here is just larger-than-expected memory usage, but you know how to work around the garbage collection by splitting the work into batches of files?
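The batching workaround Kevin alludes to can be sketched as follows: since each mapped task run carries per-run tracking overhead, mapping over batches instead of individual items cuts the number of task runs by the batch size. This is a minimal, hypothetical sketch (the `chunked` helper is not part of Prefect's API, and the Prefect snippet assumes the 0.x API used in this thread):

```python
def chunked(items, size):
    """Split a list into consecutive batches of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# Usage idea with Prefect 0.x (sketch, not tested against this thread's setup):
#
# from prefect import task, Flow
#
# @task
# def process_batch(batch):
#     for x in batch:
#         ...  # do the per-item IO-heavy work here
#
# with Flow('batched') as flow:
#     batches = chunked(list(range(100000)), 1000)  # 100 task runs instead of 100,000
#     process_batch.map(batches)
```

The trade-off is coarser retry granularity: a failure retries the whole batch rather than one item.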
tash lai
04/14/2021, 2:24 PM
The thing is, this "donothing" task never returns anything, so I suspect it's the task-run metadata that takes up the memory. It works out to around 250 kilobytes per task run, which does seem a bit too much.
tash lai
04/14/2021, 2:25 PM
I'll try using LocalDaskExecutor
Jeremiah
04/14/2021, 2:27 PM
“prefectians” is a new one — we’re going to need to start keeping a list 🙂
tash lai
04/14/2021, 2:37 PM
Well, it seems like
    flow.executor = LocalDaskExecutor(scheduler='threads', num_workers=50)
did the trick.