# prefect-community
a
Hello, I’m trying a simple flow that does linear regression in batches. The flow works when run sequentially, but with the Dask backend it runs into memory problems. What’s confusing is that there is ample memory per worker. Can someone help me identify the problem? Am I using a Dask anti-pattern somewhere?
j
Hi An, a few questions:
• Are you running the above as a script (`python your_code.py`)?
• What OS are you on?
• What version of Python are you using?
• How large is the input data approximately? How much RAM is available for the workers?
• Can you describe a bit more about how it fails?
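In case it helps with answering the environment questions, here is a minimal sketch that prints the OS, Python version, CPU count and total RAM using only the standard library. The function name `collect_environment_info` is made up for illustration, and the RAM lookup assumes a POSIX system (it reports `None` elsewhere).

```python
import os
import platform


def collect_environment_info():
    """Return basic facts about the machine running the flow."""
    info = {
        "os": platform.platform(),
        "python": platform.python_version(),
        "cpu_count": os.cpu_count(),
    }
    # Total physical RAM is read through os.sysconf, which only exists on
    # POSIX systems; on other platforms this field is left as None.
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        page_count = os.sysconf("SC_PHYS_PAGES")
        info["total_ram_gib"] = round(page_size * page_count / 2**30, 1)
    except (AttributeError, ValueError, OSError):
        info["total_ram_gib"] = None
    return info


if __name__ == "__main__":
    for key, value in collect_environment_info().items():
        print(f"{key}: {value}")
```

This reports the whole machine, not the per-worker memory limit, so the worker setting would still need to be stated separately.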
d
One common gotcha with pandas and Dask is the pandas `mode.chained_assignment` option. By default it uses a lot of memory, but it can be changed for a reduced footprint.
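For reference, changing that option could look like the sketch below. The option name and `pd.set_option` / `pd.get_option` are pandas API; the helper name `disable_chained_assignment_checks` is hypothetical, and this assumes a pandas version that still exposes the option. Whether it actually lowers memory use for this particular flow has not been checked here.

```python
import pandas as pd


def disable_chained_assignment_checks():
    """Turn off pandas' chained-assignment check in the current process.

    Hypothetical helper for illustration. Accepted values for the option
    are None, "warn" (the default) and "raise".
    """
    pd.set_option("mode.chained_assignment", None)
    return pd.get_option("mode.chained_assignment")


if __name__ == "__main__":
    print("before:", pd.get_option("mode.chained_assignment"))
    disable_chained_assignment_checks()
    print("after:", pd.get_option("mode.chained_assignment"))
```

pandas options are set per process, so with Dask workers the helper would presumably have to run on each worker, for example with `client.run(disable_chained_assignment_checks)` on a `dask.distributed.Client`.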