Zviri
05/07/2020, 8:47 AM
I'm seeing a memory issue with the CloudTaskRunner but not with the plain TaskRunner (using a Dask deployment). During the "mapping" step, the worker doing the actual mapping was continuously using more and more memory. That seemed reasonable, since mapping involves copying the mapped task; however, with the CloudTaskRunner, memory consumption is much higher during this step. To be specific, mapping over a list of only about 8,000 elements ate up more than 4 GB of memory on the worker.
I did some debugging and found that the same mapped task has a serialized size of 15,200 bytes with the TaskRunner but 122,648 bytes with the CloudTaskRunner. That is almost a 10-fold increase, which makes mapping pretty much unusable for me. The extra size ultimately comes from pickling this function: https://github.com/PrefectHQ/prefect/blob/master/src/prefect/engine/task_runner.py#L788 and I think the serialized size of the CloudTaskRunner class is the cause of the difference.
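For illustration, that mechanism can be sketched with plain pickle: a runner object that drags along heavy per-instance state (as a Cloud-connected runner plausibly does with client/config data) serializes far larger than a lean one, and every mapped element ships its own copy. `PlainRunner` and `HeavyRunner` here are made-up stand-ins, not Prefect classes.

```python
import pickle


class PlainRunner:
    """Hypothetical stand-in for a lean task runner."""

    def run(self, x):
        return x


class HeavyRunner(PlainRunner):
    """Hypothetical stand-in for a runner carrying extra per-instance
    state, e.g. cached client/config data."""

    def __init__(self):
        # simulate ~100 KB of state attached to the instance
        self.state = b"x" * 100_000


plain_size = len(pickle.dumps(PlainRunner()))
heavy_size = len(pickle.dumps(HeavyRunner()))
print(plain_size, heavy_size)

# Back-of-the-envelope from the numbers above: 8,000 mapped elements
# times ~122,648 bytes per serialized task is already ~0.9 GiB of
# payload before any runtime copies are made.
```

Because the per-task overhead multiplies by the number of mapped elements, even a modest per-instance blowup dominates memory at scale.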
Is this a known behavior, or is it worth a bug report? I will stick to the plain TaskRunner for now, which unfortunately prevents me from using the Cloud UI, which I really like. It would be great if this could be fixed.
I'm using the latest Prefect (v 0.10.7).

josh
05/07/2020, 1:35 PM

Jim Crist-Harif
05/07/2020, 3:30 PM

Zviri
05/07/2020, 3:50 PM

Chris White
05/07/2020, 3:53 PM

Zviri
05/07/2020, 4:00 PM