# ask-marvin
a
@Marvin Is there a reason that adding the `@task` decorator to a function could dramatically increase the memory usage of a flow using the k8s worker and Prefect v3.1.1? Two runs with the same parameters, image and k8s cluster show huge differences in memory when the function is labelled as a task. It appears not to release memory and will max out any allowed memory (within reason), while the non-task version completes using well under 1/3 of the limit.
m
It sounds like you're encountering a bit of a puzzler with your Prefect setup! When using the `@task` decorator in Prefect v3.1.1, there appears to be potential for increased memory usage, especially when running with Kubernetes workers. From my search, there isn't an open GitHub issue that matches your exact description, but based on Prefect's documentation there are a few areas to explore:

1. Task caching and memory use: the `@task` decorator involves task caching, which can affect memory usage. When tasks cache their results, some storage and memory is used. You can read more about this in the Prefect documentation on task caching.
2. Worker and infrastructure considerations: the Kubernetes worker involves some management overhead. Prefect workers are designed to handle task lifecycles, which can mean maintaining more state information or managing caches, potentially leading to increased memory use. The worker deployment documentation covers these considerations in more depth.

My advice would be to check whether your tasks are inadvertently caching large datasets, and to adjust the caching policies or storage setup accordingly. Ensuring that your tasks are not persisting unnecessary data, and that they clean up correctly after execution, might help mitigate memory bloat.

If you'd like a bit of a giggle amidst this debugging: think of these tasks as tiny software hoarders. They just can't seem to throw anything away! Let me know if you need further assistance or a different query explored!
a
```python
@task(task_run_name="process_batch", persist_result=False)
```
With `persist_result` set to `False`, the memory growth is still observed. The task is executed in a loop (once per batch of 500 files), and the function itself needs to fetch those files from Google Cloud Storage. That is done with asyncio and a semaphore that limits it to 20 concurrent downloads. Could this use of asyncio be causing Prefect to hold onto references somehow? Profiling only appears to show weak references.
```python
data_batch = await asyncio.gather
```
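The download pattern described above (asyncio plus a semaphore capping concurrent downloads at 20) can be sketched with the standard library alone. `fetch_blob` is a hypothetical stand-in that sleeps instead of calling Google Cloud Storage, and the `peak` counter exists only to show the cap holding. Worth noting: the semaphore limits how many downloads run at once, not how much data is held, because `asyncio.gather` keeps every result for the batch in memory until it returns.

```python
import asyncio

MAX_CONCURRENT_DOWNLOADS = 20  # the limit mentioned in the thread


async def fetch_blob(name: str) -> bytes:
    """Hypothetical stand-in for a GCS download; sleeps instead of doing I/O."""
    await asyncio.sleep(0.01)
    return f"<return><id>{name}</id></return>".encode()


async def download_batch(names: list, limit: int = MAX_CONCURRENT_DOWNLOADS) -> tuple:
    """Download all names with at most `limit` in flight; return (results, peak)."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def bounded_fetch(name: str) -> bytes:
        nonlocal in_flight, peak
        async with semaphore:  # at most `limit` coroutines get past this point at once
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fetch_blob(name)
            finally:
                in_flight -= 1

    # gather returns results in input order and holds all of them until it finishes
    data_batch = await asyncio.gather(*(bounded_fetch(n) for n in names))
    return data_batch, peak


if __name__ == "__main__":
    names = [f"file-{i}.xml" for i in range(500)]
    data, peak = asyncio.run(download_batch(names))
    print(len(data), peak)
```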
n
Hi @Andy Irvine, are you talking about memory growth in the worker pod or in the flow-run pods? The worker just makes API calls to spin up pods for flow runs.
a
Hi @Nate, the pod running the flow.
n
Can you give a representative example of your flow code? Is it possible that your flow code is adding big objects to an aggregate object that you're keeping in memory?
a
The flow is designed to deal with hundreds of thousands of XML files (from the IRS, not that it matters!). Each task takes a configurable batch and processes it to extract the required fields. Only the extracted fields are retained; they are written to storage before the next batch is processed. Running it without the task decorator, but still in Prefect, results in low memory usage. Just adding the task decorator blows up the memory. I could share parts of it here, I guess, but it would take quite a bit of work to get to a reproducer.
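A sketch of the batch loop described above, under stated assumptions: the field paths in `FIELDS`, the sample XML layout and the `sink` (any object with a `write` method) are hypothetical, and it uses the standard library's `xml.etree.ElementTree` because the thread doesn't say which XML parser the real flow uses. The point is structural: each batch is parsed, reduced to small dictionaries of extracted fields, written out and then dropped, so no parsed element trees should survive from one batch to the next.

```python
import io
import json
import xml.etree.ElementTree as ET
from typing import Iterator, List

# Hypothetical field paths; the real IRS schema isn't shown in the thread.
FIELDS = {"ein": "./Filer/EIN", "name": "./Filer/Name"}


def extract_fields(xml_bytes: bytes) -> dict:
    """Parse one document and keep only a few small strings from it."""
    root = ET.fromstring(xml_bytes)
    # Once this function returns, nothing references `root`, so the tree can be freed.
    return {key: root.findtext(path) for key, path in FIELDS.items()}


def batched(items: List[bytes], size: int) -> Iterator[List[bytes]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_all(documents: List[bytes], sink, batch_size: int = 500) -> int:
    """Extract fields batch by batch, writing JSON lines to `sink` as we go."""
    written = 0
    for batch in batched(documents, batch_size):
        rows = [extract_fields(doc) for doc in batch]  # only dicts of strings survive
        for row in rows:
            sink.write(json.dumps(row) + "\n")
        written += len(rows)
        # `rows` is rebound on the next iteration, so nothing accumulates across batches
    return written


if __name__ == "__main__":
    docs = [
        f"<Return><Filer><EIN>{i}</EIN><Name>Org {i}</Name></Filer></Return>".encode()
        for i in range(1_200)
    ]
    out = io.StringIO()
    print(process_all(docs, out))  # prints 1200 (three batches: 500, 500, 200)
```

In the real flow each batch would be downloaded inside the loop rather than held up front; if memory still climbs with this shape, something outside the loop body is keeping references.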
n
Is this roughly representative?
a
@Nate this is the function where adding the `@task` decorator makes things go crazy with memory:
n
Excellent, thanks! If you have a chance, opening a discussion with the relevant snippets you can share would be ideal, so we can get a sense of how to reproduce and perhaps profile what you're seeing.
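While a reproducer is put together, the standard library's `tracemalloc` can show where memory grows between batches by comparing a snapshot taken before the loop with one taken after it. This is a minimal sketch: `simulate_batch` and the module-level `_leak` list are hypothetical stand-ins for real batch processing and for an unintended aggregate object, and tracemalloc's own frames are filtered out so the snapshots don't count themselves.

```python
import tracemalloc

# Hypothetical module-level list standing in for an unintended aggregate object.
_leak: list = []


def simulate_batch(leak: bool) -> int:
    """Allocate roughly 1 MB of fake file contents, optionally keeping it alive."""
    data = [bytes(1_000) for _ in range(1_000)]
    if leak:
        _leak.append(data)  # this reference survives the call
    return len(data)


def growth_between_batches(leak: bool, batches: int = 5) -> int:
    """Return net traced bytes gained across `batches` calls, printing top sites."""
    ignore_self = (tracemalloc.Filter(False, tracemalloc.__file__),)
    tracemalloc.start()
    before = tracemalloc.take_snapshot().filter_traces(ignore_self)
    for _ in range(batches):
        simulate_batch(leak)
    after = tracemalloc.take_snapshot().filter_traces(ignore_self)
    tracemalloc.stop()
    stats = after.compare_to(before, "lineno")
    for stat in stats[:3]:
        print(stat)  # source line, current size and count, and their change
    return sum(stat.size_diff for stat in stats)


if __name__ == "__main__":
    print("leaking:", growth_between_batches(leak=True))
    print("clean:", growth_between_batches(leak=False))
```

In the leaking run the growth should concentrate on the line that builds `data`; in a clean run it should stay close to zero. Wrapping the real per-batch call the same way, with and without `@task`, would show which source lines account for the difference.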
a
Will do. I am on UK time, so I will add it tomorrow. I was hoping v3 was going to magically fix it, as I had the issue on v2. One other thing that may mean more to you than to me: profiling showed element objects not being released. This is the output from objgraph (the screenshot is just the top; the SVG has the whole thing). This is one random one:
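objgraph's back-reference graphs answer the question "what is still pointing at this object?". The same question can be asked with the standard library: hold a `weakref` to a parsed tree to see whether it survives, then use `gc.get_referrers` to list what keeps it alive. In this sketch `_debug_holder` is a hypothetical stray reference, and `xml.etree.ElementTree` is assumed; the thread doesn't say which XML library produced the element objects.

```python
import gc
import weakref
import xml.etree.ElementTree as ET

# Hypothetical stray reference: a list that outlives the batch and pins parsed trees.
_debug_holder: list = []


def parse_and_extract(xml_bytes: bytes, keep_tree: bool):
    """Parse one document, extract a field, and return it with a weak ref to the tree."""
    root = ET.fromstring(xml_bytes)
    root_ref = weakref.ref(root)  # observes the tree without keeping it alive
    if keep_tree:
        _debug_holder.append(root)  # simulates an accidental strong reference
    return root.findtext("./Filer/EIN"), root_ref


def referrer_types(obj) -> list:
    """Type names of objects that still refer to obj (similar to an objgraph backref)."""
    return sorted(type(r).__name__ for r in gc.get_referrers(obj))


def report(keep_tree: bool) -> None:
    value, root_ref = parse_and_extract(
        b"<Return><Filer><EIN>123</EIN></Filer></Return>", keep_tree
    )
    gc.collect()  # rule out objects that are only waiting on cycle collection
    root = root_ref()
    if root is None:
        print(f"keep_tree={keep_tree}: extracted {value!r}; tree released")
    else:
        print(f"keep_tree={keep_tree}: extracted {value!r}; held by {referrer_types(root)}")


report(keep_tree=False)
report(keep_tree=True)
```

If an element survives `gc.collect()` in the decorated version but not in the plain one, following its referrers upward is the quickest way to find the owner.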
n
Thank you!
a
@Nate I have created a discussion on GitHub here: https://github.com/PrefectHQ/prefect/discussions/16069