< Marvin> do completed tasks still consume memory when they Prefect Community #ask-marvin


# ask-marvin

chara

10/31/2023, 3:20 PM

@Marvin do completed tasks still consume memory when they are used in downstream tasks?

Marvin

10/31/2023, 3:21 PM

Ah, the eternal struggle of memory management. In Prefect, the value of a task's result is persisted to storage only in specific cases; the metadata about your results, however, is always tracked. Persistence is separate from memory, though: while a flow is running, results are kept in memory so they can be passed downstream, so a completed task's result can keep consuming memory for the rest of the flow run. As always, the devil is in the details. You might want to check the Prefect documentation on results for more information. It's a thrilling read, I assure you.

chara

10/31/2023, 3:22 PM

@Marvin Can you point me to the documentation you are referring to?

Marvin

10/31/2023, 3:23 PM

Here is the documentation on Prefect's result handling. It explains how data is represented, how it is passed between tasks, and how it can be persisted if needed. It also covers how to interact with result objects and how to configure task result persistence. As I said, a thrilling read.

chara

10/31/2023, 3:31 PM

@Marvin can you explain what is held in memory after a task completes?

Marvin

10/31/2023, 3:32 PM

Ah, the joy of memory management. In Prefect, when a workflow is running, the results of all tasks and flows are kept in memory so they can be passed downstream. Alongside the values themselves, Prefect always tracks metadata about each result, including:
- Data type
- Storage location (if persisted)

However, in some cases, it might be desirable to override the in-memory behavior. For example, if a task returns a large amount of data, it can be costly to keep it in memory for the entire duration of the flow run. To manage this, Prefect provides an option to drop the result from memory: `cache_result_in_memory`. Here's an example:

```python
from prefect import flow, task

@flow(cache_result_in_memory=False)
def foo():
    return "pretend this is large data"

@task(cache_result_in_memory=False)
def bar():
    return "pretend this is biiiig data"
```
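The memory cost mentioned above isn't specific to Prefect: in plain Python, a value stays allocated for as long as something still references it. Here's a minimal sketch using only the standard library's `tracemalloc`. Note that `produce_large_result` is a hypothetical stand-in for a task, and nothing here calls Prefect:

```python
import tracemalloc


def produce_large_result() -> bytearray:
    # Hypothetical stand-in for a task that returns a lot of data (~10 MB).
    return bytearray(10_000_000)


tracemalloc.start()

result = produce_large_result()
held, _ = tracemalloc.get_traced_memory()  # bytes currently traced as allocated

del result  # drop the only reference so the buffer can be freed
released, _ = tracemalloc.get_traced_memory()

tracemalloc.stop()

print(f"while referenced: {held:,} bytes")
print(f"after release:    {released:,} bytes")
```

The same reasoning applies to anything that holds on to a task's return value for the length of a run: the data can't be reclaimed until that reference goes away.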

When `cache_result_in_memory` is disabled, the result of your flow or task will be persisted by default. The result will then be pulled from storage when needed. Here's an example:

```python
from prefect import flow, task

@flow
def foo():
    result = bar()

    state = bar(return_state=True)
    # The result will be retrieved from storage here
    state.result()

    future = bar.submit()
    # The result will be retrieved from storage here
    future.result()

@task(cache_result_in_memory=False)
def bar():
    # This result will be persisted
    return "pretend this is large data"
```

In this case, the result of the task `bar` will be persisted and will be retrieved from storage when needed. I hope this clarifies things for you. If not, well, I tried.
