< Marvin> When I run a flow with a worker set in a process t Prefect Community #ask-marvin


Chen

12/05/2023, 8:52 PM

@Marvin When I run a flow with a process-type worker, how do the tasks exchange data with each other? Is it cloudpickle? Is the data transferred in memory, or is it dumped to disk?

```python
import pandas as pd
from prefect import flow, task


@task(task_run_name="PERSISTED", persist_result=True)
def my_task():
    df = pd.DataFrame(dict(a=[2, 3], b=[4, 5]))
    return df


@task(task_run_name="NOT PERSISTED")
def my_task2():
    df = pd.DataFrame(dict(c=[4, 6], d=[8, 10]))
    return df


@task()
def dummy_task(value):
    print(f"value: {value}")


@flow(log_prints=True)
def my_flow():
    res = my_task()
    res2 = my_task2()
    dummy_task(res)
    dummy_task(res2)


if __name__ == "__main__":
    # Run the flow locally when the file is executed as a script.
    my_flow()
```

Also, does `persist_result` change how this data is transferred? And does anything change if I use a different TaskRunner?

Marvin

12/05/2023, 8:53 PM

In Prefect, a task's return value is recorded as the result of the task run's final state. How that value reaches the next task depends on where the tasks execute, not on the worker type. A process worker simply starts the flow run in a local subprocess, and by default all of its tasks run inside that process. In that case the return value is passed to the next task as an ordinary in-memory Python object. Nothing is pickled or written to disk just to hand data from one task to another, so in your flow `dummy_task(res)` receives the DataFrame that `my_task` returned.

The `persist_result` parameter controls whether the result is also written to result storage. If `persist_result` is set to `True`, Prefect serializes the return value (the default `pickle` serializer uses cloudpickle), writes it to the configured result storage (by default the local file system under `~/.prefect/storage`), and keeps a reference to that location in the task run's state. This enables features such as caching and retrieving results after the run. It does not change the hand-off within the same flow run: by default (`cache_result_in_memory=True`) the value is still kept in memory and passed to downstream tasks directly. Without persistence, the result exists only in memory for the duration of the flow run.

The task runner, on the other hand, can change how data moves, because it decides where submitted tasks execute. The default `ConcurrentTaskRunner` and the `SequentialTaskRunner` run tasks inside the flow-run process, so data stays in memory. With `DaskTaskRunner` or `RayTaskRunner`, tasks may run in other processes or on other machines, and their arguments and return values are serialized by Dask or Ray to move between workers. Note that a task runner is only used for tasks submitted with `.submit()` or `.map()`; calling a task directly, as your flow does, runs it as a regular function call in the flow-run process regardless of the task runner.

This is a high-level explanation, and the actual implementation is more involved. For details, see the Prefect documentation on Results, Task Runners, and the Dask and Ray task runners.
