# marvin-ai
@Marvin I have a list of `@dataclass` instances that I'm passing to two different tasks using `.submit`. Will this be an issue?
Short answer: it depends on whether those tasks mutate the list or its dataclass items.

- In Prefect 3.x, the default `ThreadPoolTaskRunner` runs tasks in separate threads within the same process. Passing a list of dataclass instances to two tasks via `.submit(...)` gives both tasks references to the same Python objects. If either task mutates the list or the dataclass instances, you can get race conditions or surprising results.
- If both tasks only read the data, you're fine.
- If caching keys on the inputs (for example `cache_policy=INPUTS`), Prefect hashes/serializes them, so your dataclasses must be pickleable. Also avoid mutating them after `.submit(...)`, since the cache key is derived from their state.

Ways to make this safe if mutation is involved:

- Make your inputs immutable
  - Use `@dataclass(frozen=True)` for the items
  - Pass tuples instead of lists (this protects only the container; the items inside still need to be frozen)
- Pass copies to each task
  - `copy.deepcopy(items)` when submitting
- Use process isolation
  - Configure your flow with `ProcessPoolTaskRunner()` so each task gets its own copy via pickling (objects must be pickleable)

Examples:

- Deep-copy per task:
```python
from dataclasses import dataclass
import copy
from prefect import task, flow
from prefect.task_runners import ThreadPoolTaskRunner

@dataclass
class Item:
    id: int
    value: int

@task
def t1(items):
    # safe to mutate this local copy
    items[0].value += 1
    return sum(i.value for i in items)

@task
def t2(items):
    # also a private deep copy, so this change is invisible to t1
    items[0].value += 2
    return max(i.value for i in items)

@flow(task_runner=ThreadPoolTaskRunner())
def my_flow():
    items = [Item(1, 10), Item(2, 20)]
    # each task receives an independent deep copy of the list and its items
    r1 = t1.submit(copy.deepcopy(items))
    r2 = t2.submit(copy.deepcopy(items))
    return r1.result(), r2.result()
```
- Immutable dataclass:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    id: int
    value: int
# Now tasks must not mutate items; transform by producing new objects instead.
```
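- Process isolation:

```python
from prefect import flow
from prefect.task_runners import ProcessPoolTaskRunner

@flow(task_runner=ProcessPoolTaskRunner())
def my_flow():
    # submit tasks as usual; inputs are pickled to each worker process,
    # so every task works on its own copy
    ...
```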
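The shared-versus-copied distinction in the first point is ordinary Python threading behavior, so it can be checked without Prefect at all. The sketch below uses the standard library's `ThreadPoolExecutor` as a stand-in for a thread-based task runner (an assumption for illustration; it is not Prefect's code), and `bump` and `run` are hypothetical helper names.

```python
# Plain-Python sketch (no Prefect): ThreadPoolExecutor stands in for a
# thread-based task runner to show how threads share the objects they receive.
import copy
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Item:
    id: int
    value: int


def bump(items: list[Item], delta: int) -> list[Item]:
    """Hypothetical worker: mutate the first item in place, return the list it got."""
    items[0].value += delta
    return items


def run(deep_copy: bool) -> tuple[list[Item], list[Item], list[Item]]:
    items = [Item(1, 10), Item(2, 20)]
    # Either hand the same list to both workers or give each its own deep copy.
    prepare = copy.deepcopy if deep_copy else (lambda x: x)
    with ThreadPoolExecutor(max_workers=2) as pool:
        f1 = pool.submit(bump, prepare(items), 1)
        f2 = pool.submit(bump, prepare(items), 2)
        seen1, seen2 = f1.result(), f2.result()
    return items, seen1, seen2


if __name__ == "__main__":
    shared, s1, s2 = run(deep_copy=False)
    # Both workers received, and mutated, the caller's original list.
    print(s1 is shared and s2 is shared)  # True

    original, c1, c2 = run(deep_copy=True)
    # With deep copies, each worker's change stays private.
    print(original[0].value, c1[0].value, c2[0].value)  # 10 11 12
```

Without copies, both increments land on the caller's single `Item`, which is exactly the situation where concurrent tasks can step on each other.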
Docs for reference:
- Task runners (Thread vs Process)
- Caching and input hashing

Quick check: Are your tasks mutating the list or the dataclass instances? If yes, do you want those mutations to be isolated per task or shared? I can suggest the best pattern based on that.
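On the immutable-dataclass route, "transform by producing new objects" usually means copying with changes. A minimal sketch using the standard library's `dataclasses.replace`; `bump_first` is a hypothetical helper, and a task written this way would return the new tuple instead of modifying its input.

```python
# Sketch: transforming frozen dataclass items by building new objects.
# `bump_first` is a hypothetical helper, not part of Prefect.
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Item:
    id: int
    value: int


def bump_first(items: tuple[Item, ...], delta: int) -> tuple[Item, ...]:
    """Return a new tuple whose first item has `value` increased by `delta`."""
    first, *rest = items  # assumes a non-empty tuple
    return (replace(first, value=first.value + delta), *rest)


if __name__ == "__main__":
    items = (Item(1, 10), Item(2, 20))

    try:
        items[0].value += 1  # frozen dataclasses reject in-place assignment
    except FrozenInstanceError:
        print("in-place mutation rejected")

    bumped = bump_first(items, 1)
    print(items[0].value, bumped[0].value)  # 10 11
```

Reusing the untouched items in the new tuple (`bumped[1] is items[1]`) is safe here precisely because they are frozen, so two tasks can share them without copying.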