
Broder Peters

02/06/2023, 11:38 AM
Hello 😄 I'm currently trying to understand the concept of infrastructure and parallel task runners. My goal is to run potentially CPU- and memory-heavy tasks under a subflow in parallel and in independent "containers/environments". My assumption was that I could, e.g., specify that each task runs in its own Docker container by using the Docker infrastructure. But my current results always end up running the whole flow in one container. Am I missing something, or is this just not how Prefect is meant to work? Example code, built with
prefect deployment build my_flow.py:do_parallel_stuff -n docker -ib docker-container/docker -sb s3/some-bucket
from prefect import flow, task
from prefect_dask import DaskTaskRunner

y = ["item-1", "item-2"]  # placeholder inputs

@task()
def do_heavy_stuff_in_own_environment(x: str):
    ...  # Does heavy stuff

@flow(task_runner=DaskTaskRunner())
def do_parallel_stuff():
    for x in y:
        result = do_heavy_stuff_in_own_environment.submit(x)
Thanks in advance!

talat

02/06/2023, 4:55 PM
hey broder, we’re in a somewhat similar situation. our current approach is to have two separate flows as deployments, say parent-flow and child-flow. we start running the parent-flow, which does some checks and then, in a task, calls run_deployment on the child-flow X number of times. since the child-flow is a separate deployment, it can have its own infrastructure and storage to run on. hth
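roughly, that pattern could look something like this (the deployment name "child-flow/default" and the x parameter are just placeholders, not from this thread):

from prefect import flow, task
from prefect.deployments import run_deployment

@task
def trigger_child_flows(items: list[str]):
    for x in items:
        # each call creates a flow run of the child-flow deployment,
        # which executes on that deployment's own infrastructure
        run_deployment(name="child-flow/default", parameters={"x": x})

@flow
def parent_flow(items: list[str]):
    # ... do some checks ...
    trigger_child_flows(items)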

Broder Peters

02/07/2023, 6:49 AM
Interesting! Thanks for sharing! Would a "native" solution be to spin up a Dask cluster and push the parallel execution into it? Like having one in Kubernetes or at a cloud provider, which would potentially spin up pods per thread?
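For reference, a rough sketch of what that might look like with prefect-dask and a temporary Dask cluster on Kubernetes. The cluster class path and kwargs are assumptions that depend on the installed dask-kubernetes version, not something confirmed in this thread:

from prefect import flow, task
from prefect_dask import DaskTaskRunner

@task
def do_heavy_stuff_in_own_environment(x: str):
    ...  # does heavy stuff

# cluster_class/cluster_kwargs are passed through to Dask when the flow run starts;
# the class path and worker count below are assumptions for illustration only
@flow(
    task_runner=DaskTaskRunner(
        cluster_class="dask_kubernetes.KubeCluster",
        cluster_kwargs={"n_workers": 4},
    )
)
def do_parallel_stuff(items: list[str]):
    for x in items:
        do_heavy_stuff_in_own_environment.submit(x)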

talat

02/07/2023, 8:24 AM
I can’t really say what a native solution would be here. But in our case, we’re currently using a concurrent task runner on the task which triggers the child flow deployments. All of this runs in Kubernetes, and the child flow deployments are executed in separate pods. I haven’t tested a Dask cluster in this scenario yet.
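roughly like this (again, the deployment name and parameter are placeholders):

from prefect import flow, task
from prefect.deployments import run_deployment
from prefect.task_runners import ConcurrentTaskRunner

@task
def trigger_child(x: str):
    # blocks until the child flow run finishes; the child runs on its own infrastructure (e.g. its own pod)
    return run_deployment(name="child-flow/default", parameters={"x": x})

@flow(task_runner=ConcurrentTaskRunner())
def parent_flow(items: list[str]):
    # submit the trigger task once per item so the child flow runs are started concurrently
    futures = [trigger_child.submit(x) for x in items]
    return [f.result() for f in futures]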