William Jamir
12/02/2023, 7:44 PM

Kevin Grismore
12/02/2023, 8:06 PM
`.map`. Are you looking to gain performance by completing multiple simultaneous computations, or to take advantage of time otherwise spent waiting on IO, like network transfers and reads/writes? Both is a valid answer too, but your needs will have a large influence over the design choices here.

William Jamir
12/02/2023, 10:48 PM
import itertools

possibilities_a = [x, y]
possibilities_b = [1, 2, 3]

for my_parameters in itertools.product(possibilities_a, possibilities_b):
    result = feature_engineering_flow(my_parameters)
    training(result)
And I want to run this simultaneously (one feature engineering run with (x, 1), another with (x, 2), and so forth).

Kevin Grismore
12/02/2023, 10:52 PM

William Jamir
12/02/2023, 11:19 PM
> one worker that distributes flow runs
Thank you for bringing that up. Just to clarify: when you mentioned "one worker that distributes flow runs," were you referring to the Docker/Kubernetes work types? In that case, when a flow is executed, the image runs as resources become available, right? So I'm safe to assume that asyncio.TaskGroup will run each one independently, which is interesting. But how would the logs work in this scenario? I would need to "hunt" for warnings and other logs from the subflows, as in the current approach, right? I mean, it won't show logs the way task.map does.
Kevin Grismore
12/02/2023, 11:57 PM
> If I have Flow A running on Worker/Agent A with a work_type of 'process', and Flow A calls sub-flow B using asyncio.TaskGroup, sub-flow B will execute on the same process/Worker/Agent A. Is that correct?
This is correct.
> However, if I use Task.map, the next available agent/worker will be picked up. Is that also correct?
No, map submits task runs simultaneously inside the current execution environment using the selected TaskRunner. From the map docstring:
> Will create as many task runs as the length of the iterable(s) in the backing API and submit the task runs to the flow's task runner.
So mapped tasks still exist only within the context of the current flow run.
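[Editor's note: loosely speaking, `.map` behaves like handing every element to an executor that lives inside the flow run's own process. This is sketched here with the stdlib concurrent.futures as an analogy, not with Prefect's actual task runner implementation:]

```python
from concurrent.futures import ThreadPoolExecutor

def transform(n: int) -> int:
    # Stand-in for a Prefect task body.
    return n * 2

items = [1, 2, 3, 4]

# Every "task run" executes inside this same process, analogous to how
# mapped Prefect tasks run inside the flow run's execution environment
# via its task runner, rather than being farmed out to other workers.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(transform, items))

# results == [2, 4, 6, 8]
```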
Kevin Grismore
12/03/2023, 12:06 AM
run_deployment inside the TaskGroup. Only then will each subflow get its own compute resources to run on. An alternative to this is the Dask task runner, but Dask has its own learning curve and is a somewhat different approach to distributed compute than the one we've been talking about so far.
> But how would the logs work in this scenario? I would need to "hunt" warnings and other types of logs from the subflows, as in the current approach, right?
Yeah, the deployments-as-subflows model still manifests as separate flow runs in the UI. We know this isn't ideal for tracking down errors at times, and your feedback on that is something we're thinking about.
Kevin Grismore
12/03/2023, 12:06 AM

William Jamir
12/03/2023, 9:40 AM

William Jamir
12/04/2023, 9:42 AM

Kevin Grismore
12/04/2023, 8:08 PM