Cody Webb
08/22/2023, 6:00 PM
arg_chunks = [
    args_list[i : i + chunk_size]  # noqa
    for i in range(0, len(args_list), chunk_size)
]
args_lst = [
    (args[0], args[1])
    for chunk in arg_chunks
    for args in chunk
]
# since 100 args / 10 workers, my agent will spawn 10 worker sub-flows
logger.info(f"Spawning {len(arg_chunks)} worker flows...")
await asyncio.gather(
    *[
        run_deployment(
            # returns a FlowRun object
            timeout=0,  # allows non blocking
            name=deploy_id,
            parameters={
                "project": project,
                "arg_1": args[0],
                "arg_2": args[1],
            },
        )
        for args in args_lst
    ]
)

@flow(
    name="dask-flow",
    task_runner=DaskTaskRunner(address="tcp://dask-scheduler:8786"),
    persist_result=True,
    result_storage=LocalFileSystem.load("local-storage"),
)
async def gpu_computation(project, arg_1, arg_2):
    .....
    .....
    project["col"] = gpu_compute(arg_1, arg_2)
    project.save(...)
-----
so i have this dask flow which spawns n worker flows based on how many chunks there are. each worker flow is long running: it takes its args, does some gpu computation, and then saves a file. the problem i am having is that the long-running tasks are getting cancelled. i dont want them to cancel, just to wait - or perhaps i should just start the jobs for the first chunk and have the second chunk wait for the first to complete? i dont know the best way to do this - thoughts?
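A minimal sketch (not from the thread) of the "run the first chunk, then let the second chunk wait for it" idea being asked about here: it assumes run_deployment is left at its default timeout so each call blocks until its flow run finishes, run_chunks_sequentially is an illustrative helper name, and arg_chunks, deploy_id, and project are the names from the snippet above.

import asyncio

from prefect.deployments import run_deployment


async def run_chunks_sequentially(arg_chunks, deploy_id, project):
    # process one chunk of worker flow runs at a time; the next chunk only
    # starts after every run in the previous chunk has reached a final state
    for chunk in arg_chunks:
        # with the default timeout (None), run_deployment polls the triggered
        # flow run until it finishes, so this gather() waits for the whole chunk
        await asyncio.gather(
            *[
                run_deployment(
                    name=deploy_id,
                    parameters={
                        "project": project,
                        "arg_1": args[0],
                        "arg_2": args[1],
                    },
                )
                for args in chunk
            ]
        )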
Nate
08/22/2023, 6:11 PM
> it is cancelling the tasks as some are long running
what is cancelling the tasks?
Cody Webb
08/22/2023, 6:16 PM
Nate
08/22/2023, 6:16 PM
Cody Webb
08/22/2023, 6:33 PM
Cody Webb
08/22/2023, 6:37 PM
Cody Webb
08/22/2023, 6:42 PM
Nate
08/22/2023, 6:42 PM
> most others error out
do you have the trace for one that errored out?
> is there way to stagger these out or only chunk them or am i missing something?
im not sure i understand, but you can use concurrency limits to enforce max parallelism
> im thinking that im overloading the api
are you running against a local server or cloud? or ephemerally
its worth mentioning that run_deployment is not a task, so the DaskTaskRunner wont change the execution of run_deployment calls at all, unless you're calling it from a task - hard to tell from above where you're awaiting your run_deployment calls
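One client-side way to act on the "concurrency limits to enforce max parallelism" suggestion, without relying on Prefect-side limits, is to gate the run_deployment calls behind an asyncio.Semaphore. A minimal sketch, where run_with_limit and max_parallel are illustrative names and args_lst / deploy_id / project come from the snippet above:

import asyncio

from prefect.deployments import run_deployment


async def run_with_limit(args_lst, deploy_id, project, max_parallel=10):
    # cap how many worker flow runs are triggered-and-awaited at once
    sem = asyncio.Semaphore(max_parallel)

    async def _one(args):
        async with sem:
            # default timeout: block here until this flow run finishes,
            # so at most max_parallel runs are in flight at any moment
            return await run_deployment(
                name=deploy_id,
                parameters={"project": project, "arg_1": args[0], "arg_2": args[1]},
            )

    return await asyncio.gather(*[_one(args) for args in args_lst])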
Cody Webb
08/22/2023, 6:42 PM
Cody Webb
08/22/2023, 6:42 PM
Nate
08/22/2023, 6:44 PM
Cody Webb
08/22/2023, 6:44 PM
Cody Webb
08/22/2023, 6:45 PM
Nate
08/22/2023, 6:45 PM
Cody Webb
08/22/2023, 6:46 PM
Cody Webb
08/22/2023, 6:46 PM
Nate
08/22/2023, 6:47 PM
> looking in the gpu service
as a sanity check, do things work as expected if you remove dask and your GPU service from the situation?
Cody Webb
08/22/2023, 6:47 PM
Cody Webb
08/22/2023, 6:49 PM
Cody Webb
08/22/2023, 6:50 PM
Nate
08/22/2023, 6:53 PM
is the gpu_computation flow the one being kicked off with run_deployment?
Cody Webb
08/22/2023, 6:54 PM
Nate
08/22/2023, 6:54 PM
Cody Webb
08/22/2023, 6:54 PM
Cody Webb
08/22/2023, 6:54 PM
Cody Webb
08/22/2023, 6:58 PM
Cody Webb
08/22/2023, 7:14 PM
prefect-server | 19:12:47.430 | WARNING | prefect.server.services.cancellationcleanup - CancellationCleanup took 29.649518 seconds to run, which is longer than its loop interval of 20.0 seconds.
okay so i did a 45 run with a 15 concurrency limit - would it make sense to do multiple agents with low-concurrency work pools?
Cody Webb
08/22/2023, 7:15 PM
Nate
08/22/2023, 7:24 PM
gpu_computation flow, but i suspect that the weirdness is related to the dask task runner somehow - where / how are you using dask in there?
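For context on the "where / how are you using dask" question: the DaskTaskRunner only changes how tasks submitted with .submit() inside the flow are executed; run_deployment calls are not tasks and bypass the runner. A rough sketch of that pattern, written as a sync flow for simplicity; gpu_compute_chunk is an illustrative task name, not something from the thread:

from prefect import flow, task
from prefect_dask import DaskTaskRunner


@task
def gpu_compute_chunk(arg_1, arg_2):
    # placeholder for the GPU call; tasks submitted with .submit() below are
    # what the DaskTaskRunner actually ships to the dask scheduler
    ...


@flow(task_runner=DaskTaskRunner(address="tcp://dask-scheduler:8786"))
def gpu_computation(project, arg_1, arg_2):
    # .submit() hands the task to the configured task runner (dask here);
    # run_deployment calls, by contrast, are not tasks and are unaffected
    future = gpu_compute_chunk.submit(arg_1, arg_2)
    project["col"] = future.result()
    return project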
Cody Webb
08/22/2023, 7:25 PM
Nate
08/22/2023, 7:25 PM
Cody Webb
08/22/2023, 7:26 PM
Nate
08/22/2023, 7:26 PM
you could remove the task_runner kwarg and let it use the default ConcurrentTaskRunner
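A minimal sketch of what dropping the task_runner kwarg looks like, keeping the other decorator arguments from the snippet at the top of the thread; with no task_runner given, Prefect falls back to the default ConcurrentTaskRunner:

from prefect import flow
from prefect.filesystems import LocalFileSystem


# no task_runner argument: the flow uses the default ConcurrentTaskRunner
@flow(
    name="dask-flow",
    persist_result=True,
    result_storage=LocalFileSystem.load("local-storage"),
)
async def gpu_computation(project, arg_1, arg_2):
    ...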
Cody Webb
08/22/2023, 7:26 PM
Cody Webb
08/22/2023, 7:28 PM
Cody Webb
08/22/2023, 7:28 PM
Cody Webb
08/22/2023, 7:28 PM
Cody Webb
08/22/2023, 7:59 PM
Cody Webb
08/22/2023, 7:59 PM
Cody Webb
08/22/2023, 8:21 PM
Cody Webb
08/22/2023, 8:21 PM
Cody Webb
08/22/2023, 8:27 PM
Nate
08/22/2023, 8:28 PM
Cody Webb
08/22/2023, 8:29 PM
Cody Webb
08/22/2023, 8:29 PM
Nate
08/22/2023, 8:31 PM
> but seems like theres some bottlenecks on how many tasks/requests the gpu backend/endpoints can handle concurrently
i suppose that depends, is your gpu service still the bottleneck? if it were, im not sure extra worker/pools would help
Cody Webb
08/22/2023, 8:31 PM
Nate
08/22/2023, 8:31 PM
Cody Webb
08/22/2023, 8:31 PM
Cody Webb
08/22/2023, 8:31 PM
Cody Webb
08/22/2023, 8:32 PM
Cody Webb
08/22/2023, 8:53 PM