operator is also on the roadmap and is currently being designed.
Tim Enders
06/08/2022, 9:24 PM
OK, we are liberal users of .map and the DaskExecutor in 1.2... would you recommend trying to rewrite as async, or should I wait until .map is ready? Full adoption for us is waiting until it is out of beta, I am just trying to work ahead so waiting isn't a big deal.
Zanie
06/08/2022, 9:27 PM
I'd wait for the mapping operator if you want to spawn multiple mapped branches without blocking. Otherwise you can just use a for loop to iterate over the input value. The tasks will run concurrently without using async.
Tim Enders
06/08/2022, 9:33 PM
Hmmm... I will probably revisit this tomorrow as my brain is done for the day.
Tim Enders
06/09/2022, 2:31 PM
OK, back on this with a fresh brain... which wants to say that the for loop is inherently synchronous, so wouldn't it defeat the purpose of parallelizing with Dask? But I think I am going to try it out and see.
Tim Enders
06/09/2022, 2:48 PM
Getting this:
RuntimeError:
    An attempt has been made to start a new process before the
    current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.
so I must be doing something wrong. Asking in a new thread
Zanie
06/09/2022, 2:50 PM
The for loop is synchronous but all you’re doing is submitting task runs in it.
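A minimal sketch of that point, using Python's standard-library ThreadPoolExecutor as a stand-in rather than Prefect's own API (the names slow_double and run_all are hypothetical, made up for illustration). The loop itself runs synchronously, but each iteration only submits work and gets a future back right away, so the submitted work can overlap:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_double(x):
    # Hypothetical stand-in for a task that takes a while to finish.
    time.sleep(0.1)
    return x * 2


def run_all(values):
    # Stand-in for a flow: the for loop is synchronous, but each
    # iteration only *submits* work and returns a future immediately.
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = []
        for v in values:
            futures.append(pool.submit(slow_double, v))
        # Results are collected only after everything has been submitted.
        return [f.result() for f in futures]


if __name__ == "__main__":
    print(run_all([1, 2, 3, 4]))
```

The same shape should carry over to submitting task runs in a loop: the loop finishes quickly, and waiting happens only where results are actually needed.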
Zanie
06/09/2022, 2:51 PM
That error is related to Dask multiprocessing; you need to call the flow inside an if __name__ == "__main__": block, as the error message suggests.
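A rough sketch of that idiom, using the standard-library multiprocessing module in place of a Dask-backed flow (square and main are hypothetical names for illustration). When child processes are started by spawning, they re-import the main module, so anything that launches processes should sit behind the guard:

```python
import multiprocessing


def square(x):
    # Defined at module level so worker processes can import it.
    return x * x


def main():
    # Stand-in for calling the flow: this is the part that starts
    # new processes, so it must not run at import time.
    with multiprocessing.Pool(processes=2) as pool:
        print(pool.map(square, [1, 2, 3]))


if __name__ == "__main__":
    # Only the original process enters this block; spawned children
    # that re-import the module skip it.
    main()
```

In the case discussed here, the flow call would presumably go where main() is called.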