Tomer Cagan

03/02/2022, 12:25 PM
Hi, are there any best practices for running a library that has its own parallelism? For example, in our code base we sometimes use solvers (e.g. https://github.com/coin-or/pulp, or https://developers.google.com/optimization/introduction/overview) which bring their own parallelism...

Anna Geller

03/02/2022, 12:38 PM
I know OR-tools, I used it once myself - super useful indeed! With basically no knowledge of how those tools work behind the scenes, I would assume that if you use LocalExecutor in Prefect 1.0 or SequentialTaskRunner in Prefect 2.0, you wouldn't rely on Prefect-specific parallelism but rather on the parallelism provided by your tool, depending on how you call it in your tasks and flows. Having said that, Prefect is open-source, so nothing stops you from building your own custom executor. If you want to take a stab at it, you would essentially need a class that satisfies the Executor interface (in 1.0), including these methods:
• submit
• map
• wait
similar to the interface of Python's concurrent.futures executors.
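To make the submit / map / wait shape concrete, here is a minimal sketch built only on Python's standard library. The class name `SketchExecutor` and its constructor argument are hypothetical, and it does not subclass or reproduce Prefect's actual Executor base class; it only illustrates the three methods mentioned above.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Iterable, List, Union


class SketchExecutor:
    """Hypothetical executor exposing submit / map / wait.

    Illustration only: this is NOT Prefect's Executor base class.
    """

    def __init__(self, max_workers: int = 1) -> None:
        # max_workers=1 keeps the outer level sequential, leaving the
        # parallelism to whatever library runs inside each call.
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def submit(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Future:
        # Schedule a single call and return a future for its result.
        return self._pool.submit(fn, *args, **kwargs)

    def map(self, fn: Callable[..., Any], *iterables: Iterable[Any]) -> List[Future]:
        # Schedule one call per tuple of arguments; return the futures in order.
        return [self.submit(fn, *args) for args in zip(*iterables)]

    def wait(self, futures: Union[Future, Iterable[Future]]) -> Any:
        # Block until the given future(s) finish and return their result(s).
        if isinstance(futures, Future):
            return futures.result()
        return [f.result() for f in futures]

    def shutdown(self) -> None:
        # Release the worker threads once all work is done.
        self._pool.shutdown(wait=True)


executor = SketchExecutor(max_workers=1)
squares = executor.wait(executor.map(lambda x: x * x, [1, 2, 3]))
single = executor.wait(executor.submit(sum, [1, 2, 3]))
executor.shutdown()
```

With a single worker, the calls run one after another, which mirrors the idea of letting the solver handle the parallelism rather than the orchestration layer.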

Tomer Cagan

03/02/2022, 1:43 PM
I would need it to run as part of a larger flow (algorithm), and my intention is to use it in a k8s environment. When I start a flow, do all the tasks within it always run on the same resources, or is it possible to separate where they run? (I know it is possible to separate at the flow level, but I was wondering about tasks.)

Anna Geller

03/02/2022, 1:55 PM
Great question. This is exactly what the executor (and, in 2.0, the task runner) is for - they decide WHERE and HOW to execute your task runs. You can probably also see now why the executors have been renamed to task runners in Prefect 2.0 🙂 To explain it a bit more, so far there are (in Orion):
• SequentialTaskRunner - to run things sequentially in the same process,
• ConcurrentTaskRunner - to run things concurrently using async,
• DaskTaskRunner - to offload the task run execution to a Dask cluster - this probably answers your question best, since this Dask cluster could run e.g. on a Kubernetes cluster, essentially spreading task run execution across a cluster of resources/nodes.
Executors in Prefect 1.0 are analogous.

Kevin Kho

03/02/2022, 2:18 PM
I am also familiar with coin-or. If you have multiple coin-or jobs, or anything else that tries to run in parallel (RandomForest, Gradient Boosted Trees), it’s bad to have parallelism both at that level and at the Dask level (Prefect mapping). I’m sure you know that, which is why you are asking the question. Resource contention can lead to deadlocks. For your use case, where one step has its own parallelism, the concern is starting other tasks that also require parallelism while it runs. I think there is a hack you can do: treat the COIN-OR task as a mapped task with one element. Prefect has a limitation that it can only run one mapped task at a time, so while it’s mapping, it’s the only process and you won’t have any resource contention.
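To illustrate the contention concern with numbers, here is a rough sketch of splitting a fixed CPU budget between the outer level (e.g. Dask workers or mapped tasks) and the inner level (the solver's own threads). The helper name `solver_threads` and its parameters are hypothetical and not part of Prefect, Dask, or any solver; the result would be passed to whatever thread-count option your solver exposes.

```python
import os
from typing import Optional


def solver_threads(outer_workers: int, total_cpus: Optional[int] = None) -> int:
    """Hypothetical helper: threads each solver call may use so that
    outer_workers * threads does not exceed the available CPUs."""
    if outer_workers < 1:
        raise ValueError("outer_workers must be at least 1")
    # Fall back to the machine's CPU count when no budget is given.
    cpus = total_cpus if total_cpus is not None else (os.cpu_count() or 1)
    # Integer division keeps the product within budget; never go below 1.
    return max(1, cpus // outer_workers)


# One solver task running alone can use the whole machine.
alone = solver_threads(outer_workers=1, total_cpus=8)
# Four concurrent tasks each get a quarter of the cores.
shared = solver_threads(outer_workers=4, total_cpus=8)
# More outer workers than cores: each solver is held to a single thread.
crowded = solver_threads(outer_workers=16, total_cpus=8)
```

The single-element mapped task described above corresponds to the first case: the solver step runs alone, so it can safely use every core.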