
Sam Werbalowsky

12/21/2022, 10:24 PM
Is it correct that in order to run tasks concurrently we need to use `submit`? Even if the flow is written in a way that there are no dependencies between tasks?

Zanie

12/21/2022, 10:29 PM
Yes. Initially we had this implemented so there was always automatic concurrency, but this confused people and forced users who only wanted to run things sequentially to work with futures.

Sam Werbalowsky

12/21/2022, 10:31 PM
Wow. I had always thought one of the advantages of an orchestration system was to handle concurrency and sequencing for you, I’m very surprised to read that. I really do not want to go to my team and tell them they now need to adopt extra code and concepts to control if they want concurrent jobs vs sequential if we adopt Prefect.
Thanks for the response here as well - these seem like 2 misses from Prefect 1 to me. Version 2 seems really bare in some ways compared to 1.

Zanie

12/21/2022, 10:38 PM
We handle the concurrency for you, in that we’ll orchestrate the tasks and manage submission to other infrastructure — this would otherwise require a lot of boilerplate on your behalf. However, Prefect is more than an orchestrator. For example, we are also focused on observability. We want it to be easy to add decorators to your code to get observability, but if we’re automatically sending everything to threads or a remote system to run concurrently you have to deal with a lot of complex topics immediately to begin using any of Prefect’s functionality.

Sam Werbalowsky

12/21/2022, 10:44 PM
Well - in V1 (correct me if I’m wrong) you could change the executor to say a Dask executor and it would then run concurrently. Sure, you probably had to have some logic to control that, but whoever was maintaining Prefect could handle that. The developer didn’t have to worry about how/when their tasks would run beyond simple dependencies. That is gone in V2, and now each developer has to worry about concurrency or sequential execution. Shouldn’t the infrastructure block influence concurrency to some degree?
I mean… I wouldn’t even have known concurrency wasn’t happening unless I stumbled upon something here

Zanie

12/21/2022, 10:47 PM
In v2, you can run some tasks concurrently and others sequentially within a single flow.
In part, this is to simplify working with task results and exceptions within the flow. Since the flow is defined dynamically now, you can perform arbitrary logic with the results or states of tasks.
👍 1
For beginners, it’s complicated to deal with the concept of futures immediately and changing the return type of all of the task calls based on the task runner you select is too confusing.
I definitely was thinking the same way as you at first 🙂 I wrote the v2 implementation that automatically ran things concurrently and I was the one who changed it to sequential by default.
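
To make the return-type point concrete, here is a minimal sketch using only Python's standard library `concurrent.futures`, not Prefect itself. `load` is a hypothetical stand-in for a task. A direct call hands back the value, while `submit` hands back a future that has to be resolved with `.result()`, which is the extra concept being discussed. A Prefect task's `submit` also returns a future, but its future type and task runners differ from this sketch.

```python
from concurrent.futures import Future, ThreadPoolExecutor
import time


def load(n: int) -> int:
    """Hypothetical stand-in for a task: pretend to do slow I/O, then return."""
    time.sleep(0.05)
    return n * 2


def run_sequentially(inputs):
    # Direct calls block one after another and return plain values.
    return [load(n) for n in inputs]


def run_concurrently(inputs):
    # submit() returns immediately with a Future; the work runs in a thread pool.
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(load, n) for n in inputs]
        assert all(isinstance(f, Future) for f in futures)
        # Values are only available after resolving each future.
        return [f.result() for f in futures]


sequential_results = run_sequentially([1, 2, 3])
concurrent_results = run_concurrently([1, 2, 3])
```

Both functions produce the same values; only the calling convention and the scheduling differ.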

Sam Werbalowsky

12/21/2022, 10:54 PM
skeptical

Zanie

12/21/2022, 10:54 PM
We saw a lot of confusion during our prereleases 😕

Sam Werbalowsky

12/21/2022, 10:54 PM
We’ll see how this goes then …..

Zanie

12/21/2022, 10:55 PM
We could consider adding a setting to toggle calls to always return futures (i.e. always submit)
It sounds kind of ugly, but it’d be pretty simple separate from the implications.
You could also subclass `Task` and override `__call__` to do that, I think?
👀 1
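
The "override `__call__` so that it always submits" idea can be sketched in plain Python. Everything here is an assumption for illustration: `SketchTask`, `AlwaysSubmitTask`, and `always_submit` are hypothetical names, and `SketchTask` is a toy stand-in built on the standard library, not Prefect's `Task` class or its real signatures.

```python
from concurrent.futures import Future, ThreadPoolExecutor

# Shared pool for the sketch; a real task runner would manage this itself.
_POOL = ThreadPoolExecutor(max_workers=4)


class SketchTask:
    """Hypothetical stand-in for a task object; not Prefect's Task class."""

    def __init__(self, fn):
        self.fn = fn

    def __call__(self, *args, **kwargs):
        # Default behaviour: run inline and return the plain value.
        return self.fn(*args, **kwargs)

    def submit(self, *args, **kwargs) -> Future:
        # Opt-in concurrency: schedule the call and return a future.
        return _POOL.submit(self.fn, *args, **kwargs)


class AlwaysSubmitTask(SketchTask):
    """Subclass whose plain calls behave like submit()."""

    def __call__(self, *args, **kwargs) -> Future:
        return self.submit(*args, **kwargs)


def always_submit(fn):
    # Hypothetical decorator form of the same idea.
    return AlwaysSubmitTask(fn)


@always_submit
def square(x: int) -> int:
    return x * x


future = square(4)       # a Future, because __call__ now submits
value = future.result()  # the caller must resolve it explicitly
```

The trade-off from earlier in the thread shows up directly: every caller of `square` now has to know it receives a future.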

Sam Werbalowsky

12/21/2022, 10:57 PM
yeah … I mean there’s a lot to say about how people “expect” things to run - we’re POCing orchestrators and obviously team experience is going to play into expectations. My goal as a maintainer is to make stuff easy for my developers - so if there’s friction for concurrency/parallelism or anything that we’d “expect” it’s going to be a tough sell.
Like I know we have jobs that take a bunch of inputs and then want to run the same function over those inputs, but I’ll play around with Prefect more for it… it seems like `submit` with a list makes a bit more intuitive sense (here), but if we’re just running a flow that wants to run a bunch of steps that aren’t related, it seems like something we need to explicitly tell everyone… hopefully that makes sense

Zanie

12/21/2022, 11:01 PM
Yeah I get it 🙂
Funny enough, we’ve been getting complaints that we aren’t showing a relationship in the DAG when task calls happen serially in a flow without passing data around…

Sam Werbalowsky

12/21/2022, 11:03 PM
in the radar?

Zanie

12/21/2022, 11:03 PM
Yeah
Our more naive users basically are used to sequential programming and misuse of global context and side-effects. It sounds like you’re more advanced, which means that the interface might be overly friendly for you but you’ve also got the ability to tweak what’s going on.

Sam Werbalowsky

12/21/2022, 11:05 PM
Oh, I just have enough experience to be dangerous and complain.

Zanie

12/21/2022, 11:05 PM
Let me know where you land, I can help with a custom `@task` decorator that would always submit if you get stuck.
👍 1

Sam Werbalowsky

12/21/2022, 11:05 PM
That radar is confusing though.

Zanie

12/21/2022, 11:05 PM
It’s a top priority to resolve that! We’ve got a lot of great engineers working on improvements / alternatives.

Sam Werbalowsky

12/21/2022, 11:06 PM
Appreciate that - since we’re just doing a POC, I’m taking notes on some of the quirks of the systems but coming back to ask about some options will definitely be useful…It took a ton of time to implement that sort of stuff with v1, too.

Peyton Runyan

12/23/2022, 1:11 PM
“Like I know we have jobs that take a bunch of inputs and then want to run the same function over those inputs”
We have map for that: https://docs.prefect.io/concepts/tasks/#map
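
For readers unfamiliar with the pattern, "run the same function over a bunch of inputs" looks roughly like this in plain Python. This is a standard-library analogy only, not how Prefect's map is implemented, and `clean` and its inputs are hypothetical. The linked documentation covers Prefect's own version.

```python
from concurrent.futures import ThreadPoolExecutor


def clean(record: str) -> str:
    """Hypothetical per-input function."""
    return record.strip().lower()


inputs = ["  Alpha", "BETA  ", " Gamma "]

with ThreadPoolExecutor(max_workers=3) as pool:
    # map() schedules one call per input and yields results in input order.
    cleaned = list(pool.map(clean, inputs))
```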