https://prefect.io logo
Title
s

Steve R

03/24/2022, 3:21 PM
Hi prefecters, I have a question about prefect core (no Prefect Server, no Prefect Cloud). I'm trying to understand Flow concurrency. Is the best way to handle a stream of input data to run multiple flows (in their own thread) as the data comes in?
alternatively, I created a list of dask distributed events that can be waited on, but I'm not crazy about this solution.
So running multiple flows backed by the same executor is an alternative option.
k

Kevin Kho

03/24/2022, 3:24 PM
I think this is pretty hard without a backend because the standard way to spin up a flow is using the
create_flow_run
task, which triggers a new flow run from the backend. I don’t even know if you can achieve concurrent flows using core. I think the thread gets occupied with one flow and might not kick off other ones. The Prefect agent than connects to the backend handles this for you. I think this is a scenario where you may have to use Dask to orchestrate Prefect like you are suggesting…but then I’m wondering what Prefect gives you here?
s

Steve R

03/24/2022, 3:28 PM
Thanks! I'll respond in a bit.
Concurrent flow runs are not supported by
flow.run()
flow.run()
is a convenient way to run a flow on schedule, but it does not support concurrent flow runs. It will wait for a run to completely finish, including things like tasks that require retries, before starting the next run. However, Prefect schedules never return start times in the past. This means that if a flow run is still running when another flow run is supposed to start, the second flow run won't happen at all. If you require concurrent runs in a local process, consider using the lower-level
FlowRunner
classes directly.
from https://docs.prefect.io/core/concepts/flows.html#running-a-flow-on-schedule
k

Kevin Kho

03/24/2022, 3:34 PM
So Core alone is not intended for production, but if you use it with Cloud or Server, then you can run concurrently because it’s not the same
flow.run()
s

Steve R

03/24/2022, 4:40 PM
Thanks Kevin, Prefect Core is attractive as an richer object model abstraction over Dask on Windows. It has paid support options and is "cloud ready" as that need may arise.
I was mostly interested in this line:
If you require concurrent runs in a local process, consider using the lower-level
FlowRunner
classes directly.
k

Kevin Kho

03/24/2022, 4:46 PM
I have never seen anyone do that, and I don’t think i’d recommended working with that as working with the FlowRunner is pretty hard to read
1
s

Steve R

03/24/2022, 5:11 PM
so it seems like structuring our problem in a way that operations are performed within a single flow.run if using only prefect core is the way to go
so like input could be a list of URLs to process even if some of the URLs will not return until the data is available
because a prefect flow likes a "list of work" as input
k

Kevin Kho

03/24/2022, 5:14 PM
I think you tagged a different Kevin but no worries. That will work and mapping will help you. If the data is not ready, you can poll inside the task or just set a high number of max_retries?
k

Kevin Kho

03/24/2022, 5:20 PM
No no. Not in the pipe lol. Those are old. But either way these would require a backend beyond Prefect Core
This is our paradigm for event based flows
1
what is your picture? looks like masks over nuclei?
s

Steve R

03/24/2022, 5:44 PM
whole cell segmentation l think labelling live/dead cells
we do the things like that: https://www.moleculardevices.com/
k

Kevin Kho

03/24/2022, 5:46 PM
ahh I see