# prefect-community
j
👋 do you all have thoughts on when you'd use prefect over something like temporal/cadence? or, is the intention that you should feel comfortable using prefect for any use case that the latter aims to address?
I'm thinking of something that you wouldn't typically think of as a "data" workflow (get some data, perform some calculations, etc.) and more like "i'm generating data for an entity, and there are multiple inputs that may or may not be ready yet and i may be responsible for initiating them"
i feel like i've asked a variation of this question before but didn't really have something to compare against, and came across temporal recently
a
I can't say anything about temporal, but if you tell me what problem(s) you're trying to solve, I can tell you whether Prefect may or may not be a good fit
> is the intention that you should feel comfortable using prefect for any use case
"any" is a big word, but I'd say most dataflow-related problems can be tackled with Prefect
j
fair enough about any, definitely a strong word 🙂. Our use case is basically that we generate data for people based on various inputs, but the inputs may or may not be ready at the time of derivation. so, if we were generating something like:
`c = a + b`
and `a` is ready and available but `b` is not, then the system is responsible for informing another system and telling `b` to go be generated, and then responsible for calculating `c` once `b` is ready. you can also imagine that `b` can change over time, so we want to not only know `c`, but we'll want to know:
c_v1 = a_v1 + b_v1
c_v2 = a_v1 + b_v2
c_v3 = a_v2 + b_v2
...
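To make the pattern above concrete, here is a minimal plain-Python sketch (no Prefect involved). Everything in it is hypothetical and only for illustration: the `DerivationStore` class, its `publish`/`derive` methods, and the in-memory lists standing in for real storage and for the "tell another system to generate `b`" step.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class DerivationStore:
    """Hypothetical in-memory store of versioned inputs and derived results."""

    inputs: Dict[str, List[int]] = field(default_factory=lambda: {"a": [], "b": []})
    # Each result is (a_version, b_version, c).
    results: List[Tuple[int, int, int]] = field(default_factory=list)
    # Inputs we have asked another system to generate.
    requested: List[str] = field(default_factory=list)

    def latest(self, name: str) -> Optional[Tuple[int, int]]:
        """Return (version, value) of the newest value, or None if not ready."""
        values = self.inputs[name]
        return (len(values), values[-1]) if values else None

    def publish(self, name: str, value: int) -> None:
        """Record a new version of an input, then try to re-derive c."""
        self.inputs[name].append(value)
        self.derive()

    def derive(self) -> Optional[int]:
        """Compute c = a + b if both inputs are ready; otherwise request the missing ones."""
        a, b = self.latest("a"), self.latest("b")
        for name, value in (("a", a), ("b", b)):
            if value is None and name not in self.requested:
                self.requested.append(name)  # stand-in for notifying the upstream system
        if a is None or b is None:
            return None
        c = a[1] + b[1]
        self.results.append((a[0], b[0], c))
        return c


store = DerivationStore()
store.publish("a", 1)   # a_v1 is ready, b is missing, so b gets requested
store.publish("b", 10)  # c_v1 = a_v1 + b_v1
store.publish("b", 20)  # c_v2 = a_v1 + b_v2
store.publish("a", 2)   # c_v3 = a_v2 + b_v2
```

After these four calls, `store.results` holds one entry per derived version of `c`, each tagged with the input versions it was computed from, and `store.requested` records that `b` had to be requested once.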
a
definitely doable in Prefect, as it natively supports dataflow between tasks and even between flows (subflows). if you want to store some pieces of metadata for in-between steps that you don't want to pass as a data dependency, you can use something like a JSON block
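A rough sketch of what "dataflow between tasks and subflows" could look like for the `c = a + b` case, using only Prefect's `flow` and `task` decorators. The function names (`load_a`, `load_b`, `generate_b`, `add`, `derive_c`) and the placeholder return values are made up for illustration, and the import falls back to no-op decorators so the sketch still runs where Prefect isn't installed. The JSON block is left out here.

```python
from typing import Optional

try:
    from prefect import flow, task
except ImportError:
    # Fallback (assumption for this sketch): treat flows and tasks as plain functions.
    def task(fn):
        return fn

    flow = task


@task
def load_a() -> int:
    return 1  # placeholder for reading input a


@task
def load_b() -> Optional[int]:
    return None  # placeholder: b has not been generated yet


@flow
def generate_b() -> int:
    """Hypothetical subflow standing in for 'tell b to go be generated'."""
    return 10


@task
def add(a: int, b: int) -> int:
    return a + b


@flow
def derive_c() -> int:
    a = load_a()
    b = load_b()
    if b is None:
        # Calling a flow from inside a flow runs it as a subflow.
        b = generate_b()
    # The outputs of the earlier steps are passed along as ordinary arguments.
    return add(a, b)
```

Calling `derive_c()` runs the whole thing. The branch on `b is None` is just regular Python control flow inside the flow function.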
j
that makes sense! and fwiw we've already started building this on prefect and things are generally pretty straightforward to implement (though it can get a little awkward when we run into unusual derivation trees and have to represent them as flows and tasks), but i was just doing some reading and came across temporal, so i didn't know if you all had any thoughts on why you'd use one or the other.
a quick follow-up question: another thing that we haven't quite gotten to but are hopeful about is the scale at which we can run prefect. assuming we provide enough resources, do you all have concerns with running thousands (maybe even tens of thousands) of runs of a given flow simultaneously? perhaps i want to run `generate_data(p_1) --> generate_data(p_n)`, e.g. thousands of short-lived (<5m) flows vs. fewer, longer-running flows
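For a feel of the fan-out shape being described, here is a plain `asyncio` sketch of running `generate_data` over many parameters with a cap on how many run at once. This is only a stand-in for the pattern, not Prefect's scheduling or concurrency API: `generate_data`'s body, `run_all`, and `max_concurrent` are all assumptions made for the example.

```python
import asyncio
from typing import List


async def generate_data(p: int) -> int:
    """Placeholder for one short-lived run; sleeps briefly instead of doing real work."""
    await asyncio.sleep(0.001)
    return p * 2


async def run_all(params: List[int], max_concurrent: int) -> List[int]:
    """Run generate_data for every parameter, at most max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(p: int) -> int:
        async with semaphore:
            return await generate_data(p)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(bounded(p) for p in params))


# 1000 short runs, with no more than 100 in flight at any moment.
results = asyncio.run(run_all(list(range(1, 1001)), max_concurrent=100))
```

The cap is what makes "assuming we provide enough resources" explicit: the number of runs can grow into the thousands while the number in flight stays bounded by whatever the infrastructure can handle.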