Would appreciate anyone’s insight as to whether this is doable/efficient.
Kevin Kho
04/26/2021, 5:20 AM
Hi @Marz! Are you using Beam for streaming?
Marz
04/26/2021, 12:58 PM
Hi @Kevin Kho, we are evaluating Beam->Flink for batch and streaming. At a high level, we want to receive live client events via Kafka (or another client-facing microservice), which Beam/Flink consumes, processing requests in small, modular tasks. We are interested in using a workflow manager to let us piece all these tasks together.
Kevin Kho
04/26/2021, 1:26 PM
Do you have an idea where Prefect fits in this setup? Or are you just asking for input?
Marz
04/26/2021, 1:48 PM
Mostly for input as I haven’t found any resources on this.
Essentially, I’d like to know if Prefect can be used as the mother pipeline (main DAG), letting us define and visualize different jobs. The UI would be a big factor in letting us know whether any tasks have failed and, if so, at which stage (error management and traceability).
Kevin Kho
04/26/2021, 2:23 PM
It makes more sense for batch jobs, but less for streaming. Like what Jim said in that thread you linked, Prefect running Beam makes sense but not the other way around. I don’t know if Prefect will be able to monitor your stream because in order to do so, you’d need to connect it to the Beam stream and have it orchestrate the small, modular tasks. This might work, but I can’t imagine it being optimal. If you have a business use case for a stream, I think it tends to need low latency and adding a tool like Prefect adds overhead. Does that make sense?
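To make the batch-side idea concrete, here is a minimal, library-free sketch of an outer pipeline that runs stages in order and records which stage failed. It is an illustration only and does not use the Prefect or Beam APIs: `run_batch`, `StageResult`, `parse_events`, and `count_by_type` are hypothetical names, and the aggregate stage simply stands in for the point where a Beam batch job would be launched.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class StageResult:
    """Outcome of one stage, so a failure can be traced to where it happened."""
    name: str
    ok: bool
    error: str = ""


def run_batch(
    stages: List[Tuple[str, Callable[[Any], Any]]], data: Any
) -> Tuple[Any, List[StageResult]]:
    """Run stages in order, stopping at the first failure."""
    results: List[StageResult] = []
    for name, fn in stages:
        try:
            data = fn(data)
            results.append(StageResult(name, True))
        except Exception as exc:  # record the failing stage, then stop
            results.append(StageResult(name, False, repr(exc)))
            break
    return data, results


def parse_events(lines: List[str]) -> List[str]:
    """Hypothetical stage: keep the event type from 'type,id' records."""
    parsed = []
    for line in lines:
        event_type, _event_id = line.split(",")  # raises ValueError if malformed
        parsed.append(event_type)
    return parsed


def count_by_type(event_types: List[str]) -> dict:
    """Hypothetical stage: placeholder for where a Beam batch job would run."""
    return dict(Counter(event_types))
```

Calling `run_batch([("parse", parse_events), ("aggregate", count_by_type)], lines)` returns the final data plus one `StageResult` per attempted stage. Every step here is a blocking call on a finite input, which is why this pattern suits batch jobs and not a long-running stream.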
Marz
04/26/2021, 2:33 PM
Yup that makes sense. Was thinking of Prefect running Beam and other stages but the latency overhead doesn’t suit our business use case. Thank you! If it’s okay, I will get back to this thread if I get a better idea of our architecture plan.