Mitchell Bregman

11/08/2019, 9:48 PM
Hi there! My team is exploring Prefect as a workflow engine to support hundreds of data integrity checks on an internal survey management system containing tens of thousands of survey responses. We like Prefect because of its clean implementation, seemingly gentler learning curve, and the ability to connect complex dependencies. I am tasked with building a prototype "flow" that can serve as an example supporting thousands of API calls, data models/checks, and db reads and writes. My goal is to hit the ground running with the Prefect Core framework, using a threaded environment that can schedule tasks (i.e. API calls) in parallel, read and write to PG in bulk, and perform various other tasks such as existence checking, data integrity, etc. Coming from a Luigi background, a lot of these things are taken care of for me. Our biggest pain point with Luigi is its dependency management model + rigid existence checking, which can be a huge time suck as these checks are performed on a single thread. I am seeking scalable granularity in this workflow. As I read through the docs, I am seeing your concept of `Executors` as well as the `DaskExecutor` object, which seems to be the proper choice. Now, when I start exploring this idea of `mapping` and connecting these task dependencies together, I get a little flustered without a more complex Prefect pipeline example... If it were possible, would you be able to point me to a larger-scale example on GitHub or elsewhere; something that has multiple modules + a nicely defined project structure?
As I scroll up and read some of the comments, https://prefect-community.slack.com/archives/CL09KU1K7/p1571954709122100 -- will you guys release your example of this soon? It would be tremendously appreciated! 🙂

Jenny

11/08/2019, 9:58 PM
Hey Mitchell. Thanks for the question. Let me check with the team and see what we can share.
:marvin: 1

Mitchell Bregman

11/08/2019, 10:00 PM
Thanks!

Chris White

11/08/2019, 10:03 PM
Hey @Mitchell Bregman - great question; we’re definitely trying to work on getting some case studies out there, but in the meantime this flow maps over hundreds of URLs and uses a local version of dask: https://docs.prefect.io/core/tutorials/advanced-mapping.html
🙌 1
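
For readers following the thread, the map-then-collect shape discussed above can be sketched roughly as follows. This is not Prefect's API and not the linked tutorial's code: it swaps in Python's standard-library `ThreadPoolExecutor` purely to illustrate fanning out over many inputs on threads and gathering the results once. The names `fetch_response`, `check_integrity`, and `run_checks`, and the integrity rule itself, are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_response(survey_id: int) -> dict:
    """Hypothetical stand-in for one API call; returns fake data, no network."""
    return {"survey_id": survey_id, "answers": survey_id % 3}


def check_integrity(response: dict) -> bool:
    """Hypothetical rule: a response passes if it has at least one answer."""
    return response["answers"] > 0


def run_checks(survey_ids, max_workers=8):
    # Map step: each ID is handled independently, so calls can overlap on threads.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the inputs.
        responses = list(pool.map(fetch_response, survey_ids))
        passed = list(pool.map(check_integrity, responses))
    # Collect step: gather failing IDs once, e.g. for a single bulk write.
    return [r["survey_id"] for r, ok in zip(responses, passed) if not ok]


failed = run_checks(range(10))
```

In a real workflow engine the two map steps would be separate tasks with a declared dependency between them; here the second `pool.map` simply consumes the first one's output.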

Mitchell Bregman

11/08/2019, 10:06 PM
It says Don't Panic, so I won't! Buckling down this weekend to check this out; if any examples/case studies come out in the near future it'd be awesome if you can relay. Appreciate it @Chris White!
💯 2

Chris White

11/08/2019, 10:08 PM
I’ll be sure to do that - let us know if anything else comes up as you work through your use case!
👍 1