• Mitchell Bregman

    2 years ago
    Hi there! My team is exploring Prefect as a workflow engine to support hundreds of data integrity checks on an internal survey management system containing tens of thousands of survey responses. We like Prefect because of its clean implementation, seemingly lower learning curve, and the ability to connect complex dependencies. I am tasked with building a prototype "flow" that can serve as an example supporting thousands of API calls, data models/checks, and db reads and writes.
    My goal is to hit the ground running with the Prefect Core framework, using a threaded environment that can schedule tasks (i.e. API calls) in parallel, read and write to PG in bulk, and perform various other tasks such as existence checking, data integrity, etc. Coming from a Luigi background, a lot of these things were taken care of for me. Our biggest pain point with Luigi is its dependency management model plus rigid existence checking, which can be a huge time suck since these checks are performed on one thread. I am seeking scalable granularity in this workflow.
    As I read through these docs, I am seeing your concept of Executors as well as the DaskExecutor object, which seems to be the proper choice. Now, when I start exploring this idea of mapping and connecting these task dependencies together, I get a little flustered without a more complex Prefect pipeline example... If it were possible, would you be able to point me to a larger-scale example on GH or elsewhere; something that has multiple modules plus a nicely defined project structure?
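    For what it's worth, the fan-out behavior described here (many API calls scheduled in parallel, with integrity checks run over the results) can be sketched with the standard library alone. This is an illustration of the pattern, not Prefect's mapping API, and `fetch_status` / `check_exists` are invented names standing in for the real API call and existence check:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_status(survey_id):
    # Hypothetical stand-in for one API call against the survey system.
    return {"id": survey_id, "ok": survey_id % 2 == 0}

def check_exists(record):
    # Hypothetical stand-in for a per-record existence/integrity check.
    return record["ok"]

survey_ids = range(10)

# Fan the "API calls" out across a thread pool, then run the checks on
# each result; Prefect's task mapping plus a DaskExecutor plays an
# analogous role inside a real flow, but without the manual pooling.
with ThreadPoolExecutor(max_workers=4) as pool:
    records = list(pool.map(fetch_status, survey_ids))
    results = list(pool.map(check_exists, records))
```

    `ThreadPoolExecutor.map` preserves input order, so each check lines up with the id that produced it even though the calls ran concurrently.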
    Jenny
    6 replies
  • Brad

    2 years ago
    Hi team, I just got onto Prefect Cloud and tried installing the Prefect CLI via pipx (https://github.com/pipxproject/pipx), but the install didn't pick up the click requirement from dask.
    3 replies
  • Braun Reyes

    2 years ago
    omg! just touched the Dask project for the first time... it's pretty darn cool.
  • Braun Reyes

    2 years ago
    that delayed object usage looks awfully familiar 🤔
  • Brett Naul

    2 years ago
    bit of a philosophical q: why are Parameters required to be unique rather than just all receiving the input? I mentioned to @Chris White that I need to duplicate a large part of one flow in multiple places, so I was toying with a pattern like
    def add(x, y):
        return x + y
    
    def add_to_x(y):
        x = Parameter('x')
        total = task(add)(x, y)
    
    with Flow('flow') as f:
        add_to_x(1)
        add_to_x(2)
    In this case you could just pass x in to the "factory" function instead, but in practice I have lots of parameters, so it feels a bit clumsy. I'm sure there was a good reason for enforcing that parameters be unique; but doesn't the fact that we're able to assert that it's unique mean that we could also just grab a reference to the already-existing task? 🤔
    Chris White
    3 replies
  • Phoebe Bright

    2 years ago
    Hi, I've just started using Prefect with Django and I have a couple of questions.
    1. Is there some way of running a flow that includes a Django context, so I can access the model methods? I could use API calls instead; is that a better way to do it?
    2. How do I trigger the correct workflow? The overall flow is like this: check whether any new records have been added to a Django model/postgres table, then for each new record work out what type it is and therefore which workflow to run, e.g.
    if type == 1: flow1.run()  # @taska, @taskb, @taskc
    elif type == 2: flow2.run()  # @taska, @taskz
    What is the best approach?
    - Should this all be in Prefect? (Not sure how to do that.)
    - Should I have a piece of external code that checks for new records and their types, then calls the correct Prefect flow with the new record as a parameter?
    - Is there a better way of doing this!?
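    The external-dispatcher option in the second question can be sketched without any Django or Prefect imports. Everything here is a hypothetical stand-in: `flow1` / `flow2` are plain callables playing the role of Flow.run, and the record dicts model rows pulled from the table:

```python
# Hypothetical dispatcher: an external script polls for new records and
# routes each one to the right flow based on its type column.
def flow1(record_id):
    # stand-in for flow1.run(parameters={"record_id": record_id})
    return ("flow1", record_id)

def flow2(record_id):
    # stand-in for flow2.run(parameters={"record_id": record_id})
    return ("flow2", record_id)

FLOWS = {1: flow1, 2: flow2}

def dispatch(record):
    flow = FLOWS.get(record["type"])
    if flow is None:
        raise ValueError("no flow registered for type %r" % record["type"])
    return flow(record["id"])

new_records = [{"id": 10, "type": 1}, {"id": 11, "type": 2}]
runs = [dispatch(r) for r in new_records]
```

    The dispatch table keeps the type-to-workflow mapping in one place, so adding a new record type means registering one more flow rather than growing an if/elif chain.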
    Jeremiah
    6 replies
  • Jason

    2 years ago
    Does Prefect support a yield/generator model of passing data from one task to the next?
  • Jason

    2 years ago
    For instance, when processing a CSV, can you yield per row?
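    As a point of comparison, the usual alternative to yielding per row is one step that returns all the rows and a second step applied to each element of that list. A stdlib-only sketch of that shape (the column name "value" and both function names are invented for illustration):

```python
import csv
import io

def extract_rows(text):
    # "Extract" step: parse the CSV once and return the full list of
    # rows, instead of yielding them one at a time to the next step.
    return list(csv.DictReader(io.StringIO(text)))

def process_row(row):
    # "Transform" step: applied to each row individually; in a mapping
    # model this is the function that fans out over the row list.
    return int(row["value"]) * 2

data = "value\n1\n2\n3\n"
rows = extract_rows(data)
results = [process_row(r) for r in rows]  # one call per row
```

    The list comprehension stands in for the fan-out: each row becomes its own unit of downstream work, which is the effect per-row yielding is usually after.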
    Jeremiah
    3 replies
  • Mitchell Bregman

    2 years ago
    hey guys, just out of curiosity... in get_module_metadata I am returning a class object; is this the reason for the doubly directed dependency arrow? It seems as though all the other tasks, where I am returning standard Python data types, do not have a doubly directed arrow...
    7 replies