    Christopher Harris

    2 years ago
    Does Prefect support micro-batching? In more detail: we’re trying to migrate our existing generator pattern to Prefect, and we’re hoping we can change to a micro-batching model. Basically, the first node in our pipeline is responsible for pulling data from a location and pushing it out to the rest of the pipeline (a DAG). We were hoping to use the LOOP construct to have that “source node” pull data in batch_size increments and map the individual data packets across the remaining DAG. In a way, this seems like a “workflow loop” with the parameters for the first node constantly updating.
    Dylan

    2 years ago
    Hi @Christopher Harris! I want to make sure that I understand your question. Are you hoping that your first task loops forever and farms out work to the downstream tasks?
    On the whole, Prefect can definitely support an ingest -> mapping pattern for small batches.
    Christopher Harris

    2 years ago
    Hey Dylan! Mostly - yes. In reality the first task would loop forever until pulling from the source yields nothing - then it would terminate.
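    To make the loop concrete, here is a rough plain-Python sketch of the generator pattern being described (no Prefect involved). The names source_node, make_queue_puller, and run_pipeline are hypothetical, the in-memory queue is only a stand-in for the real data source, and the upper-casing step is a placeholder for the rest of the DAG.

```python
from collections import deque
from typing import Callable, Deque, Iterator, List


def source_node(pull: Callable[[int], List[str]], batch_size: int) -> Iterator[List[str]]:
    """Keep pulling up to batch_size packets; stop once a pull yields nothing."""
    while True:
        batch = pull(batch_size)
        if not batch:
            return  # source is exhausted, so the loop terminates
        yield batch


def make_queue_puller(packets: List[str]) -> Callable[[int], List[str]]:
    """Hypothetical source: an in-memory queue drained n packets at a time."""
    queue: Deque[str] = deque(packets)

    def pull(n: int) -> List[str]:
        return [queue.popleft() for _ in range(min(n, len(queue)))]

    return pull


def run_pipeline(packets: List[str], batch_size: int) -> List[str]:
    """Map a placeholder downstream step over every packet of every batch."""
    processed: List[str] = []
    for batch in source_node(make_queue_puller(packets), batch_size):
        processed.extend(packet.upper() for packet in batch)
    return processed
```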
    Dylan

    2 years ago
    Ahh okay cool
    So, there’s a way you could achieve this pattern right now.
    With a running instance of Prefect Server or Prefect Cloud, you’d create two flows: Ingest Flow and Process Flow.
    Ingest Flow has two tasks: “pull and persist” and “create process flow run”.
    “pull and persist” would pull the data, write it to cloud storage (GCS or S3), and return a reference to the bucket.
    “create process flow run” would then talk to the Prefect Server/Cloud GraphQL API to kick off a run of Process Flow with the storage reference URI as a parameter.
    Process Flow’s runs are then working in micro-batches, and your limiting factor is the amount of infrastructure you’d like to dedicate to this workflow.
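    As a rough illustration of the two-flow shape, here is a plain-Python sketch that does not use Prefect itself. Everything in it is a hypothetical stand-in: FAKE_BUCKET plays the role of GCS or S3, the fake:// URI scheme is made up, and create_process_flow_run simply calls a function where a real setup would ask the Prefect Server/Cloud API to start a Process Flow run.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical stand-in for cloud storage (GCS or S3): URI -> stored batch.
FAKE_BUCKET: Dict[str, List[int]] = {}


def pull_and_persist(batch: List[int], batch_no: int) -> str:
    """Write one batch to storage and return a reference URI."""
    uri = f"fake://bucket/batch-{batch_no}"  # made-up URI scheme
    FAKE_BUCKET[uri] = batch
    return uri


def process_flow(storage_uri: str) -> List[int]:
    """Stand-in for one Process Flow run: load the batch, map over its packets."""
    return [packet * 2 for packet in FAKE_BUCKET[storage_uri]]


def create_process_flow_run(
    storage_uri: str, trigger: Callable[[str], List[int]]
) -> List[int]:
    """Stand-in for the API call that kicks off a run with the URI as a parameter."""
    return trigger(storage_uri)


def ingest_flow(batches: Iterable[List[int]]) -> List[List[int]]:
    """Persist each batch, then start one Process Flow run per batch."""
    results = []
    for batch_no, batch in enumerate(batches):
        uri = pull_and_persist(batch, batch_no)
        results.append(create_process_flow_run(uri, process_flow))
    return results
```

    Here the runs happen one after another in the same process; in the real setup each Process Flow run would be scheduled independently, which is why the available infrastructure becomes the limit.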
    Does that answer your question, @Christopher Harris?