# prefect-community
Does Prefect support micro-batching? In more detail: we’re trying to migrate our existing generator pattern to Prefect, and we’re hoping we can change to a micro-batching model. Basically, the first node in our pipeline is responsible for pulling data from a location and pushing it out to the rest of the pipeline (a DAG). We were hoping to use the LOOP construct to have that “source node” pull data in increments, and map the individual data packets across the remaining DAG. In a way this seems like a “workflow loop” with the parameters for the first node constantly updating.
Hi @Christopher Harris! I want to make sure that I understand your question. Are you hoping that your first task loops forever and farms out work to the downstream tasks?
On the whole, Prefect can definitely support an ingest -> mapping pattern for small batches.
Hey Dylan! Mostly - yes. In reality the first task would loop forever until pulling from the source yields nothing - then it would terminate.
Ahh okay cool
So, there’s a way you could achieve this pattern right now
With a running instance of Prefect Server or Prefect Cloud, you’d create two flows: `Ingest Flow` and `Process Flow`. `Ingest Flow` has two tasks: `pull and persist` and `create process flow run`.
`pull and persist` would pull the data, write it to cloud storage (GCS or S3), and return a reference to the bucket.
`create process flow run` would then talk to the Prefect Server/Cloud GraphQL API to kick off a run of `Process Flow` with the storage reference URI as a parameter.
`Process Flow`’s runs are then working in micro-batches, and your limiting factor is the amount of infrastructure you’d like to dedicate to this workflow.
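To make the shape of that pattern concrete, here is a minimal plain-Python sketch. It does not use Prefect at all: the in-memory `storage` dict stands in for GCS/S3, the `memory://` URIs are made up, and `create_process_flow_run` is a hypothetical callable standing in for whatever call you make to the Prefect Server/Cloud GraphQL API. All function and parameter names are assumptions for illustration only.

```python
from typing import Callable, Dict, Iterator, List


def pull_batches(source: List[int], batch_size: int) -> Iterator[List[int]]:
    """Yield successive batches until pulling from the source yields nothing."""
    offset = 0
    while True:
        batch = source[offset:offset + batch_size]
        if not batch:
            return  # source exhausted: the ingest loop terminates
        yield batch
        offset += batch_size


def pull_and_persist(batch: List[int], storage: Dict[str, List[int]], index: int) -> str:
    """Write one batch to (stand-in) storage and return a reference URI."""
    uri = f"memory://batches/{index}"  # hypothetical URI scheme
    storage[uri] = batch
    return uri


def run_ingest(
    source: List[int],
    batch_size: int,
    storage: Dict[str, List[int]],
    create_process_flow_run: Callable[[Dict[str, str]], str],
) -> List[str]:
    """For each batch: persist it, then kick off one process run with its URI."""
    run_ids = []
    for index, batch in enumerate(pull_batches(source, batch_size)):
        uri = pull_and_persist(batch, storage, index)
        run_ids.append(create_process_flow_run({"batch_uri": uri}))
    return run_ids


# Usage with a fake trigger that only records the parameters it was given.
storage: Dict[str, List[int]] = {}
submitted: List[Dict[str, str]] = []


def fake_create_process_flow_run(parameters: Dict[str, str]) -> str:
    submitted.append(parameters)
    return f"run-{len(submitted)}"


run_ids = run_ingest(list(range(5)), 2, storage, fake_create_process_flow_run)
```

In a real setup the fake trigger would be replaced by the API call that creates a `Process Flow` run, and each run would read its batch back from storage using the URI parameter.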
Does that answer your question, @Christopher Harris?