# prefect-community
c
Does Prefect support micro-batching? In more detail: we’re trying to migrate our existing generator pattern to Prefect, and we’re hoping we can change to a micro-batching model. Basically, the first node in our pipeline is responsible for pulling data from a location and pushing it out to the rest of the pipeline (a DAG). We were hoping to use the LOOP construct to have that “source node” pull data in `batch_size` increments and map the individual data packets across the remaining DAG. In a way, this seems like a “workflow loop” where the parameters for the first node keep updating.
d
Hi @Christopher Harris! I want to make sure that I understand your question. Are you hoping that your first task loops forever and farms out work to the downstream tasks?
On the whole, Prefect can definitely support an ingest -> mapping pattern for small batches.
c
Hey Dylan! Mostly, yes. In practice, the first task would keep looping until pulling from the source yields nothing, and then it would terminate.
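The source-node behaviour described here (pull `batch_size` records at a time, stop when a pull comes back empty) can be sketched as a plain Python generator. This is a minimal illustration independent of Prefect; `pull` and `make_list_source` are hypothetical names standing in for whatever actually reads from your data location.

```python
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")


def micro_batches(pull: Callable[[int], List[T]], batch_size: int) -> Iterator[List[T]]:
    """Yield successive batches from `pull` until it returns an empty batch.

    `pull` is a hypothetical source reader: called with `batch_size`, it returns
    up to that many new records, or an empty list once the source is drained.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    while True:
        batch = pull(batch_size)
        if not batch:  # an empty pull means the source is exhausted, so stop
            return
        yield batch


def make_list_source(records: List[T]) -> Callable[[int], List[T]]:
    """Build a toy in-memory source (hypothetical) that hands out records in order."""
    cursor = 0

    def pull(n: int) -> List[T]:
        nonlocal cursor
        batch = records[cursor:cursor + n]
        cursor += len(batch)
        return batch

    return pull
```

For example, `list(micro_batches(make_list_source(list(range(7))), 3))` gives `[[0, 1, 2], [3, 4, 5], [6]]`; each inner list is one micro-batch that would be mapped across the downstream DAG.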
d
Ahh okay cool
So, there’s a way you could achieve this pattern right now
With a running instance of Prefect Server or Prefect Cloud, you’d create two flows: `Ingest Flow` and `Process Flow`. `Ingest Flow` has two tasks: `pull and persist` and `create process flow run`.
`pull and persist` would pull the data, write it to cloud storage (GCS or S3), and return a reference to the stored data.
`create process flow run` would then call the Prefect Server/Cloud GraphQL API to kick off a run of `Process Flow`, passing the storage reference URI as a parameter.
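A rough sketch of the `Ingest Flow` side in plain Python, under several assumptions: an in-memory dict stands in for GCS/S3, the loop is an ordinary `while` loop rather than Prefect’s LOOP signal, `uri` is a hypothetical parameter name on `Process Flow`, and `send` is a hypothetical callable that would POST the request body to the API. The mutation text follows what the Prefect 1.x Python client sends to Prefect Server as far as I recall, so verify it against your server’s GraphQL schema; in practice the 1.x `Client.create_flow_run(flow_id=..., parameters=...)` method wraps this call for you.

```python
import json
import uuid
from typing import Callable, Dict, List

# Stand-in for GCS/S3 (assumption): maps a storage URI to the batch written there.
FAKE_BUCKET: Dict[str, List[dict]] = {}


def pull_and_persist(pull: Callable[[int], List[dict]], batch_size: int) -> str:
    """Pull one batch, write it to (fake) storage, and return its URI, or "" if empty."""
    batch = pull(batch_size)
    if not batch:
        return ""
    uri = f"s3://example-bucket/batches/{uuid.uuid4()}.json"  # hypothetical bucket
    FAKE_BUCKET[uri] = json.loads(json.dumps(batch))  # serialize as real storage would
    return uri


def create_process_flow_run_request(flow_id: str, uri: str) -> dict:
    """Build a GraphQL body that starts one Process Flow run with `uri` as a parameter."""
    return {
        "query": (
            "mutation($input: create_flow_run_input!) "
            "{ create_flow_run(input: $input) { id } }"
        ),
        "variables": {"input": {"flow_id": flow_id, "parameters": {"uri": uri}}},
    }


def run_ingest(
    pull: Callable[[int], List[dict]],
    batch_size: int,
    flow_id: str,
    send: Callable[[dict], None],
) -> List[str]:
    """Loop until the source is drained, starting one Process Flow run per batch."""
    uris: List[str] = []
    while True:
        uri = pull_and_persist(pull, batch_size)
        if not uri:  # nothing left to ingest
            break
        send(create_process_flow_run_request(flow_id, uri))
        uris.append(uri)
    return uris
```

Passing only a reference keeps the flow-run parameters small; each `Process Flow` run reads its own batch back from storage.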
Each `Process Flow` run then works on a single micro-batch,
and your limiting factor is the amount of infrastructure you’d like to dedicate to this workflow
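To illustrate why infrastructure is the limit: the micro-batch runs are independent, so throughput is bounded only by how many can execute at once. The local simulation below uses a thread pool whose `max_workers` stands in (as an assumption) for the agent or cluster capacity you dedicate; `process_batch` is a hypothetical function that would load and process one stored batch.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def process_all(
    uris: List[str], process_batch: Callable[[str], int], max_workers: int
) -> List[int]:
    """Run one `process_batch` call per micro-batch URI, at most `max_workers` at a time.

    `max_workers` models how much infrastructure is dedicated to Process Flow
    runs; raising it increases parallelism because the batches do not depend
    on each other.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in input order even though calls overlap.
        return list(pool.map(process_batch, uris))
```

For example, `process_all(["a", "bb", "ccc"], len, 2)` returns `[1, 2, 3]`, with at most two calls in flight at any moment.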
Does that answer your question, @Christopher Harris?