Is it possible to yield results from a task into another task? I have a task that pulls data from an API in pages, and rather than accumulating ALL of the data into a list before passing it to the next task, it would be much more memory efficient to yield one page at a time.
Kevin Kho
07/06/2022, 4:15 AM
You can't yield a task output, or rather it doesn't really help, because results are held in memory. I assume you know of mapping, but it doesn't fit your use case either, right?
Kevin Kho
07/06/2022, 4:15 AM
Or could you give me a rough example in normal Python so I can see the idea?
Jeff Kehler
07/06/2022, 4:26 AM
I don't think mapping could be used here. Here is some pseudocode:
import stripe
from prefect import task, Flow

@task
def get_api_data():
    # Every record is collected in memory before the task returns
    items = []
    for r in stripe.BalanceTransaction.list(limit=100).auto_paging_iter():
        items.append(r)
    return items

@task
def insert_to_db(records):
    # logic to insert into db here
    pass

with Flow("test") as flow:
    items = get_api_data()
    insert_to_db(items)
Let's pretend that get_api_data returns tens of thousands of records. All of this would have to be held in memory before passing it to the insert-into-db task.
Jeff Kehler
07/06/2022, 4:32 AM
The only way I know to handle this would be to do both operations in a single task rather than splitting them into 2 tasks
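For illustration, the single-task approach could look roughly like the sketch below. It is plain Python so it runs without Prefect or Stripe installed: fake_paging_iter is a hypothetical stand-in for auto_paging_iter(), insert_batch is a hypothetical stand-in for the database write, and the batch size of 100 is an assumption. In a real flow, the body of fetch_and_insert would presumably sit inside one @task.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batched(records: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield lists of at most `size` records without materialising the whole stream."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def fake_paging_iter(total: int) -> Iterator[dict]:
    """Hypothetical stand-in for an auto-paging API iterator."""
    for i in range(total):
        yield {"id": i}


def fetch_and_insert(
    records: Iterable[dict],
    insert_batch: Callable[[List[dict]], None],
    batch_size: int = 100,
) -> int:
    """Insert each batch as soon as it is read; return the number of records inserted."""
    inserted = 0
    for batch in batched(records, batch_size):
        insert_batch(batch)
        inserted += len(batch)
    return inserted


# Stand-in for a database table: records only the size of each inserted batch.
batch_sizes: List[int] = []
total = fetch_and_insert(fake_paging_iter(250), lambda b: batch_sizes.append(len(b)))
```

Only one batch is held in memory at a time, at the cost of losing the separate fetch and insert tasks in the flow graph.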
Kevin Kho
07/06/2022, 4:45 AM
Yeah, that's right. This can't be done. It might be friendlier in 2.0 though, because you can write a for loop and run these tasks inside it with different offsets.
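A rough sketch of that for-loop idea, written as plain Python so it runs without Prefect installed. The assumption is that in 2.0 fetch_page and insert_page would each be decorated as a task and paged_flow as a flow; SOURCE, TABLE, the function names, and offset-based paging are all hypothetical stand-ins, not part of any real API.

```python
from itertools import count
from typing import List

SOURCE = [{"id": i} for i in range(250)]  # hypothetical stand-in for the remote API
TABLE: List[dict] = []                    # hypothetical stand-in for the database


def fetch_page(offset: int, limit: int) -> List[dict]:
    """Return one page of records (assumed to be a task in 2.0)."""
    return SOURCE[offset:offset + limit]


def insert_page(records: List[dict]) -> int:
    """Write one page of records (assumed to be a task in 2.0)."""
    TABLE.extend(records)
    return len(records)


def paged_flow(limit: int = 100) -> int:
    """Fetch and insert one page per iteration (assumed to be a flow in 2.0)."""
    pages = 0
    for offset in count(0, limit):
        page = fetch_page(offset, limit)
        if not page:
            break  # an empty page means the source is exhausted
        insert_page(page)
        pages += 1
    return pages


pages_run = paged_flow()
```

Each iteration holds a single page, so memory use depends on the page size and not on the total number of records.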