Hi all! I have a task that runs a SQL query and returns a list. I want to iterate over that list and pass evenly sized chunks of it to a Scrapy spider that is called in another task, waiting between each chunk. However, I'm running into an issue with LOOP: it only passes the last chunk of the list to the Scrapy spider.
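```python
import prefect
from prefect import task
from prefect.engine.signals import LOOP

@task
def query_that_will_return_a_list() -> list:
    ...  # SQL query omitted in the original post

@task
def scrapy_api_call_chunks(title_list):
    loop_payload = prefect.context.get("task_loop_count", 0)
    title_list_grouper = list(grouper(title_list, 10))  # grouper is shown later in the thread
    if loop_payload <= len(title_list_grouper):
        # Each loop covers 10 titles, so (number of loops) * 10 = titles looped over so far
        raise LOOP(message="Running the next 10 items in job titles list")
    # Reached only after the loop count exceeds the number of chunks,
    # so only title_list_grouper[loop_payload - 2] (the last chunk) is scraped
    scraper_class = Scraper()  # the poster's own scraper class
    scraper_class.instantiate_web_scraper(title_list_grouper[loop_payload - 2])
```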
I feel like I don't fully understand how to use LOOP when passing information to another function inside the task.
Kevin Kho
10/26/2021, 7:06 PM
Can I see your `grouper` code?
Kevin Kho
10/26/2021, 7:11 PM
So the way the loop works, when you `raise LOOP`, you pass the modified data to the next loop as the input. Have you seen the example here?
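A minimal, plain-Python sketch of the mechanism described above, assuming Prefect 1.x semantics: when a task raises `LOOP(result=...)`, the task runs again, the payload is readable from `prefect.context` under `task_loop_result`, and `task_loop_count` goes up by one. So the example runs without Prefect, `LOOP`, `context` and `run_with_loops` are hypothetical stand-ins for `prefect.engine.signals.LOOP`, `prefect.context` and Prefect's task runner; `sum_up_to` is an illustrative task, not from the thread.

```python
from typing import Any, Callable, Dict


class LOOP(Exception):
    """Hypothetical stand-in for prefect.engine.signals.LOOP (Prefect 1.x)."""

    def __init__(self, message: str = "", result: Any = None):
        super().__init__(message)
        self.result = result


# Hypothetical stand-in for prefect.context, which the runner fills in.
context: Dict[str, Any] = {}


def run_with_loops(task_fn: Callable[..., Any], *args: Any) -> Any:
    """Simplified runner: re-run task_fn until it returns instead of raising LOOP."""
    context.clear()
    context["task_loop_count"] = 1
    while True:
        try:
            return task_fn(*args)
        except LOOP as signal:
            # The previous iteration's payload is what the next iteration reads.
            context["task_loop_result"] = signal.result
            context["task_loop_count"] += 1


def sum_up_to(limit: int) -> int:
    # First run: no payload yet, so fall back to a starting state.
    state = context.get("task_loop_result", {"i": 0, "total": 0})
    if state["i"] > limit:
        return state["total"]  # returning (instead of raising LOOP) ends the loop
    raise LOOP(
        message=f"added {state['i']}",
        result={"i": state["i"] + 1, "total": state["total"] + state["i"]},
    )


print(run_with_loops(sum_up_to, 4))  # 0 + 1 + 2 + 3 + 4 = 10
```

The state lives entirely in the `LOOP` payload; in real Prefect 1.x code the task would read it with `prefect.context.get("task_loop_result", ...)`.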
Dominic Pham
10/26/2021, 7:11 PM
Hi Kevin, `grouper` is a more-itertools recipe.
```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into non-overlapping fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
```
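A quick check of how this recipe behaves on a list of titles (the sample data is made up). `grouper` returns a one-shot `zip_longest` iterator, so it has to be wrapped in `list()` before indexing, and the final chunk is padded with `fillvalue` (`None` by default), which would be passed to the scraper unless it is filtered out.

```python
from itertools import zip_longest


def grouper(iterable, n, fillvalue=None):
    "Collect data into non-overlapping fixed-length chunks or blocks"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


titles = [f"title-{i}" for i in range(23)]  # made-up sample data
chunks = list(grouper(titles, 10))  # materialize: the iterator can only be consumed once
print(len(chunks))  # 3
print(chunks[-1][:4])  # ('title-20', 'title-21', 'title-22', None)

# Drop the None padding before handing the last chunk to a scraper.
last_chunk = [t for t in chunks[-1] if t is not None]
print(last_chunk)  # ['title-20', 'title-21', 'title-22']
```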
Kevin Kho
10/26/2021, 7:14 PM
Ah ok. I think the issue is indeed just understanding the LOOP more. You need to pass the result along, like the example does.
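A sketch of a chunked `scrapy_api_call_chunks` that passes its progress along, assuming Prefect 1.x `LOOP` semantics. Compared with the original task, it scrapes the current chunk before raising `LOOP`, and the index of the next chunk travels in `LOOP(result=...)` instead of being derived from `task_loop_count`. `LOOP`, `context` and `run_with_loops` are hypothetical stand-ins for Prefect's signal, `prefect.context` and runner, and `scrape_chunk` is a hypothetical placeholder for `Scraper().instantiate_web_scraper(...)`, so the example runs without Prefect or Scrapy.

```python
import time
from itertools import zip_longest
from typing import Any, Callable, Dict, List


class LOOP(Exception):
    """Hypothetical stand-in for prefect.engine.signals.LOOP (Prefect 1.x)."""

    def __init__(self, message: str = "", result: Any = None):
        super().__init__(message)
        self.result = result


context: Dict[str, Any] = {}  # hypothetical stand-in for prefect.context
scraped: List[List[str]] = []  # records what the placeholder scraper received


def run_with_loops(task_fn: Callable[..., Any], *args: Any) -> Any:
    """Simplified runner: re-run task_fn, passing each LOOP payload to the next run."""
    context.clear()
    while True:
        try:
            return task_fn(*args)
        except LOOP as signal:
            context["task_loop_result"] = signal.result


def scrape_chunk(chunk: List[str]) -> None:
    """Hypothetical placeholder for Scraper().instantiate_web_scraper(chunk)."""
    scraped.append(chunk)


def scrapy_api_call_chunks(title_list: List[str], chunk_size: int = 10, pause: float = 0.0) -> int:
    args = [iter(title_list)] * chunk_size
    # Same chunking as grouper(), minus the None padding on the last chunk.
    chunks = [[t for t in group if t is not None] for group in zip_longest(*args)]
    # Which chunk this iteration handles; 0 on the first run.
    index = context.get("task_loop_result", 0)
    if index >= len(chunks):
        return index  # all chunks handled: returning ends the loop
    scrape_chunk(chunks[index])  # do this iteration's work first,
    time.sleep(pause)  # wait between chunks,
    raise LOOP(  # then hand the next index to the following iteration
        message=f"Scraped chunk {index + 1} of {len(chunks)}",
        result=index + 1,
    )


titles = [f"title-{i}" for i in range(23)]  # made-up sample data
result = run_with_loops(scrapy_api_call_chunks, titles)
print(result)  # 3
print([len(c) for c in scraped])  # [10, 10, 3]
```

Recomputing the chunks on every iteration assumes, as in Prefect 1.x, that each loop receives the same inputs. In a real flow the stand-ins would become `from prefect.engine.signals import LOOP` and `prefect.context.get("task_loop_result", 0)`, and `pause` whatever delay the spider needs.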
Dominic Pham
Ah okay, I think I understand. Is there a way to see what is being passed between each loop?
Kevin Kho
10/26/2021, 7:59 PM
You can log it inside the task, so it gets printed in the logs on every loop.
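A minimal sketch of logging inside a looping task, assuming Prefect 1.x, where a task can fetch its logger with `prefect.context.get("logger")` and the messages appear in the flow-run logs. Here the stdlib `logging` module stands in for that logger, and `LOOP`, `context` and `run_with_loops` are hypothetical stand-ins for Prefect's signal, context and runner so the example runs on its own; `countdown` is an illustrative task.

```python
import logging
from typing import Any, Callable, Dict

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
# Stand-in: in Prefect 1.x the task would use prefect.context.get("logger") instead.
logger = logging.getLogger("loop_demo")


class LOOP(Exception):
    """Hypothetical stand-in for prefect.engine.signals.LOOP (Prefect 1.x)."""

    def __init__(self, message: str = "", result: Any = None):
        super().__init__(message)
        self.result = result


context: Dict[str, Any] = {}  # hypothetical stand-in for prefect.context


def run_with_loops(task_fn: Callable[..., Any], *args: Any) -> Any:
    """Simplified runner: re-run task_fn, feeding each LOOP payload to the next run."""
    context.clear()
    context["task_loop_count"] = 1
    while True:
        try:
            return task_fn(*args)
        except LOOP as signal:
            context["task_loop_result"] = signal.result
            context["task_loop_count"] += 1


def countdown(start: int) -> str:
    remaining = context.get("task_loop_result", start)
    # Log the payload at the top of every iteration so each loop leaves a trace.
    logger.info("loop %d received payload %r", context["task_loop_count"], remaining)
    if remaining == 0:
        return "done"
    raise LOOP(message=f"{remaining} left", result=remaining - 1)


run_with_loops(countdown, 3)
```

Run as a script, this logs `INFO loop 1 received payload 3` through `INFO loop 4 received payload 0`, one line per iteration, which is the "see what is being passed between each loop" view.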
Dominic Pham
10/26/2021, 8:20 PM
Apologies, is there an example of what that might look like with a LOOP? I've looked over the documentation on logging, but I can't quite wrap my head around it.