hey, so I'm running a task that fetches data from a third-party API and returns the response, which is passed to the next task (to index in Elasticsearch)... now I need to add pagination support to the first task, and I'm wondering if it's feasible to make the task itself repeat for however many pages are needed, so as not to blow memory or connection limits downstream. If so, could someone recommend the patterns in the docs for this?
Chris White
07/26/2019, 2:04 AM
If you could write an upstream task which somehow infers how many pages are going to be required in this particular run and produces a list where each element corresponds to a page, you could use task mapping
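A minimal sketch of that shape, assuming Prefect's 1.x-era `task`/`Flow`/`.map()` API; the endpoint URL, the `total_pages` and `items` response fields, and all function names here are hypothetical stand-ins for your real API and Elasticsearch client:
```python
import requests
from prefect import task, Flow

# Hypothetical endpoint and response fields ("total_pages", "items") --
# substitute your real third-party API and Elasticsearch client.
BASE_URL = "https://api.example.com/items"

@task
def get_page_numbers():
    # Upstream task: hit page 1 once to learn how many pages this run
    # needs, and return a list with one element per page.
    first = requests.get(BASE_URL, params={"page": 1}).json()
    return list(range(1, first["total_pages"] + 1))

@task
def fetch_page(page):
    # Mapped task: each page becomes its own task run with its own state,
    # so one failing page can retry without refetching the others.
    return requests.get(BASE_URL, params={"page": page}).json()

@task
def index_page(payload):
    # Downstream task, also mapped: index one page's worth of documents.
    # (Stand-in for a real Elasticsearch bulk-index call.)
    print("indexing %d records" % len(payload["items"]))

with Flow("paginated-fetch") as flow:
    pages = get_page_numbers()
    results = fetch_page.map(pages)  # one task run per page
    index_page.map(results)          # downstream stays page-by-page
```
Because the downstream indexing task is also mapped, each page is indexed on its own, so no single task run ever has to hold the full result set.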
Chris Hart
07/26/2019, 2:49 AM
sweet thanks will check it out
Chris Hart
07/26/2019, 2:58 AM
hmm yeah, I guess repeating the same task for pagination would break the acyclic part of the DAG
Chris Hart
07/26/2019, 2:58 AM
probably better to just do some logic in the flow that uses the results of the task and optionally keeps calling it with optional arguments until done, spawning task #2 at each turn as well
Chris Hart
07/26/2019, 2:59 AM
unless you think the map/reduce style is somehow better for this case
Chris White
07/26/2019, 5:39 AM
Mapping would provide the benefit of treating each run of the task as a true task, with its own state (individual failures, retries, etc.). It sounds like your alternative is to include all the runtime logic for the loop in a single task. Both are perfectly valid; it just depends on whether you really want each page to be seen by Prefect as a standalone Task, or whether treating the full loop as one task is sufficient for your use case
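For contrast, a minimal sketch of that single-task alternative, using the same hypothetical endpoint and response fields as above:
```python
import requests
from prefect import task, Flow

BASE_URL = "https://api.example.com/items"  # same hypothetical endpoint

@task
def fetch_all_pages():
    # The whole pagination loop lives inside one task run: Prefect sees a
    # single task, so state, retries, and failures apply to the loop as a whole.
    records, page = [], 1
    while True:
        payload = requests.get(BASE_URL, params={"page": page}).json()
        records.extend(payload["items"])
        if page >= payload["total_pages"]:
            return records
        page += 1

@task
def index_records(records):
    # One downstream run receives everything at once -- the memory
    # trade-off relative to the mapped version above.
    print("indexing %d records" % len(records))

with Flow("paginated-fetch-single-task") as flow:
    index_records(fetch_all_pages())
```
Here Prefect sees one fetch run and one index run, so a failure on the last page retries the entire loop, and the accumulated records are held in memory at once.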
Jeremiah
07/26/2019, 2:04 PM
I have a plan for co-opting the mapping mechanism for loops, including while loops, and if you don't mind I'll add your use case to the motivation; we'll keep it on the roadmap
Jeremiah
07/26/2019, 2:05 PM
^ to be clear, this isn't supported today, but it's planned as a feature enhancement!
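A note for later readers: task looping did eventually land in Prefect Core as the LOOP signal. A minimal sketch, assuming a Prefect 1.x release where `prefect.engine.signals.LOOP` is available, with the same hypothetical endpoint and fields as above:
```python
import requests
import prefect
from prefect import task, Flow
from prefect.engine.signals import LOOP

BASE_URL = "https://api.example.com/items"  # same hypothetical endpoint

@task
def fetch_pages():
    # Raising LOOP re-runs this task; the loop's running state travels in
    # prefect.context under "task_loop_count" and "task_loop_result".
    page = prefect.context.get("task_loop_count", 1)
    collected = prefect.context.get("task_loop_result", [])
    payload = requests.get(BASE_URL, params={"page": page}).json()
    collected = collected + payload["items"]
    if page < payload["total_pages"]:
        # Schedule another iteration, carrying the records forward.
        raise LOOP(message="fetched page %d" % page, result=collected)
    return collected

@task
def index_records(records):
    print("indexing %d records" % len(records))  # stand-in for ES indexing

with Flow("paginated-fetch-looped") as flow:
    index_records(fetch_pages())
```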