hey, so I'm running a task that fetches data from a third-party API and returns the response, which is passed to the next task (to index in Elasticsearch)... now I need to add pagination support to the first task, and I'm wondering if it's feasible to make the task itself repeat for however many pages are needed, so as not to blow memory or connection limits downstream. If so, could someone recommend the patterns in the docs for this?
Chris White
07/26/2019, 2:04 AM
If you could write an upstream task which somehow infers how many pages are going to be required in this particular run and produces a list where each element corresponds to a page, you could use task mapping
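A minimal sketch of that shape, assuming Prefect's 1.x-era `task`/`Flow`/`.map()` API; the endpoint URL, the `total_pages` and `items` response fields, and all function names here are hypothetical stand-ins for your real API and Elasticsearch client:
```python
import requests
from prefect import task, Flow

# Hypothetical endpoint and response fields ("total_pages", "items") --
# substitute your real third-party API and Elasticsearch client.
BASE_URL = "https://api.example.com/items"

@task
def get_page_numbers():
    # Upstream task: hit page 1 once to learn how many pages this run
    # needs, and return a list with one element per page.
    first = requests.get(BASE_URL, params={"page": 1}).json()
    return list(range(1, first["total_pages"] + 1))

@task
def fetch_page(page):
    # Mapped task: each page becomes its own task run with its own state,
    # so one failing page can retry without refetching the others.
    return requests.get(BASE_URL, params={"page": page}).json()

@task
def index_page(payload):
    # Downstream task, also mapped: index one page's worth of documents.
    # (Stand-in for a real Elasticsearch bulk-index call.)
    print("indexing %d records" % len(payload["items"]))

with Flow("paginated-fetch") as flow:
    pages = get_page_numbers()
    results = fetch_page.map(pages)  # one task run per page
    index_page.map(results)          # downstream stays page-by-page
```
Because the downstream indexing task is also mapped, each page is indexed on its own, so no single task run ever has to hold the full result set.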
Chris Hart
07/26/2019, 2:49 AM
sweet thanks will check it out
Chris Hart
07/26/2019, 2:58 AM
hmm yeah, I guess repeating the same task for pagination would break the acyclic part of the DAG
Chris Hart
07/26/2019, 2:58 AM
probably better to just do some logic in the flow that uses the results of the task and optionally keeps calling it with optional arguments until done, spawning task #2 at each turn as well
Chris Hart
07/26/2019, 2:59 AM
unless you think the map/reduce style is somehow better for this case
Chris White
07/26/2019, 5:39 AM
Mapping would provide the benefit of treating each run of the task as a true task, with its own state (individual failures, retries, etc.). It sounds like your alternative is to include all the runtime logic for the loop in a single task. Both are perfectly valid; it just depends on whether you really want each page to be seen by Prefect as a standalone Task, or whether treating the full loop as one task is sufficient for your use case
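For contrast, a minimal sketch of that single-task alternative, using the same hypothetical endpoint and response fields as above:
```python
import requests
from prefect import task, Flow

BASE_URL = "https://api.example.com/items"  # same hypothetical endpoint

@task
def fetch_all_pages():
    # The whole pagination loop lives inside one task run: Prefect sees a
    # single task, so state, retries, and failures apply to the loop as a whole.
    records, page = [], 1
    while True:
        payload = requests.get(BASE_URL, params={"page": page}).json()
        records.extend(payload["items"])
        if page >= payload["total_pages"]:
            return records
        page += 1

@task
def index_records(records):
    # One downstream run receives everything at once -- the memory
    # trade-off relative to the mapped version above.
    print("indexing %d records" % len(records))

with Flow("paginated-fetch-single-task") as flow:
    index_records(fetch_all_pages())
```
Here Prefect sees one fetch run and one index run, so a failure on the last page retries the entire loop, and the accumulated records are held in memory at once.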
Jeremiah
07/26/2019, 2:04 PM
I have a plan for co-opting the mapping mechanism for loops, including while loops, and if you don't mind I'll add your use case to the motivation; we'll keep it on the roadmap
Jeremiah
07/26/2019, 2:05 PM
^ to be clear, this isn't supported today, but it's planned as a feature enhancement!
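A note for later readers: task looping did eventually land in Prefect Core as the LOOP signal. A minimal sketch, assuming a Prefect 1.x release where `prefect.engine.signals.LOOP` is available, with the same hypothetical endpoint and fields as above:
```python
import requests
import prefect
from prefect import task, Flow
from prefect.engine.signals import LOOP

BASE_URL = "https://api.example.com/items"  # same hypothetical endpoint

@task
def fetch_pages():
    # Raising LOOP re-runs this task; the loop's running state travels in
    # prefect.context under "task_loop_count" and "task_loop_result".
    page = prefect.context.get("task_loop_count", 1)
    collected = prefect.context.get("task_loop_result", [])
    payload = requests.get(BASE_URL, params={"page": page}).json()
    collected = collected + payload["items"]
    if page < payload["total_pages"]:
        # Schedule another iteration, carrying the records forward.
        raise LOOP(message="fetched page %d" % page, result=collected)
    return collected

@task
def index_records(records):
    print("indexing %d records" % len(records))  # stand-in for ES indexing

with Flow("paginated-fetch-looped") as flow:
    index_records(fetch_pages())
```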