Daniel Rothenberg
12/03/2020, 3:41 PM
I'm using dask-gateway to spin up distributed workflows. Major kudos to the team!
I have a question though about scaling out this workflow. In my particular application, I basically need to apply this training/prediction workflow for O(10,000) different "assets". Put another way, my workflows each have a Parameter called "asset". I see two different ways of scaling my toy example with one asset up to the much larger set of a few thousand:
1. Use the .map() capabilities on my tasks and simply "map" a list of "assets" for the flow to run through (first sketch below); or
2. Convert my "asset" Parameter to be a list of all the assets, and do all the mapping inside each of my tasks in the workflow, leaning on the fact that each task already has access to the dask cluster I'm running the workflow on (second sketch below).
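To make that concrete, here are rough sketches of the two options (assume Prefect 0.13-era APIs; the task and function names are made up). Option 1 just maps a task over the list of assets:

```python
from prefect import Flow, Parameter, task

@task
def train_model(asset):
    """Train on a single asset; returns a ~1 MB artifact."""
    ...

with Flow("per-asset-training") as flow:
    assets = Parameter("assets", default=["asset-001", "asset-002"])
    # Option 1: one mapped Task Run per asset
    models = train_model.map(assets)
```

Option 2 would be a single task that fans out over the dask cluster itself, e.g. via distributed's worker_client (which assumes the task is already executing on a dask worker, i.e. the flow runs with a DaskExecutor):

```python
from distributed import worker_client
from prefect import task

def train_one(asset):
    """Plain function (not a Prefect task): train on a single asset."""
    ...

@task
def train_all(assets):
    # Option 2: one Task Run that does its own fan-out on the cluster,
    # so Prefect only ever sees a single task
    with worker_client() as client:
        futures = client.map(train_one, assets)
        return client.gather(futures)
```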
I'm not sure which is the "recommended" approach here. The training/modeling flow generates, for each asset, artifacts on the order of 1 MB that have to get passed around, and I'm worried about the performance of Prefect's scheduler if it needs to map tasks over a list of several thousand items. On the other hand, things get... complicated (with my naive workflow for saving trained models and whatnot)... if I have to manually do all the book-keeping inside each task in the flow.
Thoughts? Does anyone have experience scaling out ML training/prediction workflows on Prefect who can offer some insight?

Dylan
Using map to map over a list of assets within a single Flow run will work. As long as the k8s job has the resources available, the Flow Run can orchestrate thousands of mapped Task Runs. Unlike some other workflow tools, the centralized Prefect Scheduler (in your Prefect Server instance or Prefect Cloud) is only responsible for scheduling Flow Runs. The Flow Run itself is responsible for scheduling Task Runs, so you won’t overload a central scheduler 👍
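A minimal sketch of the wiring, if it helps (the gateway address is a placeholder, and note the import path moved to prefect.executors in newer releases):

```python
from prefect.engine.executors import DaskExecutor

# Point the Flow Run's executor at a dask-gateway backed cluster.
# Each mapped Task Run is submitted to the dask scheduler, so
# Prefect Server/Cloud only ever sees a single Flow Run.
executor = DaskExecutor(
    cluster_class="dask_gateway.GatewayCluster",
    cluster_kwargs={"address": "https://your-gateway-address"},  # placeholder
    adapt_kwargs={"minimum": 1, "maximum": 50},  # scale workers with the mapped load
)

flow.run(executor=executor)
```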
Daniel Rothenberg
12/03/2020, 3:55 PM

Dylan
Pedro Machado
12/04/2020, 12:50 AM