Enda Peng
04/30/2021, 7:45 PM

@task
def process_file(x):
    ...

with Flow("xxx"):
    [process_file(x) for x in file_names]

It works fine with LocalRun.
• Option 1: Single flow + Dask cluster:
◦ pro: Easy to set up; Dask has a friendly API for controlling resources.
◦ con: I have to set up the same running environment on every worker added to the Dask cluster.
• Option 2: Multiple flows + multiple agents: one-to-one mapping between flow and file
e.g. I could create 10 Docker agents running on 10 hosts, then in a script create and run 100 flows, each processing one file, and let Prefect distribute the flows for me.
◦ pro: The computation module ships with the Docker image, so there is no extra environment setup.
◦ con: Not sure whether Prefect is supposed to handle workload distribution. Even if it is, it is hard to control resource consumption.
• Option 3: Single flow + K8s:
Build an image for my computation module and register it with K8s first. Within one flow, create K8s tasks that request 100 pods to process the files (see the sketch below).
◦ pro:
▪︎ K8s could handle the workload distribution, and adding nodes would be easy.
▪︎ Any agent would be fine as long as it can talk to the K8s API.
◦ con: the complexity of setting up K8s?
Would appreciate any thoughts and input here!
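(For reference, a rough sketch of what Option 3 could look like, using the official kubernetes Python client from inside a Prefect task; the image name, namespace, and file_names list are placeholders, not details from this thread:)

from kubernetes import client, config
from prefect import task, Flow

@task
def launch_job(file_name):
    # Assumes a kubeconfig is available; inside a pod use config.load_incluster_config().
    config.load_kube_config()
    job = client.V1Job(
        metadata=client.V1ObjectMeta(generate_name="process-file-"),
        spec=client.V1JobSpec(
            template=client.V1PodTemplateSpec(
                spec=client.V1PodSpec(
                    restart_policy="Never",
                    containers=[
                        client.V1Container(
                            name="worker",
                            image="my-computation-module:latest",  # placeholder image
                            args=[file_name],
                        )
                    ],
                )
            )
        ),
    )
    # One Kubernetes Job (and pod) per file.
    client.BatchV1Api().create_namespaced_job(namespace="default", body=job)

file_names = [...]  # list of files to process (placeholder)

with Flow("process-files-on-k8s"):
    launch_job.map(file_names)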
Kevin Kho

You can use .map() to parallelize across a DaskExecutor. Getting the Dask cluster is a separate issue, but my personal experience has been that it's very easy to get one up and running with Coiled (and they're in a free trial at the moment).

Kevin Kho
The DaskExecutor can spin up a temporary cluster for you, or connect by passing the address.
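(A minimal sketch of that, assuming Prefect 0.14+, where flow refers to the Flow object, e.g. from "with Flow(...) as flow:", and the scheduler address is a placeholder:)

from prefect.executors import DaskExecutor

# Let Prefect spin up a temporary cluster for the flow run:
flow.executor = DaskExecutor()

# ...or connect to an existing Dask cluster by address:
flow.executor = DaskExecutor(address="tcp://dask-scheduler:8786")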
Jimmy Le

04/30/2021, 8:12 PM

@task
def get_file_list():
    ...
    return file_list

with Flow(...):
    file_list = get_file_list()
    processed = process_file.map(file_list)
    update_db(processed)
Jimmy Le
04/30/2021, 8:14 PM

We use the LocalDaskExecutor for local testing, and spin up Fargate Tasks with the DaskExecutor for production.
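(A rough sketch of that setup; the Fargate cluster class and its kwargs are assumptions for illustration, not details from this thread:)

from prefect.executors import LocalDaskExecutor, DaskExecutor

# Local testing: parallelize with threads (or processes) on one machine.
flow.executor = LocalDaskExecutor(scheduler="threads")

# Production: have the DaskExecutor create a Dask cluster on AWS Fargate
# (requires dask-cloudprovider; image and worker count are placeholders).
flow.executor = DaskExecutor(
    cluster_class="dask_cloudprovider.aws.FargateCluster",
    cluster_kwargs={"n_workers": 10, "image": "my-computation-module:latest"},
)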
Enda Peng

04/30/2021, 8:19 PM