John Ramirez
05/06/2020, 1:12 PM
Dask Cloud Provider Environment and want to know if this env would be the best way to accomplish this goal
nicholas
05/06/2020, 1:28 PM
Jim Crist-Harif
05/06/2020, 1:37 PM
John Ramirez
05/06/2020, 1:47 PM
Joe Schmid
05/06/2020, 1:48 PM
DaskCloudProviderEnvironment and am happy to answer questions about it, but maybe give us a quick update on which approach that Jim laid out might be a better fit for you and we could provide more thoughts.
map()) within your Flow?
John Ramirez
05/06/2020, 1:56 PM
map() extensively because the flow consumes a parameter file with 9-10 distinct combinations to evaluate.
Jim Crist-Harif
05/06/2020, 1:56 PM
DaskKubernetesEnvironment would be my recommendation.
DaskCloudProviderEnvironment won't get you anything, it's just a way to deploy dask on different infrastructure.
David Ojeda
05/06/2020, 1:58 PM
John Ramirez
05/06/2020, 2:03 PM
DaskCloudProviderEnvironment to run these ad-hoc runs on demand.
Jim Crist-Harif
05/06/2020, 2:07 PM
DaskCloudProviderEnvironment would be one way to manage this then, since you'd be deploying those flows outside the EKS cluster. Another way would be to configure a scalable node pool in EKS, and continue using DaskKubernetesEnvironment - when under load (say kicking off 3000 jobs) EKS will scale up, but will scale back down when not needed.
Either option would work. I think keeping everything in k8s is slightly simpler, but that's up to you.
John Ramirez
05/06/2020, 3:15 PM
RemoteEnvironment NOT the DaskKubernetesEnvironment. It was never stable enough and would constantly final
Joe Schmid
05/06/2020, 3:33 PM
DaskCloudProviderEnvironment. If you wanted to pursue that route, I'd recommend starting by using the Dask Cloud Provider project directly, make sure you can create and access a Dask cluster with it successfully, and then try out DaskCloudProviderEnvironment.
DaskCloudProviderEnvironment does come with significant overhead in Flow run startup time due to Fargate allocating resources and pulling docker images. We currently see about 5 minutes of latency at the start of Flow runs because these 3 things happen sequentially:
1. Fargate Agent launches a Fargate task for the flow run. (We can reduce this a bit if we use EC2 launch type instead of Fargate and have an ec2 instance registered with ECS, but then you're losing the serverless aspect of Fargate.)
2. Flow run calls execute() in DaskCloudProviderEnvironment, which creates a Fargate task for the Dask scheduler. It has to wait for the scheduler to be created in order to get the scheduler address to start the workers with.
3. Finally, start Fargate task(s) for the Dask worker(s)
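A minimal sketch of why these three steps add up rather than overlap, assuming made-up numbers: the 100-second step duration, the scheduler address, and every function name below are hypothetical placeholders, not measurements and not real Prefect or Dask Cloud Provider calls. Each step needs the result of the one before it, so the simulated latencies are summed.

```python
from dataclasses import dataclass, field


@dataclass
class SimulatedClock:
    """Accumulates simulated seconds; nothing actually sleeps."""
    elapsed: float = 0.0
    log: list = field(default_factory=list)

    def spend(self, step: str, seconds: float) -> None:
        self.elapsed += seconds
        self.log.append((step, seconds))


# Hypothetical per-step duration, chosen only so the total lands near 5 minutes.
STEP_SECONDS = 100.0


def launch_flow_run_task(clock: SimulatedClock) -> str:
    clock.spend("1. Fargate task for the flow run", STEP_SECONDS)
    return "flow-run-task"  # placeholder identifier


def start_scheduler(clock: SimulatedClock, flow_run_task: str) -> str:
    # Started from inside the flow run, so it cannot begin before step 1 ends.
    clock.spend("2. Fargate task for the Dask scheduler", STEP_SECONDS)
    return "tcp://scheduler.example:8786"  # placeholder address


def start_workers(clock: SimulatedClock, scheduler_address: str, n_workers: int) -> list:
    # Workers need the scheduler address, so they cannot start before step 2 ends.
    # Assumption: the workers launch together, so this step is counted once.
    clock.spend("3. Fargate task(s) for the Dask worker(s)", STEP_SECONDS)
    return [f"worker-{i} -> {scheduler_address}" for i in range(n_workers)]


clock = SimulatedClock()
task = launch_flow_run_task(clock)
address = start_scheduler(clock, task)
workers = start_workers(clock, address, n_workers=3)

for step, seconds in clock.log:
    print(f"{step}: {seconds:.0f} s")
print(f"simulated startup latency: {clock.elapsed / 60:.1f} min")
```

Shortening any single step (for example, the EC2 launch type mentioned in step 1) only trims that step's share of the total, since the chain stays sequential.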
If you're not sensitive to this startup latency, then this is a very nice approach that is easy to configure, but it does come with this overhead.
John Ramirez
05/06/2020, 4:01 PM
Jim Crist-Harif
05/06/2020, 4:03 PM