# ask-community
Arsenii
Hi all! Kind of a basic question, but I don't seem to find relevant documentation on this. I understand that `DaskExecutor` can be used for parallelization inside a flow, between tasks, but what about parallelization between flows themselves? I see that there's a `DaskKubernetes` environment that spawns pods for flows, each with a temporary Dask cluster inside, which makes sense to me on the surface, but Kubernetes is not currently an option for us. Would setting up something like `FargateEnvironment`/`Fargate Agent` bring significant improvements compared to, say, a regular `DockerAgent`? If a flow is run as a Fargate Task with a specified remote `DaskExecutor`, where does it actually "run" the flow? Does it make more sense to have a dedicated remote Dask cluster somewhere, or to start up a local one for each flow? Thanks again for all the help!
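In code, the two options I'm weighing look roughly like this (a minimal sketch, assuming a Prefect 0.x-era `DaskExecutor`; the scheduler address is a placeholder):

```python
from prefect import Flow, task
from prefect.engine.executors import DaskExecutor

@task
def say_hello():
    print("hello")

with Flow("dask-example") as flow:
    say_hello()

# Option A: point the executor at a dedicated, long-running Dask cluster.
# "tcp://dask-scheduler:8786" is a placeholder for a real scheduler address.
executor = DaskExecutor(address="tcp://dask-scheduler:8786")

# Option B: omit the address, and DaskExecutor spins up a temporary local
# cluster inside whatever container/host ends up running the flow.
# executor = DaskExecutor()

flow.run(executor=executor)
```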
Chris White
Hi @Arsenii - assuming you are leveraging Prefect Cloud, all Prefect Agents are capable of submitting arbitrarily many flows to run in parallel (assuming your infrastructure has the appropriate resources). Fargate Agents submit flows to run as Fargate Tasks, Docker Agents submit flows to run using a Docker daemon, Kubernetes Agents submit flows to run as Kubernetes jobs, etc. Each Agent type has a different notion of "where" a Flow runs, and which you should use is entirely use-case dependent.
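For example, starting an agent from Python is just a couple of lines (a minimal sketch, assuming the Prefect 0.x agent classes; module paths and options vary by release):

```python
# Each agent type polls Prefect Cloud for scheduled runs and submits every
# run it picks up to its own backing infrastructure, so concurrent flows
# are bounded only by that infrastructure's resources.
from prefect.agent.docker import DockerAgent
# from prefect.agent.fargate import FargateAgent  # submits runs as Fargate Tasks

DockerAgent().start()  # submits each flow run as a container on the local daemon
```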
Arsenii
Thanks @Chris White! Yes, I understand that the final structure depends on the use case, and I wondered whether end performance would differ between a couple of those setups, i.e., would a Fargate Agent spawning a Fargate Task for each flow "outperform" a Docker Agent running on a single Fargate Task (with comparable computational power)? The second point of confusion for me was the DaskExecutor-and-Agents combination, i.e. is having a Fargate Agent that spawns a Fargate Task for each flow, where each flow is parallelized via a Dask cluster (connected through a Gateway to AWS EMR, for example), a popular use case in general, or is the ROI too low for anything but super-heavy workflows? Thanks again
👍 1
Braun Reyes
Here is how we use the Fargate Agent and Dask Executor at Clearcover, Inc.:
• We have the Fargate Agent submit flows, which run as Fargate Tasks.
• We use a default configuration of 1 worker and 2 threads per worker, since the default Fargate task size is 256 CPU units and 512 MB of memory. We have also worked flow-level sizing into our CI/CD process, so flows can request their own configurations up to the maximum sizes Fargate provides. For example, you can request a Fargate Task with 4 vCPU, which could handle up to 4 workers for CPU-bound workloads (see the sketch below).
• This is all with single-machine Dask clusters.
• I feel there is room for Fargate functionality to be added where multiple additional Fargate Tasks could be created and then joined to the main Fargate task to form a cluster, like how dask-kubernetes does it... just needs to be built 🙂
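In executor terms, that sizing looks roughly like this (a sketch, assuming a Prefect 0.x `DaskExecutor` that accepts `cluster_kwargs` for its temporary local cluster; argument names may differ by version):

```python
from prefect.engine.executors import DaskExecutor

# Matches the default Fargate task size (256 CPU units / 512 MB):
# one worker with two threads.
small = DaskExecutor(cluster_kwargs={"n_workers": 1, "threads_per_worker": 2})

# A flow that requested a 4 vCPU Fargate Task could scale up accordingly
# for CPU-bound workloads.
large = DaskExecutor(cluster_kwargs={"n_workers": 4, "threads_per_worker": 1})
```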
upvote 2
Chris White
In general I wouldn’t expect there to be noticeable performance differences across the various platforms; I think ultimately the choice comes down to:
- what your preferred tech stack is
- which platform is easiest for you personally to configure

Whether you need to provision a full Dask Cluster vs. using the single-machine Dask Executor (as Braun is using) depends on how computationally intensive your tasks are.
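Side by side, the two options are roughly (a sketch, assuming Prefect 0.x executor import paths):

```python
from prefect.engine.executors import DaskExecutor, LocalDaskExecutor

# Light-to-medium workloads: a thread pool on the single machine running the flow.
single_machine = LocalDaskExecutor(scheduler="threads")

# Heavier workloads: a dedicated distributed cluster, referenced by address
# (placeholder shown).
full_cluster = DaskExecutor(address="tcp://dask-scheduler:8786")
```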
🙂 1
Arsenii
A wonderful practical example, thanks @Braun Reyes! This tells me a lot about potential deployment recipes. Now I also see how that ties back to providing the Fargate Task config through the Context option in Cloud. This approach does add a level of complexity, though, and as @Chris White helpfully noted, the main benefit would be easier deployment on a platform we're personally comfortable with -- I'll have to discuss this with our engineers first, then. Thanks!! P.S. IMO the documentation does a great job of explaining how to configure different deployment recipes, but I failed to find hints as to why you would prefer some over others (the `Executors` page is an exception: the drawbacks of `LocalExecutor` compared to the others were very clear)... hence this Slack discussion. It might be objectively obvious, but I felt something could be added :)
👍 1