ll
06/04/2021, 1:43 PM
# task 1
./my_cpp_executable file1.xyz
...
# task 1000
./my_cpp_executable file1000.xyz
Each task takes about 4-8 compute hours on 4 CPUs / ~32 GB of memory, and our scheduled workloads add up to about 20,000-40,000+ compute hours per day.
From what I can tell, the only supported strategy for running a large batch of embarrassingly parallel tasks right now is to use Dask.
We have it working, but I feel Dask is more oriented towards (i) interactive analysis workloads, (ii) pure Python tasks, and (iii) small jobs that fit onto each Dask node's local disk. It feels awkward to spin up a Dask executor just to run a one-line shell command in a high-throughput, long-running, queued (num_tasks >> num_cluster_nodes) workload. We'd also prefer not to have to support Dask on our infrastructure, as it adds a whole other set of things our sysengs have to maintain.
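For context, our current glue is roughly the sketch below: a thin Python wrapper that shells out to the binary and fans the 1000 files out as Dask futures (the function name and scheduler address are just placeholders, not our real setup):

import subprocess
from dask.distributed import Client

def run_task(path):
    # One task = one invocation of the C++ binary on one input file.
    subprocess.run(["./my_cpp_executable", path], check=True)
    return path

client = Client("tcp://dask-scheduler:8786")   # placeholder scheduler address
paths = [f"file{i}.xyz" for i in range(1, 1001)]
futures = client.map(run_task, paths)          # queue all 1000 tasks
client.gather(futures)                         # block until everything finishes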
It seems like a better fit if you supported the job queueing systems typically found in HPC environments, like SGE, Slurm, or HTCondor. I figure many of your target users in the fintech, scientific computing, and meteorological space will already have an SGE or Mesos cluster set up in their environment, but not a Dask cluster.
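To make it concrete, the same batch maps naturally onto a Slurm job array; something like the sketch below (per-task resources mirror what I quoted above, the script name and time limit are just illustrative):

import subprocess

# Illustrative only: express the 1000 tasks as one Slurm job array,
# 4 CPUs / 32 GB per task, each array index picking its own input file.
sbatch_script = """#!/bin/bash
#SBATCH --array=1-1000
#SBATCH --cpus-per-task=4
#SBATCH --mem=32G
#SBATCH --time=08:00:00
./my_cpp_executable file${SLURM_ARRAY_TASK_ID}.xyz
"""

with open("run_batch.sbatch", "w") as f:
    f.write(sbatch_script)

subprocess.run(["sbatch", "run_batch.sbatch"], check=True)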