# ask-community
l
Hi all. We are looking into whether Prefect meets our scheduling needs. We mainly want to use it to call spark-submits. We want these to be called in parallel, which is possible in Prefect with Dask, for example. Suppose we do a spark-submit and it takes 15 minutes. Those 15 minutes take place outside Prefect. In this case, will Prefect wait for this task to complete, or does it continue with other tasks? In other words, will this task occupy the thread for 15 minutes?
a
Welcome to the community @Laurens! You can definitely call Spark submit jobs from Prefect, and there are some tasks in the Task Library that can help you with that - this PR contains an example in which Databricks Spark Submit jobs are submitted in parallel using Mapping and LocalDaskExecutor. Whether you wait for the result of the Spark job in your Prefect flow is configurable. Overall, Prefect doesn’t require you to use any predefined tasks from our library - you can build your own logic so that Prefect waits for those Spark jobs before continuing, or not - it’s up to you to design it as you wish. Taking DatabricksSubmitRun as an example, it has a `polling_period_seconds` setting to poll for the result of the Spark job.
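To illustrate what polling with a period looks like, here is a minimal sketch in plain Python. It is not the DatabricksSubmitRun implementation: `wait_for_job`, `get_state`, the state names, and `timeout_seconds` are all hypothetical, and only the parameter name `polling_period_seconds` is borrowed from the message above. The point is that the calling task stays busy (and keeps its worker thread) until the job reaches a terminal state.

```python
import time
from typing import Callable

# Hypothetical terminal state names; a real job API defines its own.
TERMINAL_STATES = {"SUCCESS", "FAILED"}


def wait_for_job(
    get_state: Callable[[], str],
    polling_period_seconds: float = 30,
    timeout_seconds: float = 3600,
) -> str:
    """Block until get_state() reports a terminal state, then return it.

    get_state is a hypothetical callable that asks the external system
    for the job's current state.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still in state {state!r} after timeout")
        # The calling thread sleeps here between checks, so it stays occupied.
        time.sleep(polling_period_seconds)


if __name__ == "__main__":
    # Stand-in for a real job: it reports three states in sequence.
    fake_states = iter(["PENDING", "RUNNING", "SUCCESS"])
    print(wait_for_job(lambda: next(fake_states), polling_period_seconds=0.1))
```

If you would rather not wait, the equivalent design is to submit the job and return immediately, without calling anything like `wait_for_job`.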
l
I'm gonna look into this, sounds promising! Thanks for the fast reply @Anna Geller 🙂
k
Do you use `spark-submit` with the command line? You can also use the ShellTask.
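For the command-line route, a rough sketch of what "blocking" means in practice, using only the Python standard library rather than Prefect's ShellTask: each worker thread runs one command and waits for it to exit, so a 15-minute `spark-submit` would hold its thread for 15 minutes while the other threads run their own commands in parallel. The helper names `run_command` and `run_in_parallel` are made up for this example, and a short Python one-liner stands in for the real `spark-submit` call.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_command(args):
    """Run one command and block until it exits; return its exit code."""
    completed = subprocess.run(args, capture_output=True, text=True)
    return completed.returncode


def run_in_parallel(commands, max_workers=4):
    """Run several commands at once, one worker thread per running command."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order; each worker waits on its own process.
        return list(pool.map(run_command, commands))


if __name__ == "__main__":
    # Stand-in for something like ["spark-submit", "job.py"].
    stand_in = [sys.executable, "-c", "import time; time.sleep(1)"]
    print(run_in_parallel([stand_in, stand_in, stand_in]))
```

The three stand-in commands run concurrently, so the whole call takes roughly as long as one command rather than the sum of all three.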