Hey, we've recently started exploring Prefect and we'd like to schedule Spark jobs on Cloud Dataproc. I've noticed the excellent integration with Databricks but didn't see anything regarding running jobs on other Spark platforms/distributions. Am I missing something? Has anyone implemented this already? Thanks a lot for your help!
Kevin Kho
04/08/2021, 4:34 PM
Hi @Remi Paulin, no, we don’t have any other Spark Tasks in our library. You can open a feature request on GitHub so other people can voice their support for it. Databricks is already available on GCP (though I understand if you’d rather use Dataproc).
Remi Paulin
04/08/2021, 4:38 PM
Hi @Kevin Kho thanks a lot for the quick reply! Ok, got it, we might actually use Databricks on GCP, especially if integration with Prefect is more straightforward. Thanks again!
Kevin Kho
04/08/2021, 4:43 PM
We’d gladly accept a Cloud Dataproc task integration if you guys ever do build one, though!
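For anyone landing on this thread: since there is no built-in Dataproc task, one possible route is to call the `google-cloud-dataproc` Python client from your own task. The following is only a rough sketch under assumptions, not an official Prefect integration. The helper names `build_pyspark_job` and `submit_dataproc_job` are made up for illustration, and it assumes an existing cluster plus credentials available in the environment.

```python
from typing import Any, Dict, List, Optional


def build_pyspark_job(
    cluster_name: str,
    main_python_file_uri: str,
    args: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build a Dataproc Job payload for a PySpark script (hypothetical helper)."""
    job: Dict[str, Any] = {
        "placement": {"cluster_name": cluster_name},
        "pyspark_job": {"main_python_file_uri": main_python_file_uri},
    }
    if args:
        job["pyspark_job"]["args"] = list(args)
    return job


def submit_dataproc_job(project_id: str, region: str, job: Dict[str, Any]) -> str:
    """Submit the job and block until it finishes (hypothetical helper).

    Assumes the google-cloud-dataproc package is installed and that
    application default credentials are configured.
    """
    # Imported lazily so the payload helper works without the client installed.
    from google.cloud import dataproc_v1

    client = dataproc_v1.JobControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )
    operation = client.submit_job_as_operation(
        request={"project_id": project_id, "region": region, "job": job}
    )
    finished_job = operation.result()  # waits for the Spark job to complete
    return finished_job.reference.job_id
```

Decorating `submit_dataproc_job` with Prefect's `@task` and calling it inside a flow should then let Prefect schedule it like any other task; the project, region, cluster, and script URI would be your own values.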