Hi Prefect experts, I want to try out Prefect in our existing ETL pipeline for scheduling and Spark job management. I know Prefect is a great match for Python-based scripts (e.g. PySpark), but would it support Spark Scala/Java jobs as well? Our ETL is mainly built with Scala Spark jobs. Are there any examples or documentation on this? Thank you in advance! 🙏 (Sorry if this is a duplicate question.)
Jim Crist-Harif
12/22/2020, 4:08 PM
Hi Ajith, Prefect can run jobs of any type. For launching things other than Python tasks, you'd need some way to kick off a Spark job from Python or using a shell script (and our `ShellTask`). This might call `spark-submit` or something else.
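As a rough illustration of "kicking off a Spark job from Python", here is a minimal sketch that builds a `spark-submit` command for a Scala/Java job and runs it with `subprocess`. The jar path, main class, and application arguments are hypothetical placeholders, and the helper names are made up for this example; they are not part of Prefect. A function like `run_spark_job` could then be wrapped in a Prefect task, or the printed command string handed to `ShellTask`.

```python
import shlex
import subprocess
from typing import List, Sequence


def build_spark_submit_command(
    jar_path: str,
    main_class: str,
    master: str = "local[*]",
    app_args: Sequence[str] = (),
) -> List[str]:
    """Build the argument list for a spark-submit call running a Scala/Java job.

    jar_path and main_class are hypothetical placeholders for your own job.
    """
    return [
        "spark-submit",
        "--class", main_class,
        "--master", master,
        jar_path,
        *app_args,
    ]


def run_spark_job(command: Sequence[str]) -> int:
    """Run the command; raises CalledProcessError if spark-submit fails."""
    completed = subprocess.run(list(command), check=True)
    return completed.returncode


if __name__ == "__main__":
    cmd = build_spark_submit_command(
        jar_path="etl-assembly.jar",        # hypothetical jar
        main_class="com.example.EtlJob",    # hypothetical main class
        app_args=["--date", "2020-12-22"],
    )
    # Print the shell-ready command instead of running it here;
    # call run_spark_job(cmd) on a machine where spark-submit is installed.
    print(shlex.join(cmd))
```

Raising on a non-zero exit code matters here: it lets whatever orchestrates the call treat a failed Spark job as a failed step rather than silently continuing.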
Ajith Kumara Beragala Acharige Lal
12/22/2020, 4:13 PM
Thank you @Jim Crist-Harif for the quick response! If I understood you correctly, I need some kind of wrapper that can trigger Scala Spark jobs from Prefect calls? Is my understanding correct?