# prefect-community
t
Hi, I am just wondering what all the options are that Prefect gives us for executing Spark/PySpark jobs. For example, in Kubeflow Pipelines we can use the spark-operator for this. In our use case we want to do everything on K8s clusters, so how can we do that using Prefect? Using this example, we can only run Spark in local mode. We want to run Spark/PySpark jobs on our K8s cluster with proper scaling and all. cc @Kevin Kho @Anna Geller
cc @mukul dev
👍 1
a
Hi Tejal, it depends a lot on where your Spark cluster is running. Prefect is an official Databricks partner, so if you want to leverage Spark on Databricks, Prefect can help you orchestrate those workflows. We also integrate with Fugue, and AWS has an extremely easy way to run EMR jobs with awswrangler. Running a Spark job on a Kubernetes cluster is more difficult, as you have to submit jobs to the cluster yourself (even with Kubeflow). But it's possible - you need to run the spark-submit command to submit jobs to the cluster and poll for status.
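For example, here is a rough sketch of that pattern (assuming Prefect 1.x, spark-submit available on the agent's PATH, and placeholder values for the master URL and jar path) wrapping the shell call in a task:
import subprocess

from prefect import task, Flow

@task
def submit_spark_pi():
    # Placeholder master URL and jar path - replace with your K8s API server and artifact.
    # subprocess.run blocks until spark-submit exits; with Spark's default
    # spark.kubernetes.submission.waitAppCompletion=true, spark-submit itself polls the driver pod.
    subprocess.run(
        [
            "spark-submit",
            "--class", "org.apache.spark.examples.SparkPi",
            "--master", "k8s://xx.yy.zz.ww:443",
            "--deploy-mode", "cluster",
            "--executor-memory", "20G",
            "--num-executors", "50",
            "/path/to/examples.jar",
            "1000",
        ],
        check=True,  # fail the task (and the flow run) if spark-submit returns non-zero
    )

with Flow("spark-submit-on-k8s") as flow:
    submit_spark_pi()

# flow.run() locally, or register the flow and run it on your usual agent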
🎉 1
t
Instead of running the spark-submit command like this:
# Run on a Kubernetes cluster in cluster deploy mode
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master k8s://xx.yy.zz.ww:443 \
  --deploy-mode cluster \
  --executor-memory 20G \
  --num-executors 50 \
  http://path/to/examples.jar \
  1000
can we also connect to a Spark cluster on K8s like this, using Python code only, and run the job:
from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName(appName).setMaster("k8s://xx.yy.zz.ww:443")
sc = SparkContext(conf=conf)
a
I don't know enough about the Python code you shared to help you here, but as long as you can reach the Spark master node from this flow, you should be able to submit jobs to it, whether via a shell command or a Python script.
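If you go the Python route, a rough sketch (assuming client deploy mode, a Spark container image you have already pushed somewhere, and placeholder values for the master URL and namespace; in client mode this process acts as the driver, so the executor pods must be able to reach it over the network) could look like:
from pyspark.sql import SparkSession

# Client deploy mode: this process becomes the Spark driver and requests executor pods
# from the K8s API server. The master URL, namespace, and image below are placeholders.
spark = (
    SparkSession.builder
    .appName("pi-on-k8s")
    .master("k8s://xx.yy.zz.ww:443")
    .config("spark.kubernetes.container.image", "my-registry/spark-py:3.2")
    .config("spark.kubernetes.namespace", "default")
    .config("spark.executor.instances", "2")
    .getOrCreate()
)

# Trivial job to verify that executors come up and work is distributed
print(spark.sparkContext.parallelize(range(1000)).count())
spark.stop()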