# ask-community
Is there any plan to create a task-library task for running Spark on kubernetes (as opposed to Databricks)?
n
Hi @Ryan Sattler - no plans for that internally at the moment but much of the task library is user-contributed... if that's something you'd like to see we welcome PRs 😄
g
I was thinking about this as well, and I wonder if we could create a class that initializes the SparkContext inside the flow (we would need a Docker image with Spark installed) and points the Spark config at the Kubernetes Spark cluster.
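For illustration, here is a minimal sketch of what such a class could look like as a Prefect `Task`, assuming Spark's client mode against Kubernetes (Spark 2.4+); the master URL, container image, and the trivial `range`/`count` job are all placeholders, not a tested recipe:

```python
from prefect import Task
from pyspark.sql import SparkSession

class KubernetesSparkTask(Task):
    """Hypothetical task that opens a SparkSession against a k8s master."""

    def __init__(self, master_url, image, **kwargs):
        self.master_url = master_url  # e.g. "k8s://https://<apiserver-host>:6443"
        self.image = image            # executor container image with Spark installed
        super().__init__(**kwargs)

    def run(self):
        # In client mode the driver runs here (inside the flow's container);
        # executors are launched as pods on the cluster.
        spark = (
            SparkSession.builder
            .master(self.master_url)
            .appName("prefect-spark-on-k8s")
            .config("spark.kubernetes.container.image", self.image)
            .config("spark.executor.instances", "2")
            .getOrCreate()
        )
        try:
            return spark.range(100).count()  # trivial job to prove the session works
        finally:
            spark.stop()
```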
k
I have used Databricks a lot before but don’t have as much experience with Kubernetes Spark. Would the task run `spark-submit`, and is that how you would connect to the Kubernetes Spark cluster? If your Spark is already configured, wouldn’t just instantiating a SparkSession inside a flow work? And to echo what Nicholas said, we’d surely welcome PRs for this.
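As a sketch of the "just instantiate it" approach, assuming the image already has `pyspark` installed and a `spark-defaults.conf` (or environment) pointing at the cluster; everything here is illustrative:

```python
from prefect import task, Flow
from pyspark.sql import SparkSession

@task
def count_rows():
    # getOrCreate() picks up whatever master/config is already set up
    # (spark-defaults.conf, env vars); pass .master(...) to override.
    spark = SparkSession.builder.appName("inline-session").getOrCreate()
    try:
        return spark.range(1000).count()
    finally:
        spark.stop()

with Flow("spark-inline") as flow:
    count_rows()
```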
g
We could use a ShellTask to run the `spark-submit` command, but I would say it might be possible to instantiate a SparkSession inside the Flow (will have to give it a try). I did not look, but is that how the Databricks one works (via a session)?
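A ShellTask version might look like the sketch below; the API server host, image name, and example script path are placeholders following the usual Spark-on-Kubernetes `spark-submit` pattern, not a verified setup:

```python
from prefect import Flow
from prefect.tasks.shell import ShellTask

spark_submit = ShellTask(name="spark-submit")

# All hosts, images, and paths below are placeholders.
command = " ".join([
    "spark-submit",
    "--master k8s://https://<apiserver-host>:6443",
    "--deploy-mode cluster",
    "--name prefect-pi",
    "--conf spark.executor.instances=2",
    "--conf spark.kubernetes.container.image=<spark-image>",
    "local:///opt/spark/examples/src/main/python/pi.py",
])

with Flow("spark-submit-on-k8s") as flow:
    spark_submit(command=command)
```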
k
Databricks has a `databricks-connect` library that hijacks your Spark installation, so `import pyspark` and creating the `SparkSession` compiles the DAG locally, then sends it to the configured cluster when there is an action.
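To make that lazy-evaluation point concrete, a small example (nothing here is Databricks-specific code; with `databricks-connect` installed and configured, these same calls target the remote cluster):

```python
from pyspark.sql import SparkSession

# With databricks-connect configured, this "local" session is actually
# wired to the remote Databricks cluster.
spark = SparkSession.builder.getOrCreate()

df = spark.range(1_000_000)             # transformation: builds a plan, runs nothing
doubled = df.selectExpr("id * 2 AS x")  # still only extending the local DAG
total = doubled.count()                 # action: plan is shipped to the cluster and run
print(total)
```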
g
Hummm, I guess it would be possible to instantiate a SparkSession inside the Flow if we run it inside an image with `pyspark` installed.
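If it helps, a sketch of wiring that up in Prefect 1.x with a `KubernetesRun` run config, where the image name is a placeholder for an image that bundles `pyspark` (and a JVM):

```python
from prefect import Flow, task
from prefect.run_configs import KubernetesRun

@task
def spark_count():
    # This import succeeds only because the flow's image includes pyspark.
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.appName("in-image").getOrCreate()
    try:
        return spark.range(10).count()
    finally:
        spark.stop()

with Flow("pyspark-image-flow") as flow:
    spark_count()

# Execute the flow in a pod based on an image with pyspark installed.
flow.run_config = KubernetesRun(image="<registry>/prefect-pyspark:latest")
```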