# ask-community
q
Has anyone tried running a PySpark task (Spark cluster in local Docker) from a local agent in Docker, without using a bash command?
a
Only in Airflow,
but it should be no different in Prefect.
That's the gist of the Spark hook I'm using:
```python
from pyspark import SparkConf
from pyspark.sql import SparkSession

spark_conf = SparkConf()
spark_conf.setAll(self.conf.items())  # self.conf holds the hook's Spark settings
spark_session = SparkSession.builder.config(conf=spark_conf).getOrCreate()
```
The prerequisite is to have a matching pyspark package installed in the PYTHONPATH used by the agent.
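For example, a quick sanity check (just a sketch, not something from this thread) is to confirm the pyspark version the agent sees matches the cluster:
```python
# Sketch only: the pyspark package importable by the agent should match
# the Spark version running in the Docker cluster.
import pyspark

print(pyspark.__version__)  # compare with the cluster's Spark version
```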
q
I agree with you, and that's what I want to try. But don't you need to use spark-submit to run a PySpark script? (A Prefect flow would be run with the python or python3 command.)
a
So you want to submit a Python script without using the CLI?
q
I tried adding findspark to my code and it works.
a
Could you share the details!?
q
I created a Prefect flow (e.g. flow.py) that has some PySpark tasks, and created all these elements as usual. I ran the flow with the command python flow.py; the first time it didn't work because the PySpark path was not configured. So I just added a findspark call to initialize the PySpark path, and now it works.
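For anyone else reading, here is a minimal sketch of what that looks like, assuming Prefect 1.x flow syntax and a local Docker Spark master at spark://localhost:7077 (both are assumptions, not details from this thread):
```python
# Minimal sketch; the master URL and task body are placeholders.
import findspark

findspark.init()  # locates SPARK_HOME and puts pyspark on sys.path

from pyspark.sql import SparkSession
from prefect import Flow, task

@task
def count_rows():
    # assumed master URL for a Spark cluster running in local Docker
    spark = SparkSession.builder.master("spark://localhost:7077").getOrCreate()
    df = spark.createDataFrame([(1,), (2,), (3,)], ["value"])
    return df.count()

with Flow("pyspark-example") as flow:
    count_rows()

if __name__ == "__main__":
    flow.run()  # i.e. started with: python flow.py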
I think if you configure your PySpark path in an environment variable, it should work as well.
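Something like this, I think (sketch only; the paths are assumptions about the agent's container, not details from this thread):
```python
# Sketch only (paths are assumptions). Instead of calling findspark at runtime,
# the agent's environment can export the equivalent variables, e.g.:
#   SPARK_HOME=/opt/spark
#   PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/<py4j zip shipped with Spark>:$PYTHONPATH
# With those set, the flow can import pyspark directly:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
print(spark.version)  # confirms pyspark resolved from the configured path
```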