Has anyone here tried running pyspark inside a pre...
# ask-community
d
Has anyone here tried running pyspark inside a prefect task? I get
RuntimeError: Java gateway process exited before sending its port number
when I run pyspark methods inside a task.
m
Hey Deepak, PySpark should be good to go in Prefect tasks. This error could be caused by pyspark-shell not being added to the shell environment. https://stackoverflow.com/questions/31841509/pyspark-exception-java-gateway-process-exited-before-sending-the-driver-its-po
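A minimal sketch for checking the usual suspects from inside the process that raises the error, assuming the cause is either a missing Java runtime or a PYSPARK_SUBMIT_ARGS value that does not end in pyspark-shell. The helper name `diagnose_spark_env` is made up for illustration; it only uses the standard library and does not start Spark.

```python
import os
import shutil


def diagnose_spark_env(env=None):
    """Return a list of likely problems with the Java/PySpark environment.

    `env` is any mapping of environment variables; defaults to os.environ.
    This is a hypothetical helper, not part of pyspark or Prefect.
    """
    env = os.environ if env is None else env
    problems = []

    # PySpark launches a JVM, so it needs JAVA_HOME or a `java` binary on PATH.
    java_home = env.get("JAVA_HOME")
    java_on_path = shutil.which("java", path=env.get("PATH", ""))
    if not java_home and not java_on_path:
        problems.append("no JAVA_HOME and no java executable on PATH")

    # If PYSPARK_SUBMIT_ARGS is set at all, it should end with `pyspark-shell`.
    submit_args = env.get("PYSPARK_SUBMIT_ARGS")
    if submit_args is not None and not submit_args.strip().endswith("pyspark-shell"):
        problems.append("PYSPARK_SUBMIT_ARGS does not end with pyspark-shell")

    return problems
```

Calling `print(diagnose_spark_env())` as the first line of the task body shows what the task's own process sees, which may differ from the shell you registered the flow from.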
d
Hi Matt. The first pyspark call in my code was
df_spark = ps.from_pandas(df_pandas)
, which is where it is failing. I tried two things. The first was to execute
export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell"
before running prefect register. The second was to add this snippet
from pyspark import SparkContext
sc = SparkContext.getOrCreate()
df_pandas = sc.parallelize(df_pandas)
before
df_spark = ps.from_pandas(df_pandas)
but neither of these worked. I got the same error, i.e.
Java gateway process exited before sending its port number
.
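One thing worth ruling out: an export in the shell that runs prefect register only reaches that shell and its child processes, so the process that actually executes the task may not see it. Below is a rough sketch that sets the variable from Python instead, assuming it runs in the task's process before pyspark starts its JVM. The helper name `ensure_pyspark_shell` and the default value (copied from the export above) are illustrative, not anything pyspark or Prefect provides.

```python
import os


def ensure_pyspark_shell(env=None, default_args="--master local[2] pyspark-shell"):
    """Make sure PYSPARK_SUBMIT_ARGS is set and ends with `pyspark-shell`.

    Hypothetical helper: `env` is any mutable mapping, defaulting to os.environ.
    Returns the value that was stored.
    """
    env = os.environ if env is None else env
    args = env.get("PYSPARK_SUBMIT_ARGS", "").strip()

    if not args:
        # Nothing set in this process: fall back to the assumed default.
        args = default_args
    elif not args.endswith("pyspark-shell"):
        # Keep the existing options and append the missing trailing token.
        args = f"{args} pyspark-shell"

    env["PYSPARK_SUBMIT_ARGS"] = args
    return args
```

It would be called at the top of the task, before `SparkContext.getOrCreate()` or `ps.from_pandas(...)`. If the error persists after that, the variable is probably not the culprit and the Java installation is the next thing to check.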
m
Deepak, does this section of code work without the Prefect decorators?