Deepak

09/28/2022, 11:08 PM
Has anyone here tried running pyspark inside a prefect task? I get
RuntimeError: Java gateway process exited before sending its port number
when I run pyspark methods inside a task.

Matt Conger

09/28/2022, 11:22 PM
Hey Deepak, PySpark should be good to go in Prefect tasks. This error could be caused by pyspark-shell missing from the shell environment (for example, in PYSPARK_SUBMIT_ARGS). https://stackoverflow.com/questions/31841509/pyspark-exception-java-gateway-process-exited-before-sending-the-driver-its-po

Deepak

09/29/2022, 4:19 PM
Hi Matt. The first pyspark call in my code was
df_spark = ps.from_pandas(df_pandas)
, which is where it is failing. I tried two things. The first was to execute
export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell"
before running prefect register. The second was to add this snippet
from pyspark import SparkContext
sc = SparkContext.getOrCreate()
df_pandas = sc.parallelize(df_pandas)
before
df_spark = ps.from_pandas(df_pandas)
but neither of these worked. I got the same error, i.e.
Java gateway process exited before sending its port number
.
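
Editor's note: this error usually means the JVM could not be launched from the Python process, so it can help to check what the task's own process sees. One possibility (an assumption, not confirmed in this thread) is that an export made in the shell that runs prefect register never reaches the process that later executes the flow. The sketch below uses only the standard library; spark_env_report is a hypothetical helper name, not part of Prefect or PySpark.

```python
import os
import shutil


def spark_env_report(env=None):
    """Collect the settings the PySpark launcher relies on, as seen by this process.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    return {
        # Full path of the java executable, or None if it is not on PATH.
        "java_on_path": shutil.which("java", path=env.get("PATH")),
        "JAVA_HOME": env.get("JAVA_HOME"),
        "SPARK_HOME": env.get("SPARK_HOME"),
        "PYSPARK_SUBMIT_ARGS": env.get("PYSPARK_SUBMIT_ARGS"),
    }


if __name__ == "__main__":
    # Run this once as a plain script and once from inside the task, then compare.
    for name, value in spark_env_report().items():
        print(f"{name}: {value!r}")
```

If the values printed from inside the task differ from those printed in the interactive shell (for example, java_on_path is None or PYSPARK_SUBMIT_ARGS is missing), the environment of the process running the flow is the likely place to look.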

Matt Conger

09/30/2022, 10:13 PM
Deepak, does this section of code work without the Prefect decorators?