Hafsa Junaid

04/26/2022, 6:03 AM
Hey, I am registering my task on Prefect Server to a demo project. The task uses Spark and a SparkContext. The run is successful, but flow.register() gives the following error: RuntimeError: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063. Any guidance on how to set up the Spark context?

Rhys Mansal

04/26/2022, 9:08 AM
Can I ask why you are trying to run Spark code on Prefect? It is more common to use Prefect to submit Spark jobs to a cluster. In any case, this is an issue with your Spark code, not Prefect. As the error says, you are referencing the SparkContext inappropriately.
:upvote: 1
💯 1
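
A possible reading of the error, sketched without Spark or Prefect installed: if registration serializes the task (an assumption here, not something confirmed in this thread), then any live context stored with the task has to be serialized too, and SparkContext refuses. `DriverOnlyContext`, `serializes`, and both `*_task_state` dictionaries are hypothetical stand-ins for illustration, not real PySpark or Prefect names.

```python
import pickle


class DriverOnlyContext:
    """Hypothetical stand-in for SparkContext: it refuses to be serialized."""

    def __reduce__(self):
        raise RuntimeError("context can only be used on the driver")


def serializes(task_state):
    """Return True if the task's state survives pickling."""
    try:
        pickle.dumps(task_state)
        return True
    except RuntimeError:
        return False


# Anti-pattern: a live context is created up front and stored with the task.
eager_task_state = {"name": "count_rows", "context": DriverOnlyContext()}

# Safer pattern: store only plain settings and build the context
# inside the task body when it actually runs.
lazy_task_state = {"name": "count_rows", "app_name": "demo"}

print(serializes(eager_task_state))
print(serializes(lazy_task_state))
```

If this is what is happening, moving the creation of the Spark session or context into the body of the task, instead of keeping it at module level or as a task attribute, may be enough to let registration succeed.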

Anna Geller

04/26/2022, 10:53 AM
Hafsa, are you committed to Spark? Prefect works very well with Dask (and also with Ray in 2.0), allowing you to dynamically spin up tasks and submit those to Dask and providing you with visibility into their state. With Spark, it's more of a black box - you submit a Spark job to a cluster from a Prefect flow, and you only know the exit code of this large job (success or failure).
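
A minimal sketch of the "submit a Spark job and only see the exit code" pattern described above, assuming `spark-submit` is available on the PATH. The function names, the default master, and the script path in the usage note are hypothetical; the function body could be called from inside a Prefect task.

```python
import subprocess


def build_spark_submit_command(app_path, master="local[*]", app_args=()):
    """Assemble a spark-submit command line for a PySpark script."""
    return ["spark-submit", "--master", master, app_path, *app_args]


def run_spark_job(app_path, master="local[*]", app_args=()):
    """Run the job as a separate process and return its exit code."""
    command = build_spark_submit_command(app_path, master, app_args)
    # check=False: hand the exit code back to the caller instead of raising.
    completed = subprocess.run(command, check=False)
    return completed.returncode
```

For example, `run_spark_job("job.py", master="yarn")` would return 0 on success and a non-zero code on failure, which is all the orchestrator learns about the job. Because the SparkContext lives only inside the submitted script, nothing Spark-related has to be serialized with the flow.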