Alexandru Anghel
09/20/2022, 3:21 PMTypeError: cannot pickle '_thread.RLock' object
I can see the application being registered inside the spark cluster.
Thanks!Unexpected error: TypeError("cannot pickle '_thread.RLock' object") Traceback (most recent call last): File "/usr/local/lib/python3.8/site-packages/prefect/engine/runner.py", line 48, in inner new_state = method(self, state, *args, **kwargs) File "/usr/local/lib/python3.8/site-packages/prefect/engine/task_runner.py", line 930, in get_task_run_state result = self.result.write(value, **formatting_kwargs) File "/usr/local/lib/python3.8/site-packages/prefect/engine/results/gcs_result.py", line 75, in write binary_data = new.serializer.serialize(new.value) File "/usr/local/lib/python3.8/site-packages/prefect/engine/serializers.py", line 73, in serialize return cloudpickle.dumps(value) File "/usr/local/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 73, in dumps cp.dump(obj) File "/usr/local/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 633, in dump return Pickler.dump(self, obj) TypeError: cannot pickle '_thread.RLock' object
Kyle McChesney
09/20/2022, 3:25 PMAlexandru Anghel
09/20/2022, 3:25 PM@task(name='Get data', log_stdout=True, max_retries=2, retry_delay=timedelta(seconds=10), state_handlers=[post_to_slack_on_failure])
def get_data():
spark = SparkSession.builder.appName("Bobita-app").master("<spark://spark-master:7077>").getOrCreate()
return spark.createDataFrame([('look',), ('spark',), ('tutorial',), ('spark',), ('look', ), ('python', )], ['word'])
with Flow(
name=flow_name,
schedule=CronSchedule(cron=cron),
storage=GCS(bucket="prefect-bucket"),
run_config=KubernetesRun(
labels=["dev"],
job_template=register_flow_job_template,
service_account_name="prefect-sa",
# env={"PYSPARK_SUBMIT_ARGS": "--master <spark://spark-master-alerts:7077> --deploy-mode cluster pyspark-shell"},
image="prefect-base:v0.1",
image_pull_policy="Always",
)
) as flow:
df = get_data()
22/09/20 15:16:25 INFO Master: Registered app Bobita-app with ID app-20220920151625-0000
22/09/20 15:16:25 INFO Master: Launching executor app-20220920151625-0000/0 on worker worker-20220920151514-10.42.163.29-36489
22/09/20 15:16:25 INFO Master: Launching executor app-20220920151625-0000/1 on worker worker-20220920151515-10.42.87.199-43455
Kyle McChesney
09/20/2022, 3:31 PM.toPandas()
on the end of the returnAlexandru Anghel
09/20/2022, 3:40 PMCaused by: java.io.IOException: Failed to connect to prefect-job-b3fe1959-274t6:42699
They are trying to connect to the Kubernetes job that started the flow. Maybe to return the state of the task? I am running the entire infra behind a corporate proxy and trying to figure out if this is causing the error.