Sharath Chandra
04/14/2022, 7:06 AM
I am invoking spark-submit using Prefect's ShellTask: I have created a subclass of ShellTask that runs the spark-submit command. The Spark jobs run on k8s.
There is an issue, especially with long-running jobs, where the Spark job completes but the Prefect task is unaware of this and keeps running.
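For context, a minimal sketch of what such a subclass can look like in Prefect 1.x; the class name, paths, and flags here are my assumptions, not the actual code from this thread:
```python
from prefect.tasks.shell import ShellTask


class SparkSubmitTask(ShellTask):
    """Illustrative sketch: builds a spark-submit command and runs it via ShellTask."""

    def __init__(self, spark_home="/opt/spark", **kwargs):
        self.spark_home = spark_home
        # return_all=True keeps the full stdout, which helps when diagnosing
        # why the status of a long-running job gets lost
        super().__init__(return_all=True, **kwargs)

    def run(self, app_jar: str, main_class: str, master: str):
        # assemble the spark-submit invocation and delegate to ShellTask.run
        command = (
            f"{self.spark_home}/bin/spark-submit "
            f"--class {main_class} "
            f"--master {master} "
            f"--deploy-mode cluster "
            f"{app_jar}"
        )
        return super().run(command=command)
```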
I have tried a few suggestions, including adding the following annotation to the job template:
"cluster-autoscaler.kubernetes.io/safe-to-evict": "false"
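One way to attach that annotation, assuming the Prefect flow-run job template is what is meant; this is a hedged sketch, and note the annotation must sit on the pod template (spec.template.metadata) for the cluster autoscaler to honor it:
```python
from prefect.run_configs import KubernetesRun

# Sketch of a custom job template for the flow-run pod. Prefect 1.x
# expects the main container in a custom template to be named "flow".
job_template = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "spec": {
        "template": {
            "metadata": {
                "annotations": {
                    # tells the cluster autoscaler not to evict this pod,
                    # so a long-running flow is not killed by scale-down
                    "cluster-autoscaler.kubernetes.io/safe-to-evict": "false"
                }
            },
            "spec": {"containers": [{"name": "flow"}]},
        }
    },
}

# given an existing `flow` object
flow.run_config = KubernetesRun(job_template=job_template)
```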
It looks like ShellTask has an issue maintaining the status.
To overcome this, I am considering the following approach:
- For a mapped list of jobs, say ["j1", "j2", "j3"], submit all of them without waiting for completion (in my case there are no dependencies between the jobs).
- Run a separate task that monitors the status of all submitted jobs. This task runs in a loop, for at most one hour before timing out, checking the status of each job and updating the overall status (see the sketch after this list).
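A minimal sketch of that monitoring loop using Prefect 1.x task looping via the LOOP signal; here check_spark_status is a hypothetical helper (e.g. one that reads the Spark driver pod's phase via the Kubernetes API), not a real Prefect function:
```python
import time

import prefect
from prefect import task
from prefect.engine.signals import FAIL, LOOP


@task
def monitor_spark_jobs(job_ids, max_seconds=3600):
    # On each LOOP iteration the previous result comes back via context;
    # on the first iteration we start with the full job list and a clock.
    state = prefect.context.get(
        "task_loop_result", {"pending": job_ids, "started": time.time()}
    )

    if time.time() - state["started"] > max_seconds:
        raise FAIL(f"Timed out with {len(state['pending'])} job(s) still running")

    # check_spark_status is a hypothetical helper returning e.g.
    # "RUNNING", "COMPLETED", or "FAILED" for a submitted Spark job.
    pending = [j for j in state["pending"] if check_spark_status(j) == "RUNNING"]

    if pending:
        time.sleep(30)  # polling interval between status checks
        raise LOOP(
            message=f"{len(pending)} Spark job(s) still running",
            result={"pending": pending, "started": state["started"]},
        )
    # falling through ends the loop: all jobs reached a terminal state
```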
What is the best way to accomplish this looped task in Prefect?
Anna Geller
ShellTask (I think).
Maybe someone from the community can chime in, but I wouldn't go that route unless you really have to.

Sharath Chandra
04/14/2022, 9:57 AM

Anna Geller
```
# Running Spark application on Kubernetes cluster
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master k8s://192.168.231.132:443 \
  --deploy-mode cluster \
  --executor-memory 5G \
  --executor-cores 8 \
  /spark-home/examples/jars/spark-examples_versionxx.jar 80
```
There is an option to submit via REST actually, but on K8s the above is preferred.
Sharath Chandra
04/14/2022, 10:00 AM