I am trying to build a DAG that runs a PySpark...
# prefect-community
a
I am trying to build a DAG that runs a PySpark job. Does anyone have experience doing something like this? Ideally I could set it up like the code here https://docs.prefect.io/guide/getting_started/next-steps.html but I'm having trouble understanding how it is run. Would the create_cluster function create a Spark context, and if it does, how is the job then submitted?
a
That's just a high-level example of what a flow could potentially look like. I'm pretty sure the create_cluster in that example stands for creating a full-blown cluster, not just a Spark context. If you were in AWS, you could create a Spark cluster with AWS EMR: from the code you could use boto3 to create the cluster, and then send Spark jobs to it via EMR Steps or through Apache Livy, for example.
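Here is a rough sketch of the EMR Steps route, assuming the cluster already exists (creating it with boto3's `run_job_flow` is left out). The helper names `build_spark_step` and `submit_spark_step`, the cluster ID, and the S3 path are all made up for illustration; only `add_job_flow_steps` and the `command-runner.jar` step format come from boto3/EMR. You could call something like this from inside a task in the flow.

```python
def build_spark_step(script_s3_uri, name="pyspark-job"):
    """Build an EMR step definition that runs spark-submit on the cluster.

    Hypothetical helper: script_s3_uri is assumed to point at a PySpark
    script that has already been uploaded to S3.
    """
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",  # keep the cluster up if the step fails
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_s3_uri],
        },
    }


def submit_spark_step(emr_client, cluster_id, script_s3_uri):
    """Submit one step to an existing cluster and return its step ID.

    emr_client is expected to be a boto3 EMR client, i.e. boto3.client("emr").
    """
    response = emr_client.add_job_flow_steps(
        JobFlowId=cluster_id,
        Steps=[build_spark_step(script_s3_uri)],
    )
    return response["StepIds"][0]


# Example usage (placeholder values, needs AWS credentials and a region):
#   import boto3
#   emr = boto3.client("emr")
#   step_id = submit_spark_step(emr, "j-XXXXXXXXXXXXX", "s3://my-bucket/jobs/job.py")
```

The client is passed in rather than created inside the function, so the same code can be tried against a stub before pointing it at a real cluster.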
a
Thanks for the reply! My issue ended up being an environment problem, but I got that figured out and got it running locally.