# prefect-server
a
Hello guys, I'm very happy to be heard. I have a simple question: does Prefect run on a Spark cluster? I couldn't find a clear explanation, and all the documentation is Dask-oriented. Can someone explain this to me? Thanks
n
Hi @Ana - since you posted this in the #prefect-server channel I'll assume you're talking about deploying the Server application to a Spark Cluster, is that correct?
👍 1
a
Yes, totally!
n
Got it! Ok so this is possible but it's not something we explicitly support. I think the most analogous deployment I've seen from the community has been the helm chart developed by @Shaun Cutts here: https://github.com/PrefectHQ/server/pull/57
👍 1
a
Thanks for the answer. So, to confirm: Prefect is more Dask-oriented and doesn't support Spark clusters, right? 🙂
n
@Ana I think your use case is different than I was imagining; can you help me understand what you’re trying to accomplish?
a
OK, I'll clarify my question: Can I use Prefect to orchestrate Spark jobs on the Spark cluster?
n
Yes! You definitely can do that; while we don’t have a SparkExecutor or anything, it’s definitely possible to write a flow that spins up a spark cluster, submits jobs to it, and then tears it down.
👍 1
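The pattern described above (spin up a cluster, submit jobs, tear it down) can be sketched in plain Python. This is only an illustration under assumptions: `start`, `submit`, and `stop` are hypothetical callables standing in for however your cluster is provisioned, and no Prefect API is used here. In a real flow each of them would presumably be wrapped as its own task.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List, Sequence


@contextmanager
def ephemeral_cluster(
    start: Callable[[], str],
    stop: Callable[[str], None],
) -> Iterator[str]:
    """Start a cluster, yield its master URL, and always tear it down."""
    master_url = start()  # hypothetical: provision the cluster
    try:
        yield master_url
    finally:
        stop(master_url)  # runs even if a job raised


def run_jobs(
    start: Callable[[], str],
    submit: Callable[[str, str], str],
    stop: Callable[[str], None],
    jobs: Sequence[str],
) -> List[str]:
    """Submit each job to a short-lived cluster and collect the results."""
    results = []
    with ephemeral_cluster(start, stop) as master_url:
        for job in jobs:
            results.append(submit(master_url, job))
    return results
```

The `try`/`finally` is the point of the sketch: teardown still happens when a submission fails, so a failed job does not leave the cluster running.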
a
Great!! Can you share any documentation on this or a use case? I didn't find any documentation on this except on using the Databricks API
Hello @nicholas ! Do you have any information about my last question?
n
Hi @Ana, apologies for the delay as I was out of the office on Friday. I don't think there's any strict documentation on this, but I know it's been discussed before in the community channel. Because of the modular nature of Prefect tasks, anything you could do with a normal Python script you can do with a Prefect task. This includes things like interacting with an API or the OS.
👍 1
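As a rough example of the "anything a Python script can do" point, submitting a Spark job from Python can be as simple as shelling out to `spark-submit`. The sketch below assumes `spark-submit` is on the `PATH` of the machine running the code; the function names, the `runner` parameter, and the example URL are hypothetical and only for illustration. These are plain functions, so wrapping them as Prefect tasks is left to the reader.

```python
import subprocess
from typing import Callable, Dict, List, Optional, Sequence


def build_spark_submit(
    app: str,
    master: str,
    conf: Optional[Dict[str, str]] = None,
    app_args: Sequence[str] = (),
) -> List[str]:
    """Build the argument list for a spark-submit invocation."""
    cmd = ["spark-submit", "--master", master]
    # Sort the settings so the generated command is deterministic.
    for key, value in sorted((conf or {}).items()):
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app)
    cmd.extend(app_args)
    return cmd


def run_spark_job(cmd: List[str], runner: Callable = subprocess.run) -> str:
    """Run the command and return its stdout, raising on a non-zero exit."""
    result = runner(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"spark-submit exited with {result.returncode}: {result.stderr}"
        )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical master URL and application file.
    command = build_spark_submit(
        "job.py",
        "spark://example:7077",
        conf={"spark.executor.memory": "2g"},
    )
    print(" ".join(command))
```

Raising on a non-zero exit code matters here: an orchestrator can only mark the step as failed if the Python function itself fails.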
j
does Prefect run on Spark Cluster
Hi @Ana, to clarify, are you asking if you can run Prefect on top of Spark clusters, or are you asking if you can run Prefect on a Hadoop/YARN cluster that you normally use for running Spark jobs (some people conflate the two)? If the latter, the answer is definitely yes (Dask can run fine on a Hadoop/YARN cluster using https://yarn.dask.org). We could improve our integration here further by writing an Agent (https://docs.prefect.io/orchestration/agents/overview.html) for running Prefect flow runs on the cluster as well, but that work hasn't been done yet.
👍 1
a
Hi @Jim Crist-Harif, thank you for the answer. My question is about running Prefect orchestration on a Spark cluster (not Hadoop/YARN). I can't find any documentation on this, and that makes me uncomfortable about using Prefect.
Hi @nicholas, thanks for the answer. I know I can write Python scripts in Prefect, but I need to see a use case or some documentation to answer this question. Can you provide that?