# prefect-community
j
Hey everyone, weird question, but does anyone have experience using Apache Spark? For a project, I'm investigating best practices for running multiple models with different parameters on a single data set within a Spark cluster orchestrated with Prefect. My main question is where to place the multiplier: would I get better performance by submitting multiple jobs using `.map()`, or by submitting a single job and managing the different models within that single Spark job?
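To make the two options concrete, here is a minimal stdlib-only sketch. It assumes a hypothetical `submit_spark_job` stub in place of a real Spark submission and a toy `fit_model` in place of real training; none of these names come from Prefect or Spark. Option 1 mirrors what mapping a task does in Prefect (one task run, and here one job, per parameter set); option 2 sends the whole parameter grid to a single job that loops over it.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical parameter grid; in practice these would be model hyperparameters.
PARAM_SETS = [{"alpha": a} for a in (0.1, 0.5, 1.0)]

# Records how many parameter sets each (hypothetical) Spark job received.
submitted_jobs = []


def fit_model(data, params):
    """Toy stand-in for training: scales the data sum by alpha."""
    return params["alpha"] * sum(data)


def submit_spark_job(data, param_sets):
    """Hypothetical stand-in for one Spark job submission.

    Records the submission, then fits every parameter set it was given
    inside that one job.
    """
    submitted_jobs.append(len(param_sets))
    return [fit_model(data, p) for p in param_sets]


def fan_out(data, param_sets):
    """Option 1: one job per parameter set, like mapping a task over the grid."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(submit_spark_job, data, [p]) for p in param_sets]
        # Each job returns a one-element list; unwrap it, preserving grid order.
        return [f.result()[0] for f in futures]


def single_job(data, param_sets):
    """Option 2: one job that loops over all parameter sets internally."""
    return submit_spark_job(data, param_sets)
```

Both options return the same results for the same grid; the difference is that option 1 submits three one-model jobs while option 2 submits one three-model job, so any per-job startup cost is paid three times versus once.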
n
Hi @John Ramirez! I don't have experience with Spark, but it would make sense to do the `.map()` in Prefect if you want to take advantage of Prefect's semantics (state handlers, retries, conditional branches within the map, etc.). How much overhead there is in starting and stopping a Spark job will probably be a big factor in deciding whether to manage everything within a single job. Hopefully that's at least somewhat helpful!
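A rough way to reason about that overhead is a back-of-envelope wall-time model. This is a simplified sketch under stated assumptions, not a measurement: each mapped job pays a fixed startup cost, mapped jobs run in waves limited by a hypothetical `max_concurrent` without slowing each other down, and the single job fits its models one after another. All numbers in the usage note are made up for illustration.

```python
import math


def mapped_wall_time(n_models, startup_s, fit_s, max_concurrent):
    """One Spark job per model (the .map() approach).

    Assumes jobs run in waves of `max_concurrent` and every job pays
    the startup cost; ignores contention for cluster resources.
    """
    waves = math.ceil(n_models / max_concurrent)
    return waves * (startup_s + fit_s)


def single_job_wall_time(n_models, startup_s, fit_s):
    """One Spark job that pays startup once, then fits models sequentially."""
    return startup_s + n_models * fit_s
```

With hypothetical values of 10 models, 60 s startup, and 30 s per fit, `mapped_wall_time(10, 60, 30, 5)` gives 180 s against 360 s for `single_job_wall_time(10, 60, 30)`, but with `max_concurrent=1` the mapped version rises to 900 s. In a real cluster, concurrent jobs compete for executors and a single job can also parallelize internally, so the crossover point has to be measured rather than read off this model.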