# ask-community
m
Hey community 👋 🙂, Prefect is one of the options we are considering adopting inside the company, but we would like to check that it covers our use case. For a simple batch process, like running a job on an EMR cluster, what would the approach be using Prefect? Airflow has some good adapters, but I am curious whether we could do the same using Prefect.
k
Hey @Marc, we currently don’t have an EMR task specifically, but running an EMR job should be particularly hard to wrap as a task in Prefect.
m
What would the approach be using Prefect? @Kevin Kho I guess that running Spark jobs on EMR is a common use case, as an alternative to using Databricks (for example). Which approach would you consider inside Prefect to run Spark jobs?
k
Oh sorry, typo there. Meant to say NOT particularly hard 😅. I see Databricks as the more common use case, and we do have a task for that in the task library. EMR should be fine!
m
When you say `wrap as a task`, do you mean implementing an interface for `prefect.tasks`?
k
In Prefect you can wrap a Python function as a `prefect.task` with:
from prefect import task

@task
def abc(x):
    # Any plain Python function becomes a Prefect task once decorated
    return x + 1
So it would just be a matter of putting the EMR API call inside the function
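For illustration, here is a minimal sketch of what "putting the EMR API call inside the function" could look like, assuming boto3 is used as the AWS client. The function names, the step name, the cluster ID, and the script URI are all hypothetical; `add_job_flow_steps` is boto3's EMR call for submitting steps to a running cluster. The `client` parameter exists only so the function can be exercised without AWS access, and the `@task` decorator is left off so the sketch runs without Prefect installed.

```python
from typing import Any, Optional


def build_spark_step(name: str, script_uri: str) -> dict:
    """Build an EMR step definition that runs spark-submit via command-runner.jar."""
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri],
        },
    }


def submit_emr_step(cluster_id: str, script_uri: str, client: Optional[Any] = None) -> str:
    """Submit a Spark step to an existing EMR cluster and return the step ID.

    Assumption: inside a Prefect flow this function would be decorated with
    @task (from prefect import task).
    """
    if client is None:
        # Create the client inside the function, so only plain strings
        # cross the task boundary as inputs and outputs.
        import boto3  # imported lazily; only needed when talking to AWS

        client = boto3.client("emr")
    response = client.add_job_flow_steps(
        JobFlowId=cluster_id,
        Steps=[build_spark_step("spark-job", script_uri)],
    )
    return response["StepIds"][0]
```

This only triggers the job and returns the step ID; waiting for the step to finish would need an extra polling call, which is not shown here.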
m
Okay! I get your point: you suggest using boto3 as the AWS client and handling the response inside the task, that's it. I went too far. I thought it meant implementing a new class for the EMR resource, like https://github.com/PrefectHQ/prefect/blob/5221e6c4ef68eb8c7659c212ea937d1856cee8a4/src/prefect/tasks/aws/s3.py#L174
k
Yep exactly! The only caveat is that you likely need to create the client inside the task, because task inputs and outputs need to be serializable to support parallel execution. With single-threaded execution though (the default), passing a client around should be fine. Hey, if you end up implementing it, we’d gladly take the PR 😆. But I don’t think you need to implement the class interface.
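To make the serialization caveat concrete, here is a small sketch using only the standard library. It assumes the standard `pickle` module as a stand-in for whatever serializer a parallel executor uses, and `FakeClient` is a hypothetical class, not a real AWS client: it simply holds a lock, as many API clients do internally.

```python
import pickle
import threading


class FakeClient:
    """Hypothetical stand-in for an API client that holds a lock internally."""

    def __init__(self):
        self._lock = threading.Lock()


def is_picklable(obj) -> bool:
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False


# A plain string such as a cluster ID is safe to pass between tasks.
cluster_id_ok = is_picklable("j-EXAMPLE")
# An object holding a thread lock cannot be pickled.
client_ok = is_picklable(FakeClient())
```

Passing identifiers such as a cluster ID between tasks and building the client where it is used sidesteps the problem.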
m
Well... in this case, I guess single-threaded is fine. You only need to trigger the job on EMR, that's all, as you do with Databricks.
k
Yep!