# prefect-community

Edmondo Porcu

05/05/2022, 1:20 AM
Still on Databricks, and I guess on Tasks in general. Databricks has added git support for jobs, and the current DatabricksSubmitMultitaskRun doesn't support it. I am unsure which approach to take:
• Create a custom DatabricksSubmitMultitaskRun implementation. However, I would have to access underscore (private) methods in the prefect package
• Use the databricks CLI Python library to create a job and then just run it via Prefect
• Other?
The real problem is that the Task does not allow dependency injection (i.e. the Databricks client is created within the run function, so it's not easy to override it). I guess the design of the Task is concerning in the sense that it is not extensible; one needs to rewrite it from scratch.

Kevin Kho

05/05/2022, 1:21 AM
It’s end of day here, but I’ll let the integrations team know about this post

Edmondo Porcu

05/05/2022, 1:27 AM
Thanks, a little bit of context. Databricks has added a feature where jobs can be executed using the source from a Git repo rather than from code checked out in the workspace. This makes a lot of things much simpler, because before you needed to check out code on Databricks before running jobs. We looked at Prefect and Airflow as a way to orchestrate integration tests in Databricks, because we felt that using Terraform to create those jobs is cumbersome, and you still need scripts to execute them. However:
• There is no task to create a job that is visible via the UI. I will still need to use the databricks CLI or Terraform for it if I want my integration tests to be visible in the job history UI
• The tasks are coded in a way that makes extending/reusing the functionality hard.
As I mentioned, one obvious way would be to intercept the payload and enrich it, but this is not possible because the Databricks hook is hardwired instead of being injected (https://github.com/PrefectHQ/prefect/blob/master/src/prefect/tasks/databricks/databricks_submitjob.py#L1060).
The payload is hardcoded, with no way to intervene here: https://github.com/PrefectHQ/prefect/blob/master/src/prefect/tasks/databricks/databricks_submitjob.py#L1067
And if I write my own Task, I will violate Python visibility rules by using this function: https://github.com/PrefectHQ/prefect/blob/087dfb04ca6be3cd1a444cd212b54987d89cd913/src/prefect/tasks/databricks/databricks_submitjob.py#L54
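A minimal sketch of the kind of dependency injection being asked for here, not Prefect's actual code. Every name in it (FakeDatabricksClient, InjectableSubmitRun, add_git_source, the example repo URL) is hypothetical, and the git_source field names are an assumption about the Databricks Jobs API that should be checked against the Databricks documentation. The point is only that the client is built by an injected factory and the payload passes through an optional hook before it is submitted:

```python
from typing import Any, Callable, Dict, List, Optional

Payload = Dict[str, Any]


class FakeDatabricksClient:
    """Hypothetical stand-in for a Databricks client; it only records payloads."""

    def __init__(self) -> None:
        self.submitted: List[Payload] = []

    def submit_run(self, payload: Payload) -> int:
        self.submitted.append(payload)
        # Pretend run id: the number of runs submitted so far.
        return len(self.submitted)


class InjectableSubmitRun:
    """Hypothetical task-like class whose client and payload hook are injected."""

    def __init__(
        self,
        client_factory: Callable[[], Any],
        enrich_payload: Optional[Callable[[Payload], Payload]] = None,
    ) -> None:
        self.client_factory = client_factory
        self.enrich_payload = enrich_payload

    def run(self, tasks: List[Payload], run_name: str) -> int:
        payload: Payload = {"run_name": run_name, "tasks": tasks}
        # Callers can add fields the task does not know about yet.
        if self.enrich_payload is not None:
            payload = self.enrich_payload(payload)
        # The client is created by the injected factory, not hardwired here.
        client = self.client_factory()
        return client.submit_run(payload)


def add_git_source(payload: Payload) -> Payload:
    """Return a copy of the payload with a git source attached.

    Field names and values are assumptions for illustration only.
    """
    git_source = {
        "git_url": "https://example.com/org/repo.git",
        "git_provider": "gitHub",
        "git_branch": "main",
    }
    return {**payload, "git_source": git_source}


if __name__ == "__main__":
    client = FakeDatabricksClient()
    task = InjectableSubmitRun(
        client_factory=lambda: client,
        enrich_payload=add_git_source,
    )
    task.run(tasks=[{"task_key": "integration_test"}], run_name="nightly")
    print(client.submitted[0])
```

With this shape, supporting a new API field means passing a different enrich_payload function, and tests can pass a fake client factory, without subclassing the task or calling its private helpers.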

alex

05/05/2022, 6:53 PM
Hey @Edmondo Porcu, do you have a link to the documentation for the new Databricks feature that you mentioned?

Edmondo Porcu

05/05/2022, 6:54 PM
I already have the bugfix. I can submit it, as I did for the Terraform Databricks Provider. However, the real issue is that the task design should incorporate some flexibility.
I have cloned prefect and will submit a PR myself
There is no Makefile ...

alex

05/05/2022, 7:01 PM
That’s great that you already have a fix! For Prefect 2.0, we are working on new integrations with Databricks that will be more flexible and easier to use.

Edmondo Porcu

05/05/2022, 7:16 PM
When will Prefect 2.0 be out? Why am I even using Prefect 1.0?

Kevin Kho

05/05/2022, 7:17 PM
2.0 is in beta and the tasks are being ported over by Alex and team
You can see it here

Edmondo Porcu

05/05/2022, 9:53 PM
When do you plan to have it GA?
@alex is it worth enhancing the old task?

Kevin Kho

05/05/2022, 10:20 PM
Anyone can already use it at http://beta.prefect.io. It is estimated to be out of beta next quarter.

Edmondo Porcu

05/05/2022, 11:02 PM
what about the python code? is it available as a separate library?

Kevin Kho

05/06/2022, 2:25 AM
Prefect 1 is Prefect 1.2. You can install the 2.0 beta with:
pip install prefect==2.0b3

Edmondo Porcu

05/06/2022, 2:37 AM
Awesome, will the api be compatible?

Kevin Kho

05/06/2022, 2:51 AM
No. 2.0 is breaking because it was designed from scratch to solve a lot of things 1.0 could not do

Edmondo Porcu

05/06/2022, 3:21 AM
I see. Is there a blog post about it?

Kevin Kho

05/06/2022, 3:27 AM
Check this.