Still coming to Databricks and I guess in general ...
# prefect-community
e
Coming back to Databricks, and I guess to Tasks in general. Databricks has added Git support for jobs, and the current DatabricksSubmitMultitaskRun doesn't support it. I am undecided among the possible approaches:
• Create a custom DatabricksSubmitMultitaskRun implementation. However, I would have to access underscore (private) methods in the prefect package.
• Use the Databricks CLI Python library to create a job and then just run it via Prefect.
• Other?
The real problem is that the Task does not allow dependency injection (i.e. the Databricks client is created within the run function, so it's not easy to override it). I find the design of the Task concerning in the sense that it is not extensible: one needs to rewrite it from scratch.
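To make the dependency-injection point concrete, here is a minimal sketch of the task shape being asked for. It is not Prefect's actual API: `InjectableSubmitRun`, `RecordingClient`, `client_factory`, and `payload_hook` are all hypothetical names used only for illustration, and the client is a stand-in that records what it receives instead of calling Databricks.

```python
from typing import Any, Callable, Dict, List, Optional

Payload = Dict[str, Any]


class RecordingClient:
    """Hypothetical stand-in for a Databricks client; it only records payloads."""

    def __init__(self) -> None:
        self.submitted: List[Payload] = []

    def submit_run(self, payload: Payload) -> int:
        self.submitted.append(payload)
        return len(self.submitted)  # pretend this is the run id


class InjectableSubmitRun:
    """Hypothetical task: the client and a payload hook are passed in, not hardwired."""

    def __init__(
        self,
        client_factory: Callable[[], Any],
        payload_hook: Optional[Callable[[Payload], Payload]] = None,
    ) -> None:
        self.client_factory = client_factory
        self.payload_hook = payload_hook

    def run(self, payload: Payload) -> int:
        # The client comes from the injected factory, so callers can swap it.
        client = self.client_factory()
        if self.payload_hook is not None:
            # Hand the hook a copy so the caller's dict is never mutated.
            payload = self.payload_hook(dict(payload))
        return client.submit_run(payload)


client = RecordingClient()
task = InjectableSubmitRun(
    client_factory=lambda: client,
    payload_hook=lambda p: {**p, "extra_field": "value"},
)
run_id = task.run({"run_name": "integration-tests"})
```

With this shape, supporting a new API field means passing a different hook rather than rewriting the task.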
k
It’s end of day here, but I’ll let the integrations team know about this post
e
Thanks. A little bit of context: Databricks has added a feature where jobs can be executed using the source from a Git repo rather than from code checked out in the workspace. This makes a lot of things much simpler, because before you needed to check out code on Databricks before running jobs. We looked at Prefect and Airflow as a way to orchestrate integration tests in Databricks, because we felt that using Terraform to create those jobs is cumbersome, and you still need scripts to execute them. However:
• There is no task to create a job that is visible via the UI. I will still need to use the Databricks CLI or Terraform for it if I want my integration tests to be visible in the job history UI.
• The tasks are coded in a way that makes extending or reusing the functionality hard.
As I mentioned, one obvious way would be to intercept the payload and enrich it, but this is not possible because the Databricks hook is hardwired instead of being injected (https://github.com/PrefectHQ/prefect/blob/master/src/prefect/tasks/databricks/databricks_submitjob.py#L1060). The payload is hardcoded, with no way to intervene here: https://github.com/PrefectHQ/prefect/blob/master/src/prefect/tasks/databricks/databricks_submitjob.py#L1067. And if I write my own Task, I will violate Python visibility conventions by using this function: https://github.com/PrefectHQ/prefect/blob/087dfb04ca6be3cd1a444cd212b54987d89cd913/src/prefect/tasks/databricks/databricks_submitjob.py#L54
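As a rough illustration of what "intercept the payload and enrich it" could look like, here is a small sketch. The helper `with_git_source` is hypothetical, the repository URL is a placeholder, and the `git_source` field names (`git_url`, `git_provider`, `git_branch`) are an assumption about the Databricks Jobs API that should be checked against the Databricks documentation.

```python
from typing import Any, Dict


def with_git_source(
    payload: Dict[str, Any],
    git_url: str,
    git_provider: str,
    git_branch: str,
) -> Dict[str, Any]:
    """Return a copy of a run payload with a Git source block added.

    The field names are assumed, not taken from the Prefect task.
    """
    enriched = dict(payload)  # shallow copy; the caller's dict is left untouched
    enriched["git_source"] = {
        "git_url": git_url,
        "git_provider": git_provider,
        "git_branch": git_branch,
    }
    return enriched


base = {"run_name": "integration-tests", "tasks": []}
enriched = with_git_source(
    base,
    git_url="https://example.com/org/repo",  # placeholder URL
    git_provider="gitHub",  # assumed provider identifier
    git_branch="main",
)
```

A function like this only helps if the task exposes a point where the payload can be modified before submission, which is what the hardwired hook prevents.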
a
Hey @Edmondo Porcu, do you have a link to the documentation for the new Databricks feature that you mentioned?
e
I already have the bugfix. I can submit it, as I did for the Terraform Databricks Provider. However, the real point is that the task design should incorporate some flexibility.
I have cloned prefect and will submit a PR myself
There is no Makefile ...
a
That’s great that you already have a fix! For Prefect 2.0, we are working on new integrations with Databricks that will be more flexible and easier to use.
e
When will Prefect 2.0 be out? Why am I even using Prefect 1.0?
k
2.0 is in beta and the tasks are being ported over by Alex and team
You can see it here
e
When do you plan to have it GA?
@alex is it worth enhancing the old task?
k
Anyone can already use it at beta.prefect.io. It is estimated to be out of beta next quarter.
e
what about the python code? is it available as a separate library?
k
Prefect 1 is currently at 1.2. You can install the 2.0 beta with:
pip install prefect==2.0b3
e
Awesome, will the api be compatible?
k
No. 2.0 is breaking because it was designed from scratch to solve a lot of things 1.0 could not do
e
I see. Is there a blog post about it?
k
Check this.