
Renuka

05/26/2022, 6:53 PM
Is there a way to trigger a Databricks notebook from Prefect 2.0 (the beta version)?

Kevin Kho

05/26/2022, 6:58 PM
Not yet. It’s still being actively worked on.

Renuka

05/26/2022, 7:00 PM
Will it be similar if I test it in version 1?

Kevin Kho

05/26/2022, 7:08 PM
Yeah, but the plan is to make it better. You could copy the Python code under the hood from the Prefect 1.0 task.

Renuka

05/26/2022, 9:22 PM
Can you point me to that code in the docs, please?

Kevin Kho

05/26/2022, 9:36 PM
Click the source button here

Renuka

05/27/2022, 7:23 PM
I was able to run the Databricks notebook from Prefect, but when I overwrite tables within the notebook I get this error.
When I run the notebook without Prefect I don't have any issue.

Kevin Kho

05/27/2022, 7:26 PM
Looks to me like you are not authenticated to run the notebook?

Renuka

05/27/2022, 7:27 PM
The notebook runs fine, but this step throws an error while saving to the table.

Kevin Kho

05/27/2022, 7:30 PM
How do you run without Prefect? With an API call or running the notebook with Databricks UI?

Renuka

05/27/2022, 7:32 PM
I run it from the Databricks UI without any issue. When run from Prefect it also runs fine, except for the table overwrite step in the notebook.

Kevin Kho

05/27/2022, 7:33 PM
What about with an API call without Prefect?
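
A rough sketch of what such a direct test could look like, using only the Python standard library against the Databricks Jobs `runs/submit` endpoint. The workspace URL, token, notebook path, and cluster ID below are placeholders, the helper name `build_submit_request` is hypothetical, and the cluster settings are an assumption: they should mirror whatever the Prefect task sends.

```python
import json
import urllib.request


def build_submit_request(host, token, notebook_path, cluster_id):
    """Build (but do not send) a POST to the Databricks Jobs runs/submit endpoint."""
    payload = {
        "run_name": "direct-api-test",
        # Assumption: the run targets an existing cluster; swap in a
        # "new_cluster" spec if that is what the Prefect task uses.
        "existing_cluster_id": cluster_id,
        "notebook_task": {"notebook_path": notebook_path},
    }
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/jobs/runs/submit",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values: substitute a real workspace URL, token, path, and cluster ID.
request = build_submit_request(
    host="https://example.cloud.databricks.com",
    token="dapi-PLACEHOLDER",
    notebook_path="/Users/someone@example.com/my_notebook",
    cluster_id="0000-000000-example",
)

# Sending is left out on purpose; urllib.request.urlopen(request) would submit the run.
```

If the same payload fails here as well, the problem is in the job configuration rather than in Prefect.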

Renuka

05/27/2022, 7:36 PM
Haven't tried that.

Kevin Kho

05/27/2022, 7:39 PM
I am pretty confused because I feel it shouldn’t matter, since the compute is happening on the Databricks side. I think if an API call without Prefect fails, then either authentication has to be passed (less likely) or the job needs some kind of configuration (more likely), in my opinion.

Renuka

05/27/2022, 7:42 PM
That's what is not clear from the error: what's causing it. Yes, like you said, it's on the Databricks side, so it shouldn't matter, and the notebook should run as a Databricks user. Not sure what permission issue there could be on the AWS side when running from Prefect?

Kevin Kho

05/27/2022, 7:48 PM
Is the setup using Prefect inside the notebook, or is Prefect on some VM starting the notebook?

Renuka

05/27/2022, 7:51 PM
I'm running Prefect on my local machine, registered to a Prefect Cloud account, and triggering the notebook from there. I can see this run in the Databricks UI, with logs.

Kevin Kho

05/27/2022, 7:56 PM
Yeah, that’s very confusing. I think there really is just something different with the API call, though I have no idea how to pass credentials. Maybe you need to supply the access_control_list? There is an example in the task docs.

Renuka

06/02/2022, 9:41 PM
I am actually passing a Databricks token, and that's why it's able to trigger the Databricks notebook. But it should run as a regular Databricks notebook with its resources. I still can't figure out the S3 access issue.
The Databricks job was running as my user email. I tried changing my user's permissions to match the role that Databricks uses for S3, but that still didn't help.

Kevin Kho

06/03/2022, 3:26 PM
Let me ping our integrations team and see if they have ideas

Renuka

06/03/2022, 3:27 PM
Yes please, I've been trying to figure this out for a while now. Any help is appreciated.

alex

06/03/2022, 3:29 PM
Hey @Renuka 👋 What’s the name of the S3 bucket that you’re trying to access?

Renuka

06/03/2022, 3:31 PM
It's company-specific, so I can't share it.

alex

06/03/2022, 3:34 PM
OK, no worries! My recommendation would be to make sure that the bucket exists and that the Databricks instance running your notebook has access to that bucket. That permission management is not something Prefect has control over, and it would need to be set up before running your flow.

Renuka

06/03/2022, 3:37 PM
When I run the notebook within Databricks without Prefect it runs fine, so Databricks has access to the bucket and the bucket exists. But when it's run from Prefect as an API call, shouldn't it automatically be able to access the bucket without any additional setup? Prefect is just triggering the Databricks notebook, so it should use the same permissions that Databricks has.

alex

06/03/2022, 3:43 PM
Yeah, that all makes sense. What API endpoint are you calling as part of your flow?

Renuka

06/03/2022, 4:21 PM
I used DatabricksSubmitRun, the first one in the Databricks tasks documentation.
It actually worked after using an existing cluster ID in the API call. Thanks for your help!!
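
For anyone landing here later, a minimal sketch of the kind of run configuration described above, pointing the run at an existing cluster instead of a new cluster spec. The helper name `build_run_config`, the notebook path, and the cluster ID are hypothetical placeholders. The thread does not show the exact payload that was used, and it does not say why the existing cluster resolved the S3 error.

```python
from typing import Any, Dict, Optional


def build_run_config(
    notebook_path: str,
    existing_cluster_id: Optional[str] = None,
    new_cluster: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a runs/submit style payload with exactly one cluster setting."""
    if (existing_cluster_id is None) == (new_cluster is None):
        raise ValueError("Provide exactly one of existing_cluster_id or new_cluster")
    config: Dict[str, Any] = {"notebook_task": {"notebook_path": notebook_path}}
    if existing_cluster_id is not None:
        # Reuse a cluster that already exists in the workspace.
        config["existing_cluster_id"] = existing_cluster_id
    else:
        # Otherwise Databricks creates a fresh cluster from this spec.
        config["new_cluster"] = new_cluster
    return config


# Placeholder values, not taken from the thread.
run_config = build_run_config(
    "/Users/someone@example.com/my_notebook",
    existing_cluster_id="0000-000000-example",
)

# Assumption, based on the Prefect 1.0 task docs: a dict like this is passed as
# DatabricksSubmitRun(json=run_config), and the task is then called with
# databricks_conn_secret set to the Databricks connection secret.
```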

Kevin Kho

06/06/2022, 8:35 PM
Oh glad you got that sorted out!