Is there a way to trigger databricks notebook from...
# ask-community
r
Is there a way to trigger a Databricks notebook from Prefect 2.0 in the beta version?
k
Not yet. It’s still being actively worked on.
r
Will it be similar if I test it in version 1?
k
Yeah, but the plan is to make it better. You could copy the Python code that's under the hood of the Prefect 1.0 task.
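(Roughly what that would look like, as a minimal sketch: a Prefect 2 beta flow submitting a notebook run directly to the Databricks Jobs API, which is essentially what the Prefect 1.0 task does internally. The workspace URL, token, cluster spec, and notebook path below are all placeholders.)

```python
# Sketch only, not the eventual Prefect 2 integration: submit a one-off
# notebook run via the Databricks Jobs API from a Prefect 2 beta flow.
import requests
from prefect import flow

@flow
def trigger_databricks_notebook():
    resp = requests.post(
        "https://<workspace>.cloud.databricks.com/api/2.0/jobs/runs/submit",
        headers={"Authorization": "Bearer <token>"},
        json={
            "run_name": "prefect-triggered-run",
            "new_cluster": {  # placeholder cluster spec
                "spark_version": "10.4.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 2,
            },
            "notebook_task": {"notebook_path": "/Users/me@example.com/my-notebook"},
        },
        timeout=30,
    )
    resp.raise_for_status()
    print("Submitted run:", resp.json()["run_id"])

if __name__ == "__main__":
    trigger_databricks_notebook()
```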
r
Can you point me to that code in the docs, please?
k
Click the source button here
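(The task in question is Prefect 1.0's DatabricksSubmitRun. Basic usage looks roughly like this; the connection secret contents, cluster spec, and notebook path are placeholders.)

```python
# Sketch of the Prefect 1.0 task the source button points at.
from prefect import Flow
from prefect.tasks.databricks import DatabricksSubmitRun

notebook_run = DatabricksSubmitRun(
    databricks_conn_secret={
        "host": "https://<workspace>.cloud.databricks.com",
        "token": "<token>",
    },
    json={
        "new_cluster": {  # placeholder cluster spec
            "spark_version": "10.4.x-scala2.12",
            "node_type_id": "i3.xlarge",
            "num_workers": 2,
        },
        "notebook_task": {"notebook_path": "/Users/me@example.com/my-notebook"},
    },
)

with Flow("databricks-notebook") as flow:
    notebook_run()

flow.run()
```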
r
I was able to run the Databricks notebook from Prefect, but when I overwrite tables within the notebook I get this error.
When I run the notebook without Prefect, I don't have any issue.
k
Looks to me like you are not authenticated to run the notebook?
r
The notebook runs successfully, but at this step it throws an error while saving to the table.
k
How do you run it without Prefect? With an API call, or by running the notebook in the Databricks UI?
r
I run it from the Databricks UI without any issue. When run from Prefect it also runs fine, except for the table-overwrite step in the notebook.
k
What about with an API call without Prefect?
r
Haven't tried that.
k
I am pretty confused, because I feel it shouldn't matter since the compute is happening on the Databricks side. I think if an API call without Prefect also fails, then either authentication has to be passed (less likely) or the job needs some kind of configuration (more likely), in my opinion.
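(Testing that would look something like the sketch below: the same placeholder payload as above, submitted with plain requests and no Prefect in the loop, to isolate whether Prefect is involved in the failure at all.)

```python
# Sketch: submit the identical run without Prefect. All values are placeholders.
import requests

resp = requests.post(
    "https://<workspace>.cloud.databricks.com/api/2.0/jobs/runs/submit",
    headers={"Authorization": "Bearer <token>"},
    json={
        "run_name": "no-prefect-test",
        "new_cluster": {
            "spark_version": "10.4.x-scala2.12",
            "node_type_id": "i3.xlarge",
            "num_workers": 2,
        },
        "notebook_task": {"notebook_path": "/Users/me@example.com/my-notebook"},
    },
    timeout=30,
)
resp.raise_for_status()
# If this run also fails at the table overwrite, Prefect is not the cause.
print(resp.json()["run_id"])
```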
r
That's what's not clear from the error: what's causing it. Yes, like you said, it's on the Databricks side, so Prefect shouldn't matter; it should act like a regular Databricks user when the notebook runs. Not sure what the permission issue could be on the AWS side when running from Prefect?
k
Is the setup using Prefect in the notebook, or is Prefect on some VM starting the notebook?
r
I'm running Prefect on my local machine, registered to a Prefect Cloud account, and triggering the notebook from there. I am able to see this run, with logs, in the Databricks UI.
k
Yeah, that's very confusing. I think there really is just something different with the API call, though I have no idea how to pass credentials. Maybe you need to supply the `access_control_list`? There is an example in the task docs.
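(In the `json` spec, that would look roughly like the sketch below; the user name and permission level are placeholders, not taken from the thread.)

```python
# Hypothetical runs/submit payload with an access_control_list entry.
json_spec = {
    "existing_cluster_id": "<cluster-id>",
    "notebook_task": {"notebook_path": "/Users/me@example.com/my-notebook"},
    "access_control_list": [
        {"user_name": "me@example.com", "permission_level": "CAN_MANAGE"}
    ],
}
```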
r
I am actually passing a Databricks token, and that's why it's able to trigger the Databricks notebook. But it should run like a regular Databricks notebook, with the same resources. I still can't figure out the S3 access issue.
The Databricks job was running as my user email. I tried changing my user's permissions to match the role that Databricks uses for S3, but that still didn't help.
k
Let me ping our integrations team and see if they have ideas
r
Yes please, I've been trying to figure this out for a while now. Any help is appreciated.
a
Hey @Renuka 👋 What’s the name of the S3 bucket that you’re trying to access?
r
It's company-specific, so I can't share it.
a
OK, no worries! My recommendation would be to make sure that the bucket exists and that the Databricks instance running your notebook has access to it. That permission management is not something Prefect has control over; it would need to be set up before running your flow.
r
When I run the notebook within Databricks without Prefect it runs fine, so Databricks has access to the bucket and the bucket exists. But when it's run from Prefect as an API call, shouldn't it automatically be able to access it without any additional setup? Prefect is just triggering the Databricks notebook, so it should use the same permissions that Databricks has.
a
Yeah, that all makes sense. What API endpoint are you calling as part of your flow?
r
I used DatabricksSubmitRun, the first one in the Databricks tasks document.
It actually worked after using an existing cluster ID in the API call. Thanks for your help!!
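(For anyone hitting the same thing, the change that fixed it was roughly the sketch below; all values are placeholders. A plausible explanation for why this resolved the S3 error: an existing interactive cluster already has an instance profile with S3 access attached, while a `new_cluster` created per-run only gets one if `aws_attributes.instance_profile_arn` is set explicitly in the spec.)

```python
# Before (failed at the S3 write): submitting onto a fresh job cluster.
json_spec = {
    "new_cluster": {
        "spark_version": "10.4.x-scala2.12",
        "node_type_id": "i3.xlarge",
        "num_workers": 2,
    },
    "notebook_task": {"notebook_path": "/Users/me@example.com/my-notebook"},
}

# After (worked): reusing an existing cluster by ID, which already carries
# the permissions the interactive cluster has.
json_spec = {
    "existing_cluster_id": "<cluster-id>",
    "notebook_task": {"notebook_path": "/Users/me@example.com/my-notebook"},
}
```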
k
Oh glad you got that sorted out!