
Mateo Merlo

05/23/2022, 2:11 PM
Hi!! Has anybody had issues using credentials with DbtShellTask in Prefect Cloud? I'm working with BigQuery. Do I need to pass the "keyfile" param inside dbt_kwargs? Or if I have a GCP_CREDENTIALS Secret, is that not needed?

Anna Geller

05/23/2022, 2:18 PM
Here is one way of approaching it using Prefect Secrets and `dbt_kwargs`:
postgres_user = PrefectSecret("POSTGRES_USER")
postgres_pass = PrefectSecret("POSTGRES_PASS")
db_credentials = get_dbt_credentials(postgres_user, postgres_pass)
dbt_run = dbt(
    command="dbt run", task_args={"name": "DBT Run"}, dbt_kwargs=db_credentials
)
full example: https://www.prefect.io/blog/flow-of-flows-orchestrating-elt-with-prefect-and-dbt/
`GCP_CREDENTIALS` can be tackled the same way
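The `dbt` task in that snippet is a pre-configured `DbtShellTask` instance. A rough sketch of how it might be set up for BigQuery on Prefect 1.x, assuming `overwrite_profiles=True` so the task renders profiles.yml from `dbt_kwargs` (the profile, project, and dataset names below are placeholders, not from this thread):

from prefect.tasks.dbt import DbtShellTask

# Hypothetical configuration; profile_name must match the profile in dbt_project.yml
dbt = DbtShellTask(
    profile_name="my_bigquery_profile",
    environment="dev",          # target name written into the generated profiles.yml
    overwrite_profiles=True,    # render profiles.yml from dbt_kwargs at runtime
    helper_script="cd dbt",     # run dbt from the project directory
    return_all=True,
    log_stderr=True,
)

# dbt_kwargs for BigQuery would then carry the connection details, e.g. with a keyfile path:
bigquery_credentials = {
    "type": "bigquery",
    "method": "service-account",
    "project": "my-gcp-project",    # placeholder
    "dataset": "my_dataset",        # placeholder
    "threads": 4,
    "keyfile": "/path/to/service-account.json",
}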

Mateo Merlo

05/23/2022, 2:27 PM
Thanks @Anna Geller. Something like this:
from prefect import Flow, Parameter, task
from prefect.tasks.secrets import PrefectSecret

# `dbt` is assumed to be a pre-configured DbtShellTask instance

@task
def get_dbt_credentials(gcp_json):
    return {"keyfile": gcp_json}

@task
def format_command(target_env):
    return f"dbt build --target {target_env}"

with Flow("DBT_flow") as flow:
    target_env = Parameter("target_env", required=True)
    credentials = PrefectSecret("GCP_CREDENTIALS")
    db_credentials = get_dbt_credentials(credentials)
    dbt_run = dbt(
        command=format_command(target_env),
        task_args={"name": "DBT Run"},
        dbt_kwargs=db_credentials
    )
I will try it.

Anna Geller

05/23/2022, 2:29 PM
this looks good - the only issue you need to be careful about is whether dbt expects the contents of the GCP service account or just a path to it - the Prefect Secret GCP_CREDENTIALS expects the contents of a SA

Mateo Merlo

05/23/2022, 2:30 PM
Yes, that is exactly the issue, because I know that DBT requires a path
Should I create a Docker image with this file inside and use an ENV variable in the Dockerfile?
we are working with a custom image to run the flows with Vertex

Anna Geller

05/23/2022, 2:36 PM
I can't dive deeper into this atm but this repo may help you - I did a flow-of-flows example with dbt + GCP here, using a local agent, but this should work the same way with a Docker agent: https://github.com/anna-geller/prefect-monte-carlo/tree/workshop/flows I didn't have to specify any credentials other than setting the SA path on the dbt profile

Mateo Merlo

05/23/2022, 2:44 PM
Sure, thanks! :simple_smile:

Anna Geller

05/23/2022, 2:47 PM
please keep us posted here about your progress - I think using the same approach, setting the path on your dbt profile and copying or mounting the SA file to your container, should work

Mateo Merlo

05/23/2022, 2:47 PM
If I get the secret from Prefect Cloud, I should be able to paste its contents into a new file, right? I think that will be easier to do. And then use that file path as the keyfile param.
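A minimal sketch of that idea, assuming Prefect 1.x and that the `GCP_CREDENTIALS` secret comes back either as a parsed dict or as a raw JSON string (the project, dataset, and keyfile path below are placeholders):

import json

from prefect import task

@task
def get_dbt_credentials(gcp_credentials, keyfile_path="/tmp/dbt-sa-keyfile.json"):
    # Write the service account contents to a file at flow-run time,
    # then point dbt at that path via dbt_kwargs / the generated profiles.yml.
    with open(keyfile_path, "w") as f:
        if isinstance(gcp_credentials, str):
            f.write(gcp_credentials)
        else:
            json.dump(gcp_credentials, f)
    return {
        "type": "bigquery",
        "method": "service-account",
        "project": "my-gcp-project",    # placeholder
        "dataset": "my_dataset",        # placeholder
        "keyfile": keyfile_path,
        "threads": 4,
    }

Since the file is only written inside the running container, nothing secret has to be baked into the image.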

Anna Geller

05/23/2022, 2:50 PM
dbt seems to support multiple auth methods so you'd need to experiment to see which option works best for you, hard to say, sorry I'm not more useful here

Mateo Merlo

05/23/2022, 2:51 PM
Yes, the main trade-off with mounting the file in the Docker image is that it will probably work when I build the image locally (where I have the SA file), but we are using cloudbuild to build and push the image to the Google registry, so in that case we will probably need to use AWS Secrets Manager, and honestly I don't have much of an idea about it
no problem Anna! Thanks for your help!
πŸ‘ 1

Joshua Greenhalgh

05/23/2022, 3:15 PM
@Mateo Merlo where are you running your flows and agent?
It seems you can specify the individual parts of the keyfile directly in profiles.yml -> https://prefect-community.slack.com/archives/CL09KU1K7/p1650625646905089?thread_ts=1650371481.279719&cid=CL09KU1K7 So maybe you could template in the secret bits from Prefect Secrets? I am running dbt on k8s - I created a k8s secret and then mounted it into the pod's environment via the agent's default job template - if you are running on k8s I'd be happy to discuss how I did that
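For the individual-parts route, dbt-bigquery also appears to support a `service-account-json` method where the key contents go straight into the profile, so no keyfile has to exist on disk at all. A rough sketch of passing it through `dbt_kwargs`, assuming the secret returns the parsed key (project and dataset are placeholders):

from prefect import task

@task
def get_dbt_credentials(gcp_credentials):
    # Hand the full service account contents to dbt instead of a file path
    return {
        "type": "bigquery",
        "method": "service-account-json",
        "project": "my-gcp-project",      # placeholder
        "dataset": "my_dataset",          # placeholder
        "keyfile_json": gcp_credentials,  # contents of the GCP_CREDENTIALS secret
        "threads": 4,
    }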

Mateo Merlo

05/23/2022, 3:24 PM
Thanks for your response @Joshua Greenhalgh! Yes, I saw your thread as well.
We are using Vertex and Docker:
from prefect.executors import LocalDaskExecutor
from prefect.run_configs import VertexRun
from prefect.storage import GCS

flow.storage = GCS(bucket="influsense_test_bucket", secrets=["GCP_CREDENTIALS"])
flow.executor = LocalDaskExecutor(scheduler="threads")
flow.run_config = VertexRun(
    image="europe-west1-docker.pkg.dev/bi-staging-1-309112/my-image/my-image:latest",
    labels=["vertex"],
)
Did you solve it using the individual parts, or just by getting the entire file from a k8s secret?

Joshua Greenhalgh

05/23/2022, 3:27 PM
I just mounted the entire file

Mateo Merlo

05/23/2022, 3:29 PM
Would you mind sharing how you did it? I've already been working with Kubernetes and running the agent, but I've never used secrets

Joshua Greenhalgh

05/23/2022, 3:30 PM
Of course
Is the terraform ok?

Mateo Merlo

05/23/2022, 3:31 PM
Sure

Joshua Greenhalgh

05/23/2022, 3:34 PM
data "sops_file" "dbt_bq_user_service_account_key" {
  source_file = "./secret_data/dbt-bq-user-sa-key.json"
}

resource "kubernetes_secret" "dbt_bq_user_service_account_key" {
  metadata {
    name      = "dbt-bq-user-service-account-key"
    namespace = "prefect"
  }

  data = { "dbt-bq-user-sa-key.json" = jsonencode(data.sops_file.dbt_bq_user_service_account_key.data) }
}
The `sops_file` bit is just so I can keep the file encrypted in a git repo - basically `data.sops_file.dbt_bq_user_service_account_key` is the full key for the service account... Then I have the following agent template;
apiVersion: batch/v1
kind: Job
spec:
  template:
    spec:
      containers:
        - name: flow
          imagePullPolicy: Always
          volumeMounts:
            - mountPath: /service_account_key/
              name: dbt-bq-user-service-account-key
      volumes:
        - name: dbt-bq-user-service-account-key
          secret:
            secretName: dbt-bq-user-service-account-key
And my profile which is baked into my image looks like;
mbk_dwh:
  target: dev
  outputs:
    dev:
      dataset: mbk_dwh
      fixed_retries: 1
      location: EU
      method: service-account
      priority: interactive
      project: data-analytics-mbk
      threads: 1
      timeout_seconds: 300
      type: bigquery
      keyfile: /service_account_key/dbt-bq-user-sa-key.json

Mateo Merlo

05/23/2022, 3:46 PM
Thanks so much!
I'm currently using AWS CodeBuild to build and push the Docker image, so I will probably need to do something similar with AWS Secrets Manager to build the image from the CD pipeline
You run all the Terraform commands locally, right?
How do you point to this file in the KubernetesRunConfig?

Joshua Greenhalgh

05/23/2022, 3:55 PM
I don't need to specify anything in the run config since it just exists on the pod at `/service_account_key/dbt-bq-user-sa-key.json` - all I have to do is bake the profile into the image with a ref to that location
I try to run the terraform on CI most of the time...doesn't always work that way however

Mateo Merlo

05/23/2022, 3:59 PM
I mean, how do you tell Prefect that the flow should use that image?

Joshua Greenhalgh

05/23/2022, 4:00 PM
Ahh ok;
run_config=KubernetesRun(image=JOB_IMAGE_NAME, labels=[DEFAULT_AGENT_LABEL]),
I am always just using `latest`

Mateo Merlo

05/23/2022, 4:03 PM
Thanks so much @Joshua Greenhalgh! I will first try working with the individual parts of the keyfile, and if I have no success then I'll move to Kubernetes
I think that using a task that reads the GCP_CREDENTIALS secret and creates a new file with that content could be a good approach as well
And then use that as the keyfile inside profiles.yml

Joshua Greenhalgh

05/23/2022, 4:09 PM
So I started out using Vertex but it's pretty expensive - if you are only running one GKE Autopilot cluster the flat fee is discounted so you only pay for the compute you use
k8s is complicated of course but I don't think it's insurmountable

Mateo Merlo

05/23/2022, 5:02 PM
Totally agree. By one cluster, you mean the one where the agent will be running, right?