Joshua Greenhalgh

    4 months ago
    Hi, I wonder if anyone could help me with a problem I'm having with the Dask KubeCluster. The issue is that various secrets I have mounted on the usual flow jobs don't get carried over to the pods started by Dask. There's an added complexity: I'm using two images, a dev one and a non-dev one, tied to two different Prefect projects. I'm able to do something like this to switch the image:
    DEV_TAG = os.environ.get("DEV", "") != ""
    
    JOB_IMAGE_NAME = f"blah/flows{':dev' if DEV_TAG else ''}"
    And then in each flow I reference JOB_IMAGE_NAME - this just changes the image but otherwise uses the job template I have defined on the agent:
    apiVersion: batch/v1
    kind: Job
    spec:
      template:
        spec:
          containers:
            - name: flow
              imagePullPolicy: Always
              env:
                - name: SOME_ENV
                  valueFrom:
                    secretKeyRef:
                      name: secret-env-vars
                      key: some_env
                      optional: false
    Now when I specify the Dask setup I do the following:
    executor=DaskExecutor(
            cluster_class=lambda: KubeCluster(make_pod_spec(image=JOB_IMAGE_NAME)),
            adapt_kwargs={"minimum": 2, "maximum": 3},
        )
    But this is obviously missing the env part of my default template - I would like to not have to respecify it (it's much bigger than the above snippet). Is it possible to grab a handle on the default template and just override the image name?
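    For what it's worth, one way to avoid respecifying everything would be to parse the same job-template YAML the agent uses and patch only the image. A minimal sketch, assuming the template text is available to the flow (the loading mechanism and function name here are hypothetical, not a Prefect API):

    ```python
    # Hypothetical sketch: reuse the agent's job template and override only
    # the image, keeping the env/secret section intact.
    import yaml

    # Inlined here for illustration; in practice this would be read from the
    # same template file the agent is configured with.
    JOB_TEMPLATE = """
    apiVersion: batch/v1
    kind: Job
    spec:
      template:
        spec:
          containers:
            - name: flow
              imagePullPolicy: Always
              env:
                - name: SOME_ENV
                  valueFrom:
                    secretKeyRef:
                      name: secret-env-vars
                      key: some_env
                      optional: false
    """

    def job_template_with_image(image: str) -> dict:
        """Parse the agent's job template and swap in a different image."""
        template = yaml.safe_load(JOB_TEMPLATE)
        template["spec"]["template"]["spec"]["containers"][0]["image"] = image
        return template

    spec = job_template_with_image("blah/flows:dev")
    ```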
    Anna Geller

    4 months ago
    Are you on Prefect Cloud or Server? If you are on Cloud, you could leverage Prefect Secrets, which would make the process much easier since you could set them directly from the Prefect Cloud UI.
    Joshua Greenhalgh

    4 months ago
    Hmm yeah unfortunately not allowed to store these there 😞
    Anna Geller

    4 months ago
    Why? We are SOC 2 compliant.
    You could also consider storing them in another secrets manager you trust, such as HashiCorp Vault or AWS Secrets Manager, and retrieving them in your flow when needed.
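    As a rough illustration of that approach, a flow could pull a secret from AWS Secrets Manager at runtime - a sketch only, where the secret name and the assumption that the secret is stored as a JSON string are hypothetical:

    ```python
    # Hedged sketch of retrieving a secret from AWS Secrets Manager inside a
    # flow; assumes AWS credentials are available in the pod's environment and
    # that the secret value is a JSON object of key/value pairs.
    import json

    def parse_secret(response: dict) -> dict:
        """Extract key/value pairs from a get_secret_value response."""
        return json.loads(response["SecretString"])

    def fetch_secret(secret_id: str) -> dict:
        """Fetch and parse one secret; boto3 is imported lazily so the
        parsing helper above can be used without AWS installed."""
        import boto3

        client = boto3.client("secretsmanager")
        return parse_secret(client.get_secret_value(SecretId=secret_id))
    ```

    Inside a task, that would look like `creds = fetch_secret("my-flow-secrets")` (the secret name is a placeholder).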
    Joshua Greenhalgh

    4 months ago
    We just have a policy that all secrets must stay under our control - it would take a lot of bureaucracy to convince anyone otherwise.
    I have actually worked out that, since the Dask cluster is started within the job pod, which has the secrets in its env, I can just do this:
    DASK_POD_SPEC = make_pod_spec(
        image=JOB_IMAGE_NAME,
        env={
            "SECRET_ENV_VAR": os.environ['SECRET_ENV_VAR'],
        },
    )
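    If several secret variables need forwarding, the same trick generalizes to a small helper - a sketch, where the variable names are placeholders:

    ```python
    import os

    def forwarded_env(names):
        """Collect the named env vars from the current (agent-created) pod so
        they can be passed straight to make_pod_spec(env=...); names that are
        unset are skipped rather than raising a KeyError."""
        return {n: os.environ[n] for n in names if n in os.environ}
    ```

    Usage would then be something like `make_pod_spec(image=JOB_IMAGE_NAME, env=forwarded_env(["SECRET_ENV_VAR", "OTHER_SECRET"]))`.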
    I also had to do this:
    DASK_POD_SPEC.spec.service_account_name = "flow-user"
    since
    make_pod_spec
    doesn't let you set the service account to run as.
    Anna Geller

    4 months ago
    thanks for sharing - so your service account points to this environment variable?
    or the other way around?
    Joshua Greenhalgh

    4 months ago
    No, two things: 1. The env vars are set on the pod created by the agent - since this pod creates the Dask pods, I can just insert them. 2.
    service_account_name
    is needed so that I can access GCP resources -
    KubeCluster
    seems to default to using the
    default
    service account.
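    For reference, the end result on the worker pod corresponds to setting the standard serviceAccountName field in its spec (field name per the Kubernetes pod spec; the image shown is the dev one from above):

    ```yaml
    apiVersion: v1
    kind: Pod
    spec:
      serviceAccountName: flow-user
      containers:
        - name: dask-worker
          image: blah/flows:dev
    ```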
    Anna Geller

    4 months ago
    I see, thanks for confirming that!