# prefect-community
Pedro Martins
Hey! Following the above thread on secrets in the k8s agent. How can I ensure that Prefect jobs spawned by the agent contain the secret I specified? I'm running the Aircraft example from a notebook and connecting it to my server in the cluster. Simply passing image_pull_secrets to KubernetesRun does not work: I keep getting
Error: ErrImagePull
from prefect import Flow, Parameter
from prefect.run_configs import KubernetesRun
from prefect.storage import S3

custom_confs = {
    "run_config": KubernetesRun(
        image="drtools/prefect:aircraft-etl",
        image_pull_secrets=["regcred"],
    ),
    "storage": S3(bucket="dr-prefect"),
}

with Flow("Aircraft-ETL", **custom_confs) as flow:
    airport = Parameter("airport", default="IAD")
    radius = Parameter("radius", default=200)
    
    reference_data = extract_reference_data()
    live_data = extract_live_data(airport, radius, reference_data)

    transformed_live_data = transform(live_data, reference_data)

    load_reference_data(reference_data)
    load_live_data(transformed_live_data)
Prefect Job description
Name:         prefect-job-ded2fd39-k6kpp
Namespace:    default
Priority:     0
Node:         ****
Start Time:   Thu, 17 Dec 2020 15:20:15 -0300
Labels:       controller-uid=386ac185-8bba-47b4-85b0-358c3601179c
              job-name=prefect-job-ded2fd39
              prefect.io/flow_id=3228aac5-a762-40db-9858-63c536ce5b8f
              prefect.io/flow_run_id=93c58ae5-1bc4-4a3c-bb70-7bb6a50ff10e
              prefect.io/identifier=ded2fd39
Annotations:  kubernetes.io/psp: eks.privileged
Status:       Pending
IP:           10.0.1.16
IPs:
  IP:           10.0.1.16
Controlled By:  Job/prefect-job-ded2fd39
Containers:
  flow:
    Container ID:
    Image:         drtools/prefect:aircraft-etl
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Args:
      prefect
      execute
      flow-run
    State:          Waiting
      Reason:       ImagePullBackOff
    Ready:          False
    Restart Count:  0
    Environment:
      PREFECT__CLOUD__API:                          http://prefect-server-apollo.default.svc.cluster.local:4200
      PREFECT__CLOUD__AUTH_TOKEN:
      PREFECT__CLOUD__USE_LOCAL_SECRETS:            false
      PREFECT__CONTEXT__FLOW_RUN_ID:                93c58ae5-1bc4-4a3c-bb70-7bb6a50ff10e
      PREFECT__CONTEXT__FLOW_ID:                    3228aac5-a762-40db-9858-63c536ce5b8f
      PREFECT__CONTEXT__IMAGE:                      drtools/prefect:aircraft-etl
      PREFECT__LOGGING__LOG_TO_CLOUD:               true
      PREFECT__ENGINE__FLOW_RUNNER__DEFAULT_CLASS:  prefect.engine.cloud.CloudFlowRunner
      PREFECT__ENGINE__TASK_RUNNER__DEFAULT_CLASS:  prefect.engine.cloud.CloudTaskRunner
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-n28d2 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-n28d2:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-n28d2
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                     From               Message
  ----     ------     ----                    ----               -------
  Normal   Scheduled  9m29s                   default-scheduler  Successfully assigned default/prefect-job-ded2fd39-k6kpp to ip-10-0-1-20.eu-west-1.compute.internal
  Normal   Pulling    7m58s (x4 over 9m28s)   kubelet            Pulling image "drtools/prefect:aircraft-etl"
  Warning  Failed     7m57s (x4 over 9m28s)   kubelet            Failed to pull image "drtools/prefect:aircraft-etl": rpc error: code = Unknown desc = Error response from daemon: pull access denied for drtools/prefect, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
  Warning  Failed     7m57s (x4 over 9m28s)   kubelet            Error: ErrImagePull
  Normal   BackOff    7m44s (x6 over 9m27s)   kubelet            Back-off pulling image "drtools/prefect:aircraft-etl"
  Warning  Failed     4m23s (x20 over 9m27s)  kubelet            Error: ImagePullBackOff
Dylan
Hi @Pedro Martins, have you set the secret in your k8s namespace? See https://docs.prefect.io/api/latest/run_configs.html#kubernetesrun for more details.
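(For reference, a sketch of how a docker-registry pull secret like regcred is usually created; the server, username, password, and email below are placeholders, so substitute your own values:)

```shell
# Create the docker-registry secret the job will reference (placeholder values).
kubectl create secret docker-registry regcred \
  --docker-server=https://index.docker.io/v1/ \
  --docker-username=<your-username> \
  --docker-password=<your-password> \
  --docker-email=<your-email> \
  --namespace default

# Verify it exists and has type kubernetes.io/dockerconfigjson:
kubectl get secret regcred -n default
```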
Pedro Martins
Hey @Dylan! Yes, I had:
$ kubectl get secrets -n default
NAME                                        TYPE                                  DATA   AGE
aws-secret                                  Opaque                                2      3d1h
default-token-n28d2                         kubernetes.io/service-account-token   3      8d
prefect-server-postgresql                   Opaque                                1      6d21h
prefect-server-serviceaccount-token-lc6n2   kubernetes.io/service-account-token   3      6d21h
regcred                                     kubernetes.io/dockerconfigjson        1      2d20h
sh.helm.release.v1.prefect-server.v1        helm.sh/release.v1                    1      6d21h
Should this environment variable
PREFECT__CLOUD__USE_LOCAL_SECRETS: false
be set to true?
Dylan
No, that's for Prefect secrets.
Hmmm
Jim Crist-Harif
Do you have the latest version of prefect running on your agent? Older versions of the agent won't forward the
image_pull_secrets
field.
Pedro Martins
Yes @Jim Crist-Harif! I'm running the brand new 'prefecthq/prefect:0.14.0-python3.6'.
Jim Crist-Harif
Hmmm, I'm unable to reproduce. The job spec generated for me using your provided run-config is:
apiVersion: batch/v1
kind: Job
metadata:
  labels:
    prefect.io/flow_id: new_id
    prefect.io/flow_run_id: id
    prefect.io/identifier: 453321ca
  name: prefect-job-453321ca
spec:
  template:
    imagePullSecrets:
    - name: regcred
    metadata:
      labels:
        prefect.io/flow_id: new_id
        prefect.io/flow_run_id: id
        prefect.io/identifier: 453321ca
    spec:
      containers:
      - args:
        - prefect
        - execute
        - flow-run
        env:
        - name: PREFECT__CLOUD__API
          value: https://api.prefect.io
        - name: PREFECT__CLOUD__AUTH_TOKEN
          value: <redacted>
        - name: PREFECT__CLOUD__USE_LOCAL_SECRETS
          value: 'false'
        - name: PREFECT__CONTEXT__FLOW_RUN_ID
          value: id
        - name: PREFECT__CONTEXT__FLOW_ID
          value: new_id
        - name: PREFECT__CONTEXT__IMAGE
          value: drtools/prefect:aircraft-etl
        - name: PREFECT__LOGGING__LOG_TO_CLOUD
          value: 'true'
        - name: PREFECT__ENGINE__FLOW_RUNNER__DEFAULT_CLASS
          value: prefect.engine.cloud.CloudFlowRunner
        - name: PREFECT__ENGINE__TASK_RUNNER__DEFAULT_CLASS
          value: prefect.engine.cloud.CloudTaskRunner
        image: drtools/prefect:aircraft-etl
        name: flow
        resources:
          limits: {}
          requests: {}
      restartPolicy: Never
what's the output of
kubectl get job <your-job-id> -o yaml
?
Pedro Martins
apiVersion: batch/v1
kind: Job
metadata:
  creationTimestamp: "2020-12-17T18:25:53Z"
  labels:
    prefect.io/flow_id: 1d0ff4aa-da07-4309-82c5-d96f05502a03
    prefect.io/flow_run_id: 2529c19e-0e6c-428f-b777-54b04d19fb9f
    prefect.io/identifier: 93e105ba
  name: prefect-job-93e105ba
  namespace: default
  resourceVersion: "2330069"
  selfLink: /apis/batch/v1/namespaces/default/jobs/prefect-job-93e105ba
  uid: f0246452-fcb8-41e5-b9a8-b816a5ec9a96
spec:
  backoffLimit: 6
  completions: 1
  parallelism: 1
  selector:
    matchLabels:
      controller-uid: f0246452-fcb8-41e5-b9a8-b816a5ec9a96
  template:
    metadata:
      creationTimestamp: null
      labels:
        controller-uid: f0246452-fcb8-41e5-b9a8-b816a5ec9a96
        job-name: prefect-job-93e105ba
        prefect.io/flow_id: 1d0ff4aa-da07-4309-82c5-d96f05502a03
        prefect.io/flow_run_id: 2529c19e-0e6c-428f-b777-54b04d19fb9f
        prefect.io/identifier: 93e105ba
    spec:
      containers:
      - args:
        - prefect
        - execute
        - flow-run
        env:
        - name: PREFECT__CLOUD__API
          value: http://prefect-server-apollo.default.svc.cluster.local:4200
        - name: PREFECT__CLOUD__AUTH_TOKEN
        - name: PREFECT__CLOUD__USE_LOCAL_SECRETS
          value: "false"
        - name: PREFECT__CONTEXT__FLOW_RUN_ID
          value: 2529c19e-0e6c-428f-b777-54b04d19fb9f
        - name: PREFECT__CONTEXT__FLOW_ID
          value: 1d0ff4aa-da07-4309-82c5-d96f05502a03
        - name: PREFECT__CONTEXT__IMAGE
          value: drtools/prefect:aircraft-etl
        - name: PREFECT__LOGGING__LOG_TO_CLOUD
          value: "true"
        - name: PREFECT__ENGINE__FLOW_RUNNER__DEFAULT_CLASS
          value: prefect.engine.cloud.CloudFlowRunner
        - name: PREFECT__ENGINE__TASK_RUNNER__DEFAULT_CLASS
          value: prefect.engine.cloud.CloudTaskRunner
        image: drtools/prefect:aircraft-etl
        imagePullPolicy: IfNotPresent
        name: flow
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Never
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  active: 1
  startTime: "2020-12-17T18:25:53Z"
I'm running my own server, deployed on k8s. Would that change anything?
Jim Crist-Harif
It shouldn't.
I know you said you're running the k8s agent on 0.14.0, but can you triple check that? If it's deployed as a pod, can you verify the image is using 0.14.0-python3.6? And can you check that you don't have an older agent running somewhere else that might have submitted that job instead?
Pedro Martins
Yes! I'm cautiously checking all versions...
Let you know in a moment
@Jim Crist-Harif I took some time to look deeper into this, but I'm out of ideas. Both client and agent are running the latest version of Prefect. I set the
IMAGE_PULL_SECRETS
variable on the agent and it doesn't pass to the pods.
This is the description of the agent
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/psp: eks.privileged
  creationTimestamp: "2020-12-17T22:17:46Z"
  generateName: prefect-agent-545bccd6c8-
  labels:
    app: prefect-agent
    pod-template-hash: 545bccd6c8
  name: prefect-agent-545bccd6c8-rqmg8
  namespace: default
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: prefect-agent-545bccd6c8
    uid: 870f927b-75d5-4da1-95aa-963b936ff204
  resourceVersion: "2377225"
  selfLink: /api/v1/namespaces/default/pods/prefect-agent-545bccd6c8-rqmg8
  uid: 69e10ea0-9ff4-434c-bc71-9ec6b085e3fa
spec:
  containers:
  - args:
    - prefect agent kubernetes start
    command:
    - /bin/bash
    - -c
    env:
    - name: PREFECT__CLOUD__AGENT__AUTH_TOKEN
    - name: PREFECT__CLOUD__API
      value: http://prefect-server-apollo.default.svc.cluster.local:4200
    - name: NAMESPACE
      value: default
    - name: IMAGE_PULL_SECRETS
      value: regcred
    - name: PREFECT__CLOUD__AGENT__LABELS
      value: '[]'
    - name: JOB_MEM_REQUEST
    - name: JOB_MEM_LIMIT
    - name: JOB_CPU_REQUEST
    - name: JOB_CPU_LIMIT
    - name: IMAGE_PULL_POLICY
    - name: SERVICE_ACCOUNT_NAME
    - name: PREFECT__BACKEND
      value: server
    - name: PREFECT__CLOUD__AGENT__AGENT_ADDRESS
      value: http://:8080
    - name: PREFECT__CLOUD__AGENT__LEVEL
      value: DEBUG
    image: prefecthq/prefect:0.14.0-python3.6
    imagePullPolicy: Always
    livenessProbe:
      failureThreshold: 2
      httpGet:
        path: /api/health
        port: 8080
        scheme: HTTP
      initialDelaySeconds: 40
      periodSeconds: 40
      successThreshold: 1
      timeoutSeconds: 1
    name: agent
    resources:
      limits:
        cpu: 100m
        memory: 128Mi
      requests:
        cpu: 100m
        memory: 128Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-n28d2
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: ip-10-0-1-20.eu-west-1.compute.internal
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: default-token-n28d2
    secret:
      defaultMode: 420
      secretName: default-token-n28d2
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-12-17T22:17:46Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2020-12-17T22:17:49Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2020-12-17T22:17:49Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2020-12-17T22:17:46Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://cf748aa6b79bcc1d1aaa0b39eda0a0c07342a6d1a39e51637d11c6f89fbdb6b2
    image: prefecthq/prefect:0.14.0-python3.6
    imageID: docker-pullable://prefecthq/prefect@sha256:3ebe46f840d46044b9521c9380aa13bd0755670d06e2fdfe0e23c69de5a78fc0
    lastState: {}
    name: agent
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2020-12-17T22:17:48Z"
  hostIP: 10.0.1.20
  phase: Running
  podIP: 10.0.1.54
  podIPs:
  - ip: 10.0.1.54
  qosClass: Guaranteed
  startTime: "2020-12-17T22:17:46Z"
And this is from the job spawned by the agent. Unlike yours, it does not include the imagePullSecrets section:
apiVersion: batch/v1
kind: Job
metadata:
  creationTimestamp: "2020-12-17T22:19:52Z"
  labels:
    prefect.io/flow_id: 0320c90d-56c0-40b1-a259-75ef587d24e3
    prefect.io/flow_run_id: 34028fb2-2cd2-4cf4-88e1-82d983c650b2
    prefect.io/identifier: ba7cd008
  name: prefect-job-ba7cd008
  namespace: default
  resourceVersion: "2377667"
  selfLink: /apis/batch/v1/namespaces/default/jobs/prefect-job-ba7cd008
  uid: 9aa7782e-3bfa-4282-a021-25732e1a862a
spec:
  backoffLimit: 6
  completions: 1
  parallelism: 1
  selector:
    matchLabels:
      controller-uid: 9aa7782e-3bfa-4282-a021-25732e1a862a
  template:
    metadata:
      creationTimestamp: null
      labels:
        controller-uid: 9aa7782e-3bfa-4282-a021-25732e1a862a
        job-name: prefect-job-ba7cd008
        prefect.io/flow_id: 0320c90d-56c0-40b1-a259-75ef587d24e3
        prefect.io/flow_run_id: 34028fb2-2cd2-4cf4-88e1-82d983c650b2
        prefect.io/identifier: ba7cd008
    spec:
      containers:
      - args:
        - prefect
        - execute
        - flow-run
        env:
        - name: PREFECT__CLOUD__API
          value: http://prefect-server-apollo.default.svc.cluster.local:4200
        - name: PREFECT__CLOUD__AUTH_TOKEN
        - name: PREFECT__CLOUD__USE_LOCAL_SECRETS
          value: "false"
        - name: PREFECT__CONTEXT__FLOW_RUN_ID
          value: 34028fb2-2cd2-4cf4-88e1-82d983c650b2
        - name: PREFECT__CONTEXT__FLOW_ID
          value: 0320c90d-56c0-40b1-a259-75ef587d24e3
        - name: PREFECT__CONTEXT__IMAGE
          value: drtools/prefect:aircraft-etl
        - name: PREFECT__LOGGING__LOG_TO_CLOUD
          value: "true"
        - name: PREFECT__ENGINE__FLOW_RUNNER__DEFAULT_CLASS
          value: prefect.engine.cloud.CloudFlowRunner
        - name: PREFECT__ENGINE__TASK_RUNNER__DEFAULT_CLASS
          value: prefect.engine.cloud.CloudTaskRunner
        image: drtools/prefect:aircraft-etl
        imagePullPolicy: IfNotPresent
        name: flow
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Never
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  active: 1
  startTime: "2020-12-17T22:19:52Z"
And this is the agent log:
[2020-12-17 22:19:52,076] INFO - agent | Found 1 flow run(s) to submit for execution.
[2020-12-17 22:19:52,079] DEBUG - agent | Updating states for flow run 34028fb2-2cd2-4cf4-88e1-82d983c650b2
[2020-12-17 22:19:52,096] DEBUG - agent | Flow run 34028fb2-2cd2-4cf4-88e1-82d983c650b2 is in a Scheduled state, updating to Submitted
[2020-12-17 22:19:52,110] DEBUG - agent | Next query for flow runs in 0.25 seconds
[2020-12-17 22:19:52,236] INFO - agent | Deploying flow run 34028fb2-2cd2-4cf4-88e1-82d983c650b2
[2020-12-17 22:19:52,238] DEBUG - agent | Loading job template from '/usr/local/lib/python3.6/site-packages/prefect/agent/kubernetes/job_template.yaml'
[2020-12-17 22:19:52,298] DEBUG - agent | Creating namespaced job prefect-job-ba7cd008
[2020-12-17 22:19:52,317] DEBUG - agent | Job prefect-job-ba7cd008 created
[2020-12-17 22:19:52,360] DEBUG - agent | Querying for flow runs
[2020-12-17 22:19:52,476] DEBUG - agent | Completed flow run submission (id: 34028fb2-2cd2-4cf4-88e1-82d983c650b2)
[2020-12-17 22:19:52,508] DEBUG - agent | No flow runs found
[2020-12-17 22:19:52,510] DEBUG - agent | Next query for flow runs in 0.5 seconds
[2020-12-17 22:19:53,010] DEBUG - agent | Querying for flow runs
[2020-12-17 22:19:53,067] DEBUG - agent | No flow runs found
[2020-12-17 22:19:53,072] DEBUG - agent | Next query for flow runs in 1.0 seconds
[2020-12-17 22:19:54,072] DEBUG - agent | Querying for flow runs
[2020-12-17 22:19:54,105] DEBUG - agent | No flow runs found
[2020-12-17 22:19:54,106] DEBUG - agent | Next query for flow runs in 2.0 seconds
[2020-12-17 22:19:56,106] DEBUG - agent | Querying for flow runs
[2020-12-17 22:19:56,148] DEBUG - agent | No flow runs found
[2020-12-17 22:19:56,148] DEBUG - agent | Next query for flow runs in 4.0 seconds
[2020-12-17 22:19:59,582] DEBUG - agent | Running agent heartbeat...
[2020-12-17 22:19:59,582] DEBUG - agent | Retrieving information of jobs that are currently in the cluster...
[2020-12-17 22:19:59,590] DEBUG - agent | Deleting job prefect-job-37fb2fd1
[2020-12-17 22:19:59,616] DEBUG - agent | Failing flow run 34028fb2-2cd2-4cf4-88e1-82d983c650b2 due to pod ErrImagePull
[2020-12-17 22:19:59,675] ERROR - agent | Error while managing existing k8s jobs
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/prefect/agent/kubernetes/agent.py", line 357, in heartbeat
    self.manage_jobs()
  File "/usr/local/lib/python3.6/site-packages/prefect/agent/kubernetes/agent.py", line 215, in manage_jobs
    pod_events.items, key=lambda x: x.last_timestamp
TypeError: '<' not supported between instances of 'datetime.datetime' and 'NoneType'
[2020-12-17 22:19:59,714] DEBUG - agent | Sleeping heartbeat for 60.0 seconds
[2020-12-17 22:20:00,149] DEBUG - agent | Querying for flow runs
[2020-12-17 22:20:00,197] DEBUG - agent | No flow runs found
[2020-12-17 22:20:00,198] DEBUG - agent | Next query for flow runs in 8.0 seconds
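(The TypeError in the traceback above comes from sorting pod events whose last_timestamp can be None. A minimal, self-contained Python sketch of the failure and one possible workaround; the event dicts below are hypothetical stand-ins, not Prefect's actual code:)

```python
from datetime import datetime, timezone

# Hypothetical stand-ins for k8s pod events: last_timestamp can be None
# for freshly-created events, which is what trips up the sort.
events = [
    {"reason": "Scheduled",
     "last_timestamp": datetime(2020, 12, 17, 22, 19, 52, tzinfo=timezone.utc)},
    {"reason": "Pulling", "last_timestamp": None},
]

# This mirrors the failing call: comparing a datetime with None raises TypeError.
try:
    sorted(events, key=lambda e: e["last_timestamp"])
    raise AssertionError("expected a TypeError")
except TypeError:
    pass

# One workaround: substitute a sentinel for missing timestamps so keys are
# always comparable; events lacking a timestamp then sort first.
sentinel = datetime.min.replace(tzinfo=timezone.utc)
ordered = sorted(events, key=lambda e: e["last_timestamp"] or sentinel)
print([e["reason"] for e in ordered])
```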
Jim Crist-Harif
Hmmm, ok. Thanks for all the info! This is helpful (looks like you've also found another unrelated bug with that error log :))
One last question - can you post
flow.diagnostics()
for your flow?
(diagnostics is a method on the flow)
Pedro Martins
(looks like you've also found another unrelated bug with that error log :))
yeah! that error might be because it cannot connect to the pod it tried to create 🤷‍♂️
Ok! One moment
{
  "config_overrides": {},
  "env_vars": [],
  "flow_information": {
    "environment": false,
    "result": false,
    "run_config": {
      "cpu_limit": false,
      "cpu_request": false,
      "env": false,
      "image": true,
      "image_pull_secrets": true,
      "job_template": false,
      "job_template_path": false,
      "labels": false,
      "memory_limit": false,
      "memory_request": false,
      "service_account_name": false,
      "type": "KubernetesRun"
    },
    "schedule": false,
    "storage": {
      "_flows": false,
      "_labels": false,
      "add_default_labels": true,
      "bucket": true,
      "client_options": false,
      "flows": false,
      "key": false,
      "local_script_path": false,
      "result": true,
      "secrets": false,
      "stored_as_script": false,
      "type": "S3"
    },
    "task_count": 7
  },
  "system_information": {
    "platform": "Linux-4.14.203-156.332.amzn2.x86_64-x86_64-with-glibc2.10",
    "prefect_backend": "server",
    "prefect_version": "0.14.0",
    "python_version": "3.8.6"
  }
}
Jim Crist-Harif
cool, thanks! I'll try to take a look at this tomorrow - this looks like a bug. Thanks for working through this with me.
n
@Jim Crist-Harif and @Pedro Martins I have exactly the same two bugs described here: the
TypeError
and no imagePullSecrets. I'm reinstalling prefect using conda as well, trying to see if there's any dependency not automatically updated when installing prefect that is causing this discrepancy. Thanks for all the debugging.
Reinstalled and triple-checked prefect and k8s agent versions, all on 0.14.0; it doesn't create imagePullSecrets in the job spec and I still get the image pull error. Curious what you find. Thanks.
Pedro Martins
@Jim Crist-Harif @Dylan I dug into the Prefect code to understand what is going on with the
imagePullSecrets
tag. The Kubernetes Agent
deploy_flow
actually creates the job specification with the secret:
{'apiVersion': 'batch/v1',
 'kind': 'Job',
 'spec': {'template': {'spec': {'containers': [{'name': 'flow',
      'image': 'drtools/prefect:aircraft-etl',
      'args': ['prefect', 'execute', 'cloud-flow'],
      'env': [{'name': 'PREFECT__CLOUD__API',
        'value': 'http://****:4200'},
       {'name': 'PREFECT__CLOUD__AUTH_TOKEN', 'value': ''},
       {'name': 'PREFECT__CLOUD__USE_LOCAL_SECRETS', 'value': 'false'},
       {'name': 'PREFECT__CONTEXT__FLOW_RUN_ID',
        'value': 'a7b69781-cee0-4b40-811c-1aeb47a8cf60'},
       {'name': 'PREFECT__CONTEXT__FLOW_ID',
        'value': 'a7b69781-cee0-4b40-811c-1aeb47a8cf60'},
       {'name': 'PREFECT__CONTEXT__IMAGE',
        'value': 'drtools/prefect:aircraft-etl'},
       {'name': 'PREFECT__LOGGING__LOG_TO_CLOUD', 'value': 'true'},
       {'name': 'PREFECT__ENGINE__FLOW_RUNNER__DEFAULT_CLASS',
        'value': 'prefect.engine.cloud.CloudFlowRunner'},
       {'name': 'PREFECT__ENGINE__TASK_RUNNER__DEFAULT_CLASS',
        'value': 'prefect.engine.cloud.CloudTaskRunner'}],
      'resources': {'requests': {}, 'limits': {}}}],
    'restartPolicy': 'Never'},
   'metadata': {'labels': {'prefect.io/identifier': 'fb944cb5',
     'prefect.io/flow_run_id': 'a7b69781-cee0-4b40-811c-1aeb47a8cf60',
     'prefect.io/flow_id': 'a7b69781-cee0-4b40-811c-1aeb47a8cf60'}},
   'imagePullSecrets': [{'name': 'regcred'}]}},
 'metadata': {'labels': {'prefect.io/identifier': 'fb944cb5',
   'prefect.io/flow_run_id': 'a7b69781-cee0-4b40-811c-1aeb47a8cf60',
   'prefect.io/flow_id': 'a7b69781-cee0-4b40-811c-1aeb47a8cf60'},
  'name': 'prefect-job-fb944cb5'}}
Then it calls
self.batch_client.create_namespaced_job
. There is some sanitization of the payload, but it doesn't remove the pull secret from the body. When it calls the kubernetes API
self.api_client.call_api
the body is complete! However, the job specification that reaches the cluster doesn't contain the secret. It gets lost along the way, or it is removed by the cluster API server. Are you aware of any API incompatibility here?
@Jim Crist-Harif @Dylan I actually found the problem, guys! The Kubernetes agent's
generate_job_spec_from_run_config
is adding the secret at the wrong level. The secret should be added at the same level as the container specification. The fix should be:
pod_template["spec"]["imagePullSecrets"] = [{"name": s} for s in image_pull_secrets]
https://github.com/PrefectHQ/prefect/blob/master/src/prefect/agent/kubernetes/agent.py#L623
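(A stripped-down sketch of that nesting difference; the dicts below are hypothetical minimal versions of the job template, which in reality carries many more fields:)

```python
# Minimal job dict illustrating where imagePullSecrets must live.
image_pull_secrets = ["regcred"]

job = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "spec": {"template": {"spec": {"containers": [{"name": "flow"}]}}},
}
pod_template = job["spec"]["template"]

# Buggy placement: a sibling of the pod spec ("spec.template.imagePullSecrets"),
# which is not a valid field there, so the API server silently drops it:
#   pod_template["imagePullSecrets"] = [{"name": s} for s in image_pull_secrets]

# Correct placement: inside the pod spec, alongside "containers":
pod_template["spec"]["imagePullSecrets"] = [{"name": s} for s in image_pull_secrets]

assert job["spec"]["template"]["spec"]["imagePullSecrets"] == [{"name": "regcred"}]
```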
Jim Crist-Harif
Ah, nice catch. I'll push a fix up today with this change. Thanks!
Done! Thanks for finding the issue; this will be out in the next release (either tomorrow or Wednesday): https://github.com/PrefectHQ/prefect/pull/3884
a
Hey! I think I'm facing the same issue(?). I'm getting the following error:
Event: 'Failed' on pod 'prefect-job-42420d16-bn54h'
	Message: Error: ErrImagePull
I have a Prefect server running on Kubernetes, which I installed using the Helm chart available here: https://github.com/PrefectHQ/server/tree/master/helm/prefect-server. I tried two things: passing the arg
image_pull_secrets
to
KubernetesRun()
and editing the k8s deployment of the agent to have the correct secret:
IMAGE_PULL_SECRETS: [vi-dockerhub-key]
Neither worked for me, and I can see that the pod does not have the pull secrets in its description. Also, the above secret is in the
default
namespace. Since the above issue seems to be fixed, am I missing something trivial/obvious?