Thread
#prefect-community
    Pedro Martins
    1 year ago
    Hey! Following the above thread on secrets in the k8s agent: how can I ensure that the prefect jobs spawned by the agent contain the secret I specified? I'm running the Aircraft example from a notebook and connecting it to my server in the cluster. Simply passing image_pull_secrets to KubernetesRun does not work: I keep getting
    Error: ErrImagePull
    # Core imports, assuming prefect 0.14; the extract/transform/load task
    # functions come from the Aircraft ETL example and are omitted here.
    from prefect import Flow, Parameter
    from prefect.run_configs import KubernetesRun
    from prefect.storage import S3

    custom_confs = {
        "run_config": KubernetesRun(
            image="drtools/prefect:aircraft-etl", 
            image_pull_secrets=["regcred"], 
        ),   
        "storage": S3(bucket="dr-prefect"),
    } 
    
    with Flow("Aircraft-ETL", **custom_confs) as flow:
        airport = Parameter("airport", default = "IAD")
        radius = Parameter("radius", default = 200)
        
        reference_data = extract_reference_data()
        live_data = extract_live_data(airport, radius, reference_data)
    
        transformed_live_data = transform(live_data, reference_data)
    
        load_reference_data(reference_data)
        load_live_data(transformed_live_data)
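    (For context, the flow is registered from the notebook roughly like the sketch below; the project name "aircraft" is only an assumption here, not taken from the thread.)
    # Sketch: register the flow with the server backend so the k8s agent can
    # pick up scheduled flow runs ("aircraft" is an assumed project name).
    flow.register(project_name="aircraft")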
    Prefect job description:
    Name:         prefect-job-ded2fd39-k6kpp
    Namespace:    default
    Priority:     0
    Node:         ****
    Start Time:   Thu, 17 Dec 2020 15:20:15 -0300
    Labels:       controller-uid=386ac185-8bba-47b4-85b0-358c3601179c
                  job-name=prefect-job-ded2fd39
                  prefect.io/flow_id=3228aac5-a762-40db-9858-63c536ce5b8f
                  prefect.io/flow_run_id=93c58ae5-1bc4-4a3c-bb70-7bb6a50ff10e
                  prefect.io/identifier=ded2fd39
    Annotations:  kubernetes.io/psp: eks.privileged
    Status:       Pending
    IP:           10.0.1.16
    IPs:
      IP:           10.0.1.16
    Controlled By:  Job/prefect-job-ded2fd39
    Containers:
      flow:
        Container ID:
        Image:         drtools/prefect:aircraft-etl
        Image ID:
        Port:          <none>
        Host Port:     <none>
        Args:
          prefect
          execute
          flow-run
        State:          Waiting
          Reason:       ImagePullBackOff
        Ready:          False
        Restart Count:  0
        Environment:
      PREFECT__CLOUD__API:                          http://prefect-server-apollo.default.svc.cluster.local:4200
          PREFECT__CLOUD__AUTH_TOKEN:
          PREFECT__CLOUD__USE_LOCAL_SECRETS:            false
          PREFECT__CONTEXT__FLOW_RUN_ID:                93c58ae5-1bc4-4a3c-bb70-7bb6a50ff10e
          PREFECT__CONTEXT__FLOW_ID:                    3228aac5-a762-40db-9858-63c536ce5b8f
          PREFECT__CONTEXT__IMAGE:                      drtools/prefect:aircraft-etl
          PREFECT__LOGGING__LOG_TO_CLOUD:               true
          PREFECT__ENGINE__FLOW_RUNNER__DEFAULT_CLASS:  prefect.engine.cloud.CloudFlowRunner
          PREFECT__ENGINE__TASK_RUNNER__DEFAULT_CLASS:  prefect.engine.cloud.CloudTaskRunner
        Mounts:
          /var/run/secrets/kubernetes.io/serviceaccount from default-token-n28d2 (ro)
    Conditions:
      Type              Status
      Initialized       True
      Ready             False
      ContainersReady   False
      PodScheduled      True
    Volumes:
      default-token-n28d2:
        Type:        Secret (a volume populated by a Secret)
        SecretName:  default-token-n28d2
        Optional:    false
    QoS Class:       BestEffort
    Node-Selectors:  <none>
    Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                     node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
    Events:
      Type     Reason     Age                     From               Message
      ----     ------     ----                    ----               -------
      Normal   Scheduled  9m29s                   default-scheduler  Successfully assigned default/prefect-job-ded2fd39-k6kpp to ip-10-0-1-20.eu-west-1.compute.internal
      Normal   Pulling    7m58s (x4 over 9m28s)   kubelet            Pulling image "drtools/prefect:aircraft-etl"
      Warning  Failed     7m57s (x4 over 9m28s)   kubelet            Failed to pull image "drtools/prefect:aircraft-etl": rpc error: code = Unknown desc = Error response from daemon: pull access denied for drtools/prefect, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
      Warning  Failed     7m57s (x4 over 9m28s)   kubelet            Error: ErrImagePull
      Normal   BackOff    7m44s (x6 over 9m27s)   kubelet            Back-off pulling image "drtools/prefect:aircraft-etl"
      Warning  Failed     4m23s (x20 over 9m27s)  kubelet            Error: ImagePullBackOff
    Dylan
    1 year ago
    Hi @Pedro Martins, have you set the secret in your k8s namespace? See https://docs.prefect.io/api/latest/run_configs.html#kubernetesrun for more details.
    Pedro Martins
    1 year ago
    Hey @Dylan! Yes, I had:
    $ kubectl get secrets -n default
    NAME                                        TYPE                                  DATA   AGE
    aws-secret                                  Opaque                                2      3d1h
    default-token-n28d2                         kubernetes.io/service-account-token   3      8d
    prefect-server-postgresql                   Opaque                                1      6d21h
    prefect-server-serviceaccount-token-lc6n2   kubernetes.io/service-account-token   3      6d21h
    regcred                                     kubernetes.io/dockerconfigjson        1      2d20h
    sh.helm.release.v1.prefect-server.v1        helm.sh/release.v1                    1      6d21h
    Should this environment variable
    PREFECT__CLOUD__USE_LOCAL_SECRETS: false
    be set to true?
    Dylan
    1 year ago
    No, that's for Prefect secrets
    Hmmm
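    (For contrast, a minimal sketch of what that setting governs: with use_local_secrets enabled, a Prefect Secret resolves from the local context rather than the backend. The secret name "MY_TOKEN" is just an example and unrelated to the registry credential discussed here.)
    import prefect
    from prefect.client import Secret
    from prefect.utilities.configuration import set_temporary_config

    # Sketch: PREFECT__CLOUD__USE_LOCAL_SECRETS controls Prefect Secrets, not
    # Kubernetes imagePullSecrets. With it set to true, Secret("MY_TOKEN") is
    # read from the local context instead of the Prefect backend.
    with set_temporary_config({"cloud.use_local_secrets": True}), prefect.context(
        secrets={"MY_TOKEN": "example-value"}
    ):
        print(Secret("MY_TOKEN").get())  # -> example-value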
    Jim Crist-Harif
    1 year ago
    Do you have the latest version of prefect running on your agent? Older versions of the agent won't forward the image_pull_secrets field.
    Pedro Martins
    1 year ago
    Yes @Jim Crist-Harif! I'm running on the brand new 'prefecthq/prefect:0.14.0-python3.6'.
    Jim Crist-Harif
    1 year ago
    Hmmm, I'm unable to reproduce. The job spec generated for me using your provided run-config is:
    apiVersion: batch/v1
    kind: Job
    metadata:
      labels:
        prefect.io/flow_id: new_id
        prefect.io/flow_run_id: id
        prefect.io/identifier: 453321ca
      name: prefect-job-453321ca
    spec:
      template:
        imagePullSecrets:
        - name: regcred
        metadata:
          labels:
            prefect.io/flow_id: new_id
            prefect.io/flow_run_id: id
            prefect.io/identifier: 453321ca
        spec:
          containers:
          - args:
            - prefect
            - execute
            - flow-run
            env:
            - name: PREFECT__CLOUD__API
              value: https://api.prefect.io
            - name: PREFECT__CLOUD__AUTH_TOKEN
              value: <redacted>
            - name: PREFECT__CLOUD__USE_LOCAL_SECRETS
              value: 'false'
            - name: PREFECT__CONTEXT__FLOW_RUN_ID
              value: id
            - name: PREFECT__CONTEXT__FLOW_ID
              value: new_id
            - name: PREFECT__CONTEXT__IMAGE
              value: drtools/prefect:aircraft-etl
            - name: PREFECT__LOGGING__LOG_TO_CLOUD
              value: 'true'
            - name: PREFECT__ENGINE__FLOW_RUNNER__DEFAULT_CLASS
              value: prefect.engine.cloud.CloudFlowRunner
            - name: PREFECT__ENGINE__TASK_RUNNER__DEFAULT_CLASS
              value: prefect.engine.cloud.CloudTaskRunner
            image: drtools/prefect:aircraft-etl
            name: flow
            resources:
              limits: {}
              requests: {}
          restartPolicy: Never
    What's the output of kubectl get job <your-job-id> -o yaml?
    Pedro Martins
    1 year ago
    apiVersion: batch/v1
    kind: Job
    metadata:
      creationTimestamp: "2020-12-17T18:25:53Z"
      labels:
        prefect.io/flow_id: 1d0ff4aa-da07-4309-82c5-d96f05502a03
        prefect.io/flow_run_id: 2529c19e-0e6c-428f-b777-54b04d19fb9f
        prefect.io/identifier: 93e105ba
      name: prefect-job-93e105ba
      namespace: default
      resourceVersion: "2330069"
      selfLink: /apis/batch/v1/namespaces/default/jobs/prefect-job-93e105ba
      uid: f0246452-fcb8-41e5-b9a8-b816a5ec9a96
    spec:
      backoffLimit: 6
      completions: 1
      parallelism: 1
      selector:
        matchLabels:
          controller-uid: f0246452-fcb8-41e5-b9a8-b816a5ec9a96
      template:
        metadata:
          creationTimestamp: null
          labels:
            controller-uid: f0246452-fcb8-41e5-b9a8-b816a5ec9a96
            job-name: prefect-job-93e105ba
            prefect.io/flow_id: 1d0ff4aa-da07-4309-82c5-d96f05502a03
            prefect.io/flow_run_id: 2529c19e-0e6c-428f-b777-54b04d19fb9f
            prefect.io/identifier: 93e105ba
        spec:
          containers:
          - args:
            - prefect
            - execute
            - flow-run
            env:
            - name: PREFECT__CLOUD__API
          value: http://prefect-server-apollo.default.svc.cluster.local:4200
            - name: PREFECT__CLOUD__AUTH_TOKEN
            - name: PREFECT__CLOUD__USE_LOCAL_SECRETS
              value: "false"
            - name: PREFECT__CONTEXT__FLOW_RUN_ID
              value: 2529c19e-0e6c-428f-b777-54b04d19fb9f
            - name: PREFECT__CONTEXT__FLOW_ID
              value: 1d0ff4aa-da07-4309-82c5-d96f05502a03
            - name: PREFECT__CONTEXT__IMAGE
              value: drtools/prefect:aircraft-etl
            - name: PREFECT__LOGGING__LOG_TO_CLOUD
              value: "true"
            - name: PREFECT__ENGINE__FLOW_RUNNER__DEFAULT_CLASS
              value: prefect.engine.cloud.CloudFlowRunner
            - name: PREFECT__ENGINE__TASK_RUNNER__DEFAULT_CLASS
              value: prefect.engine.cloud.CloudTaskRunner
            image: drtools/prefect:aircraft-etl
            imagePullPolicy: IfNotPresent
            name: flow
            resources: {}
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
          dnsPolicy: ClusterFirst
          restartPolicy: Never
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
    status:
      active: 1
      startTime: "2020-12-17T18:25:53Z"
    I'm running my own server, deployed on k8s. Would that change anything?
    Jim Crist-Harif
    1 year ago
    It shouldn't.
    I know you said you're running the k8s agent on 0.14.0, but can you triple check that? If it's deployed as a pod, can you verify the image is using 0.14.0-python3.6? And can you check that you don't have an older agent running somewhere else that might have submitted that job instead?
    Pedro Martins
    1 year ago
    Yes! I'm carefully checking all versions...
    I'll let you know in a moment.
    @Jim Crist-Harif I took some time to look deeply into this, but I'm out of ideas already. Both the client and the agent are running the latest version of prefect. I set the IMAGE_PULL_SECRETS variable on the agent and it doesn't get passed to the pods.
    This is the description of the agent:
    apiVersion: v1
    kind: Pod
    metadata:
      annotations:
        kubernetes.io/psp: eks.privileged
      creationTimestamp: "2020-12-17T22:17:46Z"
      generateName: prefect-agent-545bccd6c8-
      labels:
        app: prefect-agent
        pod-template-hash: 545bccd6c8
      name: prefect-agent-545bccd6c8-rqmg8
      namespace: default
      ownerReferences:
      - apiVersion: apps/v1
        blockOwnerDeletion: true
        controller: true
        kind: ReplicaSet
        name: prefect-agent-545bccd6c8
        uid: 870f927b-75d5-4da1-95aa-963b936ff204
      resourceVersion: "2377225"
      selfLink: /api/v1/namespaces/default/pods/prefect-agent-545bccd6c8-rqmg8
      uid: 69e10ea0-9ff4-434c-bc71-9ec6b085e3fa
    spec:
      containers:
      - args:
        - prefect agent kubernetes start
        command:
        - /bin/bash
        - -c
        env:
        - name: PREFECT__CLOUD__AGENT__AUTH_TOKEN
        - name: PREFECT__CLOUD__API
          value: http://prefect-server-apollo.default.svc.cluster.local:4200
        - name: NAMESPACE
          value: default
        - name: IMAGE_PULL_SECRETS
          value: regcred
        - name: PREFECT__CLOUD__AGENT__LABELS
          value: '[]'
        - name: JOB_MEM_REQUEST
        - name: JOB_MEM_LIMIT
        - name: JOB_CPU_REQUEST
        - name: JOB_CPU_LIMIT
        - name: IMAGE_PULL_POLICY
        - name: SERVICE_ACCOUNT_NAME
        - name: PREFECT__BACKEND
          value: server
        - name: PREFECT__CLOUD__AGENT__AGENT_ADDRESS
          value: http://:8080
        - name: PREFECT__CLOUD__AGENT__LEVEL
          value: DEBUG
        image: prefecthq/prefect:0.14.0-python3.6
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 2
          httpGet:
            path: /api/health
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 40
          periodSeconds: 40
          successThreshold: 1
          timeoutSeconds: 1
        name: agent
        resources:
          limits:
            cpu: 100m
            memory: 128Mi
          requests:
            cpu: 100m
            memory: 128Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          name: default-token-n28d2
          readOnly: true
      dnsPolicy: ClusterFirst
      enableServiceLinks: true
      nodeName: ip-10-0-1-20.eu-west-1.compute.internal
      priority: 0
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: default
      serviceAccountName: default
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoExecute
        key: node.kubernetes.io/not-ready
        operator: Exists
        tolerationSeconds: 300
      - effect: NoExecute
        key: node.kubernetes.io/unreachable
        operator: Exists
        tolerationSeconds: 300
      volumes:
      - name: default-token-n28d2
        secret:
          defaultMode: 420
          secretName: default-token-n28d2
    status:
      conditions:
      - lastProbeTime: null
        lastTransitionTime: "2020-12-17T22:17:46Z"
        status: "True"
        type: Initialized
      - lastProbeTime: null
        lastTransitionTime: "2020-12-17T22:17:49Z"
        status: "True"
        type: Ready
      - lastProbeTime: null
        lastTransitionTime: "2020-12-17T22:17:49Z"
        status: "True"
        type: ContainersReady
      - lastProbeTime: null
        lastTransitionTime: "2020-12-17T22:17:46Z"
        status: "True"
        type: PodScheduled
      containerStatuses:
      - containerID: docker://cf748aa6b79bcc1d1aaa0b39eda0a0c07342a6d1a39e51637d11c6f89fbdb6b2
        image: prefecthq/prefect:0.14.0-python3.6
        imageID: docker-pullable://prefecthq/prefect@sha256:3ebe46f840d46044b9521c9380aa13bd0755670d06e2fdfe0e23c69de5a78fc0
        lastState: {}
        name: agent
        ready: true
        restartCount: 0
        started: true
        state:
          running:
            startedAt: "2020-12-17T22:17:48Z"
      hostIP: 10.0.1.20
      phase: Running
      podIP: 10.0.1.54
      podIPs:
      - ip: 10.0.1.54
      qosClass: Guaranteed
      startTime: "2020-12-17T22:17:46Z"
    And this is from the job spawned by the agent. Unlike yours, it is not adding the imagePullSecrets section:
    apiVersion: batch/v1
    kind: Job
    metadata:
      creationTimestamp: "2020-12-17T22:19:52Z"
      labels:
        prefect.io/flow_id: 0320c90d-56c0-40b1-a259-75ef587d24e3
        prefect.io/flow_run_id: 34028fb2-2cd2-4cf4-88e1-82d983c650b2
        prefect.io/identifier: ba7cd008
      name: prefect-job-ba7cd008
      namespace: default
      resourceVersion: "2377667"
      selfLink: /apis/batch/v1/namespaces/default/jobs/prefect-job-ba7cd008
      uid: 9aa7782e-3bfa-4282-a021-25732e1a862a
    spec:
      backoffLimit: 6
      completions: 1
      parallelism: 1
      selector:
        matchLabels:
          controller-uid: 9aa7782e-3bfa-4282-a021-25732e1a862a
      template:
        metadata:
          creationTimestamp: null
          labels:
            controller-uid: 9aa7782e-3bfa-4282-a021-25732e1a862a
            job-name: prefect-job-ba7cd008
            prefect.io/flow_id: 0320c90d-56c0-40b1-a259-75ef587d24e3
            prefect.io/flow_run_id: 34028fb2-2cd2-4cf4-88e1-82d983c650b2
            prefect.io/identifier: ba7cd008
        spec:
          containers:
          - args:
            - prefect
            - execute
            - flow-run
            env:
            - name: PREFECT__CLOUD__API
          value: http://prefect-server-apollo.default.svc.cluster.local:4200
            - name: PREFECT__CLOUD__AUTH_TOKEN
            - name: PREFECT__CLOUD__USE_LOCAL_SECRETS
              value: "false"
            - name: PREFECT__CONTEXT__FLOW_RUN_ID
              value: 34028fb2-2cd2-4cf4-88e1-82d983c650b2
            - name: PREFECT__CONTEXT__FLOW_ID
              value: 0320c90d-56c0-40b1-a259-75ef587d24e3
            - name: PREFECT__CONTEXT__IMAGE
              value: drtools/prefect:aircraft-etl
            - name: PREFECT__LOGGING__LOG_TO_CLOUD
              value: "true"
            - name: PREFECT__ENGINE__FLOW_RUNNER__DEFAULT_CLASS
              value: prefect.engine.cloud.CloudFlowRunner
            - name: PREFECT__ENGINE__TASK_RUNNER__DEFAULT_CLASS
              value: prefect.engine.cloud.CloudTaskRunner
            image: drtools/prefect:aircraft-etl
            imagePullPolicy: IfNotPresent
            name: flow
            resources: {}
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
          dnsPolicy: ClusterFirst
          restartPolicy: Never
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
    status:
      active: 1
      startTime: "2020-12-17T22:19:52Z"
    And this is the log of the agent:
    [2020-12-17 22:19:52,076] INFO - agent | Found 1 flow run(s) to submit for execution.
    [2020-12-17 22:19:52,079] DEBUG - agent | Updating states for flow run 34028fb2-2cd2-4cf4-88e1-82d983c650b2
    [2020-12-17 22:19:52,096] DEBUG - agent | Flow run 34028fb2-2cd2-4cf4-88e1-82d983c650b2 is in a Scheduled state, updating to Submitted
    [2020-12-17 22:19:52,110] DEBUG - agent | Next query for flow runs in 0.25 seconds
    [2020-12-17 22:19:52,236] INFO - agent | Deploying flow run 34028fb2-2cd2-4cf4-88e1-82d983c650b2
    [2020-12-17 22:19:52,238] DEBUG - agent | Loading job template from '/usr/local/lib/python3.6/site-packages/prefect/agent/kubernetes/job_template.yaml'
    [2020-12-17 22:19:52,298] DEBUG - agent | Creating namespaced job prefect-job-ba7cd008
    [2020-12-17 22:19:52,317] DEBUG - agent | Job prefect-job-ba7cd008 created
    [2020-12-17 22:19:52,360] DEBUG - agent | Querying for flow runs
    [2020-12-17 22:19:52,476] DEBUG - agent | Completed flow run submission (id: 34028fb2-2cd2-4cf4-88e1-82d983c650b2)
    [2020-12-17 22:19:52,508] DEBUG - agent | No flow runs found
    [2020-12-17 22:19:52,510] DEBUG - agent | Next query for flow runs in 0.5 seconds
    [2020-12-17 22:19:53,010] DEBUG - agent | Querying for flow runs
    [2020-12-17 22:19:53,067] DEBUG - agent | No flow runs found
    [2020-12-17 22:19:53,072] DEBUG - agent | Next query for flow runs in 1.0 seconds
    [2020-12-17 22:19:54,072] DEBUG - agent | Querying for flow runs
    [2020-12-17 22:19:54,105] DEBUG - agent | No flow runs found
    [2020-12-17 22:19:54,106] DEBUG - agent | Next query for flow runs in 2.0 seconds
    [2020-12-17 22:19:56,106] DEBUG - agent | Querying for flow runs
    [2020-12-17 22:19:56,148] DEBUG - agent | No flow runs found
    [2020-12-17 22:19:56,148] DEBUG - agent | Next query for flow runs in 4.0 seconds
    [2020-12-17 22:19:59,582] DEBUG - agent | Running agent heartbeat...
    [2020-12-17 22:19:59,582] DEBUG - agent | Retrieving information of jobs that are currently in the cluster...
    [2020-12-17 22:19:59,590] DEBUG - agent | Deleting job prefect-job-37fb2fd1
    [2020-12-17 22:19:59,616] DEBUG - agent | Failing flow run 34028fb2-2cd2-4cf4-88e1-82d983c650b2 due to pod ErrImagePull
    [2020-12-17 22:19:59,675] ERROR - agent | Error while managing existing k8s jobs
    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/site-packages/prefect/agent/kubernetes/agent.py", line 357, in heartbeat
        self.manage_jobs()
      File "/usr/local/lib/python3.6/site-packages/prefect/agent/kubernetes/agent.py", line 215, in manage_jobs
        pod_events.items, key=lambda x: x.last_timestamp
    TypeError: '<' not supported between instances of 'datetime.datetime' and 'NoneType'
    [2020-12-17 22:19:59,714] DEBUG - agent | Sleeping heartbeat for 60.0 seconds
    [2020-12-17 22:20:00,149] DEBUG - agent | Querying for flow runs
    [2020-12-17 22:20:00,197] DEBUG - agent | No flow runs found
    [2020-12-17 22:20:00,198] DEBUG - agent | Next query for flow runs in 8.0 seconds
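    (As an aside, the TypeError in that heartbeat comes from sorting pod events by last_timestamp when some events have no timestamp. A None-tolerant sort key along these lines would avoid it - a sketch only, not the actual upstream fix:)
    from datetime import datetime, timezone

    # Sketch: sort Kubernetes pod events even when some have last_timestamp=None,
    # treating missing timestamps as oldest (not the fix that shipped in prefect).
    def sort_pod_events(events):
        earliest = datetime.min.replace(tzinfo=timezone.utc)
        return sorted(events, key=lambda e: e.last_timestamp or earliest)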
    Jim Crist-Harif
    1 year ago
    Hmmm, ok. Thanks for all the info! This is helpful (looks like you've also found another unrelated bug with that error log 😃)
    One last question - can you post flow.diagnostics() for your flow?
    (diagnostics is a method on the flow)
    Pedro Martins
    1 year ago
    (looks like you've also found another unrelated bug with that error log 😃)
    yeah! that error might be because it cannot connect to the pod it tried to create 🤷‍♂️
    Ok! One moment
    {
      "config_overrides": {},
      "env_vars": [],
      "flow_information": {
        "environment": false,
        "result": false,
        "run_config": {
          "cpu_limit": false,
          "cpu_request": false,
          "env": false,
          "image": true,
          "image_pull_secrets": true,
          "job_template": false,
          "job_template_path": false,
          "labels": false,
          "memory_limit": false,
          "memory_request": false,
          "service_account_name": false,
          "type": "KubernetesRun"
        },
        "schedule": false,
        "storage": {
          "_flows": false,
          "_labels": false,
          "add_default_labels": true,
          "bucket": true,
          "client_options": false,
          "flows": false,
          "key": false,
          "local_script_path": false,
          "result": true,
          "secrets": false,
          "stored_as_script": false,
          "type": "S3"
        },
        "task_count": 7
      },
      "system_information": {
        "platform": "Linux-4.14.203-156.332.amzn2.x86_64-x86_64-with-glibc2.10",
        "prefect_backend": "server",
        "prefect_version": "0.14.0",
        "python_version": "3.8.6"
      }
    }
    Jim Crist-Harif
    1 year ago
    cool, thanks! I'll try to take a look at this tomorrow - this looks like a bug. Thanks for working through this with me.
    Nuno Silva
    1 year ago
    @Jim Crist-Harif and @Pedro Martins I have exactly the same two bugs described here: the TypeError and no imagePullSecrets. I'm reinstalling prefect using conda as well, trying to see if there's any dependency that isn't automatically updated when installing prefect that is causing this discrepancy. Thanks for all the debugging.
    Reinstalled and triple-checked the prefect and k8s agent versions, all on 0.14.0; it still doesn't create imagePullSecrets in the job and I still get the image pull error. Curious what you find. Thanks.
    Pedro Martins
    1 year ago
    @Jim Crist-Harif @Dylan I dug deep into the prefect code to understand what is going on with the imagePullSecrets tag. The Kubernetes agent's deploy_flow actually creates the job specification with the secret:
    {'apiVersion': 'batch/v1',
     'kind': 'Job',
     'spec': {'template': {'spec': {'containers': [{'name': 'flow',
          'image': 'drtools/prefect:aircraft-etl',
          'args': ['prefect', 'execute', 'cloud-flow'],
          'env': [{'name': 'PREFECT__CLOUD__API',
            'value': 'http://****:4200'},
           {'name': 'PREFECT__CLOUD__AUTH_TOKEN', 'value': ''},
           {'name': 'PREFECT__CLOUD__USE_LOCAL_SECRETS', 'value': 'false'},
           {'name': 'PREFECT__CONTEXT__FLOW_RUN_ID',
            'value': 'a7b69781-cee0-4b40-811c-1aeb47a8cf60'},
           {'name': 'PREFECT__CONTEXT__FLOW_ID',
            'value': 'a7b69781-cee0-4b40-811c-1aeb47a8cf60'},
           {'name': 'PREFECT__CONTEXT__IMAGE',
            'value': 'drtools/prefect:aircraft-etl'},
           {'name': 'PREFECT__LOGGING__LOG_TO_CLOUD', 'value': 'true'},
           {'name': 'PREFECT__ENGINE__FLOW_RUNNER__DEFAULT_CLASS',
            'value': 'prefect.engine.cloud.CloudFlowRunner'},
           {'name': 'PREFECT__ENGINE__TASK_RUNNER__DEFAULT_CLASS',
            'value': 'prefect.engine.cloud.CloudTaskRunner'}],
          'resources': {'requests': {}, 'limits': {}}}],
        'restartPolicy': 'Never'},
       'metadata': {'labels': {'prefect.io/identifier': 'fb944cb5',
         'prefect.io/flow_run_id': 'a7b69781-cee0-4b40-811c-1aeb47a8cf60',
         'prefect.io/flow_id': 'a7b69781-cee0-4b40-811c-1aeb47a8cf60'}},
       'imagePullSecrets': [{'name': 'regcred'}]}},
     'metadata': {'labels': {'prefect.io/identifier': 'fb944cb5',
       'prefect.io/flow_run_id': 'a7b69781-cee0-4b40-811c-1aeb47a8cf60',
       'prefect.io/flow_id': 'a7b69781-cee0-4b40-811c-1aeb47a8cf60'},
      'name': 'prefect-job-fb944cb5'}}
    Then it calls self.batch_client.create_namespaced_job. There is some sanitization of the payload, but it doesn't remove the pull secret from the body. When it calls the kubernetes api via self.api_client.call_api, the body is complete! However, the job specification that reaches the cluster doesn't contain the secret. It gets lost along the way or it is removed by the cluster api server. Are you aware of some API incompatibility here?
    @Jim Crist-Harif @Dylan I actually found the problem, guys! The kubernetes agent's generate_job_spec_from_run_config is adding the secret at the wrong level. The secret should be added at the same level as the container specification, inside the pod spec. The fix should be this:
    pod_template["spec"]["imagePullSecrets"] = [{"name": s} for s in image_pull_secrets]
    https://github.com/PrefectHQ/prefect/blob/master/src/prefect/agent/kubernetes/agent.py#L623
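    (To illustrate the nesting, a minimal sketch of the relevant part of the job manifest - not the agent's full template: Kubernetes only honors imagePullSecrets inside the pod spec, so a value placed one level up on spec.template is silently dropped.)
    # Sketch of the two levels involved; only the placement inside the pod spec
    # (spec.template.spec) takes effect for image pulls.
    job = {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "spec": {
            "template": {
                # wrong level - this is where the secret was being added
                # "imagePullSecrets": [{"name": "regcred"}],
                "spec": {
                    "containers": [
                        {"name": "flow", "image": "drtools/prefect:aircraft-etl"}
                    ],
                    # correct level, matching the one-line fix above
                    "imagePullSecrets": [{"name": "regcred"}],
                },
            }
        },
    }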
    Jim Crist-Harif
    1 year ago
    Ah, nice catch. I'll push a fix up today with this change. Thanks!
    Done! Thanks for finding the issue, this will be out in the next release (either tomorrow or Wednesday): https://github.com/PrefectHQ/prefect/pull/3884
    Ananthapadmanabhan P
    1 year ago
    Hey! I think I'm facing the same issue (?). Getting the following error:
    Event: 'Failed' on pod 'prefect-job-42420d16-bn54h'
    	Message: Error: ErrImagePull
    I have a prefect server running on kubernetes, which I installed using the helm chart available here - https://github.com/PrefectHQ/server/tree/master/helm/prefect-server. I tried two things - passing the arg image_pull_secrets to KubernetesRun(), and editing the k8s deployment of the agent to have the correct secret:
    IMAGE_PULL_SECRETS: [vi-dockerhub-key]
    Neither worked for me, and I could see that the pod does not have the pull secrets in its description. Also, the above secret is in the default namespace. Since the above issue seems to be fixed, am I missing something trivial/obvious?