Marwan Sarieddine
05/20/2020, 9:37 PM
I am using DaskKubernetesEnvironment and setting imagePullSecrets to pull images from our private GitLab container registry.
Note: I am using a Kubernetes Agent polling from Prefect Cloud, on Prefect version 0.11.2.
A couple of questions:
1. I thought that if I created the docker-registry secret with Kubernetes, added imagePullSecrets to the pod spec, and passed a custom scheduler spec file and worker spec file to DaskKubernetesEnvironment, this approach wouldn't rely on Prefect secrets and should work fine. But I am getting an empty dict for imagePullSecrets when inspecting the job and pod specs. I thought the Kubernetes agent might be overwriting them (I saw the replace_job_spec_yaml method), so I also specified the secret name in the Kubernetes agent manifest under the IMAGE_PULL_SECRETS environment variable, but I still get an empty dict for imagePullSecrets. Any idea why?
2. The other approach seems to be to skip the worker spec and scheduler files, set private_registry to True, and set docker_secret to the name of a Prefect Secret, which I then create with the client by setting a name and value. I don't see where in the code the Prefect secret's value is read and a Kubernetes secret created. I tried both a dictionary of docker-server, docker-username, docker-password, and docker-email as the value, and simply setting the value to the name of the k8s secret I had created; neither approach worked, and I still get an empty dict for imagePullSecrets.
Any idea what might be going on here, and what is the best practice for setting k8s imagePullSecrets with DaskKubernetesEnvironment?
josh
05/20/2020, 9:45 PM
The image pull secret is only applied to the initial prefect-job
that is created and isn’t propagated down to the dask scheduler/workers.
Let’s try to troubleshoot, first question: Are you seeing the initial prefect-job
being created and then the following dask scheduler job/pod?
Marwan Sarieddine
05/20/2020, 9:56 PM
I am seeing the prefect-job being created. To be specific, here is how my k8s resources look:
$ kubectl get pods,jobs
NAME                                 READY   STATUS         RESTARTS   AGE
pod/prefect-agent-5f6458886d-z4btq   2/2     Running        0          35m
pod/prefect-job-995c4982-5jzzf       0/1     ErrImagePull   0          6s

NAME                             COMPLETIONS   DURATION   AGE
job.batch/prefect-job-995c4982   0/1           6s         6s
josh
05/20/2020, 9:58 PM
If you remove IMAGE_PULL_SECRETS from the agent and keep the imagePullSecret in the yaml of your environment’s scheduler and worker, does it start working?
Marwan Sarieddine
05/20/2020, 10:02 PM
$ kubectl get pod prefect-job-995c4982-5jzzf -o yaml | yq r - "spec.imagePullSecrets"
- {}
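(For contrast, when the substitution works the rendered pod spec should carry a populated entry rather than a bare empty dict. A sketch of the expected shape, using the gitlab-secret name that appears elsewhere in this thread:)

```yaml
# Expected shape of the rendered pod spec once the secret is applied (sketch).
spec:
  imagePullSecrets:
  - name: gitlab-secret
```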
josh
05/20/2020, 10:02 PM
Can you share the agent yaml where you have IMAGE_PULL_SECRETS set?
Marwan Sarieddine
05/20/2020, 10:03 PM
containers:
- args:
  - prefect agent start kubernetes
  command:
  - /bin/bash
  - -c
  env:
  - name: PREFECT__CLOUD__API
    value: https://api.prefect.io
  - name: NAMESPACE
    value: default
  - name: IMAGE_PULL_SECRETS
    value: gitlab-secret
  - name: PREFECT__CLOUD__AGENT__LABELS
    value: '[]'
  - name: JOB_MEM_REQUEST
    value: 256Mi
  - name: JOB_MEM_LIMIT
    value: 512Mi
  - name: JOB_CPU_REQUEST
    value: 500m
  - name: JOB_CPU_LIMIT
    value: 1000m
  - name: PREFECT__BACKEND
    value: cloud
  - name: PREFECT__CLOUD__AGENT__AGENT_ADDRESS
    value: http://:8080
  image: prefecthq/prefect:0.11.2-python3.6
  imagePullPolicy: Always
  livenessProbe:
    failureThreshold: 2
    httpGet:
      path: /api/health
      port: 8080
    initialDelaySeconds: 40
    periodSeconds: 40
  name: agent
  resources:
    limits:
      cpu: 100m
      memory: 128Mi
josh
05/20/2020, 10:05 PM
Can you do a kubectl describe on the prefect job? Also removing any auth info of course
Marwan Sarieddine
05/20/2020, 10:07 PM
$ kubectl describe pod prefect-job-995c4982-5jzzf
Name: prefect-job-995c4982-5jzzf
Namespace: default
Priority: 0
Node: ip-192-168-79-244.us-west-2.compute.internal/192.168.79.244
Start Time: Wed, 20 May 2020 17:55:39 -0400
Labels: app=prefect-job-995c4982
controller-uid=5af44c8e-3301-44d3-9890-ffae403ad426
flow_run_id=868e1b00-8fd5-4394-a3d2-5a2412e2a373
identifier=995c4982
job-name=prefect-job-995c4982
Annotations: kubernetes.io/psp: eks.privileged
Status: Pending
IP: 192.168.80.39
IPs: <none>
Controlled By: Job/prefect-job-995c4982
Containers:
flow:
Container ID:
Image: registry.gitlab.com/ifm-data-science/kubeflow-pipelines/mlops-examples/dask-k8s-flow:0.1.0
Image ID:
Port: <none>
Host Port: <none>
Command:
/bin/sh
-c
Args:
prefect execute cloud-flow
State: Waiting
Reason: ImagePullBackOff
Ready: False
Restart Count: 0
Limits:
cpu: 1
memory: 512Mi
Requests:
cpu: 500m
memory: 256Mi
Environment:
PREFECT__CLOUD__API: https://api.prefect.io
PREFECT__CONTEXT__FLOW_RUN_ID: 868e1b00-8fd5-4394-a3d2-5a2412e2a373
PREFECT__CONTEXT__FLOW_ID: 156ce5ef-53c3-4f61-9dcc-004cc890e141
PREFECT__CONTEXT__NAMESPACE: default
PREFECT__CLOUD__AGENT__LABELS: []
PREFECT__LOGGING__LOG_TO_CLOUD: true
PREFECT__CLOUD__USE_LOCAL_SECRETS: false
PREFECT__LOGGING__LEVEL: DEBUG
PREFECT__ENGINE__FLOW_RUNNER__DEFAULT_CLASS: prefect.engine.cloud.CloudFlowRunner
PREFECT__ENGINE__TASK_RUNNER__DEFAULT_CLASS: prefect.engine.cloud.CloudTaskRunner
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-s27ks (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-s27ks:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-s27ks
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
             node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 11m default-scheduler Successfully assigned default/prefect-job-995c4982-5jzzf to ip-192-168-79-244.us-west-2.compute.internal
Normal Pulling 9m25s (x4 over 10m) kubelet, ip-192-168-79-244.us-west-2.compute.internal Pulling image "registry.gitlab.com/xxxx"
Warning Failed 9m24s (x4 over 10m) kubelet, ip-192-168-79-244.us-west-2.compute.internal Failed to pull image "registry.gitlab.com/xxx": denied: access forbidden
Warning Failed 9m24s (x4 over 10m) kubelet, ip-192-168-79-244.us-west-2.compute.internal Error: ErrImagePull
Warning Failed 8m57s (x7 over 10m) kubelet, ip-192-168-79-244.us-west-2.compute.internal Error: ImagePullBackOff
Normal BackOff 54s (x40 over 10m) kubelet, ip-192-168-79-244.us-west-2.compute.internal Back-off pulling image "registry.gitlab.com/xxxx"
josh
05/20/2020, 10:10 PM
Can you also run kubectl describe job prefect-job-995c4982?
Marwan Sarieddine
05/20/2020, 10:11 PM
$ kubectl describe job prefect-job-995c4982
Name: prefect-job-995c4982
Namespace: default
Selector: controller-uid=5af44c8e-3301-44d3-9890-ffae403ad426
Labels: app=prefect-job-995c4982
flow_id=156ce5ef-53c3-4f61-9dcc-004cc890e141
flow_run_id=868e1b00-8fd5-4394-a3d2-5a2412e2a373
identifier=995c4982
Annotations: <none>
Parallelism: 1
Completions: 1
Start Time: Wed, 20 May 2020 17:55:39 -0400
Pods Statuses: 1 Running / 0 Succeeded / 0 Failed
Pod Template:
Labels: app=prefect-job-995c4982
controller-uid=5af44c8e-3301-44d3-9890-ffae403ad426
flow_run_id=868e1b00-8fd5-4394-a3d2-5a2412e2a373
identifier=995c4982
job-name=prefect-job-995c4982
Containers:
flow:
Image: registry.gitlab.com/xxxx
Port: <none>
Host Port: <none>
Command:
/bin/sh
-c
Args:
prefect execute cloud-flow
Limits:
cpu: 1
memory: 512Mi
Requests:
cpu: 500m
memory: 256Mi
Environment:
PREFECT__CLOUD__API: https://api.prefect.io
PREFECT__CONTEXT__FLOW_RUN_ID: 868e1b00-8fd5-4394-a3d2-5a2412e2a373
PREFECT__CONTEXT__FLOW_ID: 156ce5ef-53c3-4f61-9dcc-004cc890e141
PREFECT__CONTEXT__NAMESPACE: default
PREFECT__CLOUD__AGENT__LABELS: []
PREFECT__LOGGING__LOG_TO_CLOUD: true
PREFECT__CLOUD__USE_LOCAL_SECRETS: false
PREFECT__LOGGING__LEVEL: DEBUG
PREFECT__ENGINE__FLOW_RUNNER__DEFAULT_CLASS: prefect.engine.cloud.CloudFlowRunner
PREFECT__ENGINE__TASK_RUNNER__DEFAULT_CLASS: prefect.engine.cloud.CloudTaskRunner
Mounts: <none>
Volumes: <none>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 15m job-controller Created pod: prefect-job-995c4982-5jzzf
josh
05/20/2020, 10:13 PM
Can you also share the agent deployment?
Marwan Sarieddine
05/20/2020, 10:13 PM
$ kubectl describe deployments prefect-agent
Name: prefect-agent
Namespace: default
CreationTimestamp: Wed, 20 May 2020 13:13:47 -0400
Labels: app=prefect-agent
Annotations: deployment.kubernetes.io/revision: 6
Selector: app=prefect-agent
Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: app=prefect-agent
Containers:
agent:
Image: prefecthq/prefect:0.11.2-python3.6
Port: <none>
Host Port: <none>
Command:
/bin/bash
-c
Args:
prefect agent start kubernetes
Limits:
cpu: 100m
memory: 128Mi
Liveness: http-get http://:8080/api/health delay=40s timeout=1s period=40s #success=1 #failure=2
Environment:
PREFECT__CLOUD__API: https://api.prefect.io
NAMESPACE: default
IMAGE_PULL_SECRETS: gitlab-secret
PREFECT__CLOUD__AGENT__LABELS: []
JOB_MEM_REQUEST: 256Mi
JOB_MEM_LIMIT: 512Mi
JOB_CPU_REQUEST: 500m
JOB_CPU_LIMIT: 1000m
PREFECT__BACKEND: cloud
PREFECT__CLOUD__AGENT__AGENT_ADDRESS: http://:8080
Mounts: <none>
resource-manager:
Image: prefecthq/prefect:0.11.2-python3.6
Port: <none>
Host Port: <none>
Command:
/bin/bash
-c
Args:
python -c 'from prefect.agent.kubernetes import ResourceManager; ResourceManager().start()'
Limits:
cpu: 100m
memory: 128Mi
Environment:
PREFECT__CLOUD__API: https://api.prefect.io
PREFECT__CLOUD__AGENT__RESOURCE_MANAGER__LOOP_INTERVAL: 60
NAMESPACE: default
Mounts: <none>
Volumes: <none>
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
Progressing True NewReplicaSetAvailable
OldReplicaSets: <none>
NewReplicaSet: prefect-agent-67b846cc76 (1/1 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 53m deployment-controller Scaled up replica set prefect-agent-5f6458886d to 1
Normal ScalingReplicaSet 53m deployment-controller Scaled down replica set prefect-agent-587c5d8cdd to 0
Normal ScalingReplicaSet 11m deployment-controller Scaled up replica set prefect-agent-d6668bb6d to 1
Normal ScalingReplicaSet 11m deployment-controller Scaled down replica set prefect-agent-5f6458886d to 0
Normal ScalingReplicaSet 7m57s deployment-controller Scaled up replica set prefect-agent-67b846cc76 to 1
Normal ScalingReplicaSet 7m52s deployment-controller Scaled down replica set prefect-agent-d6668bb6d to 0
josh
05/20/2020, 10:17 PM
And the secret gitlab-secret exists, correct? 😄
Marwan Sarieddine
05/20/2020, 10:17 PM
$ kubectl get secret -o wide
NAME                  TYPE                                  DATA   AGE
default-token-s27ks   kubernetes.io/service-account-token   3      8h
gitlab-secret         kubernetes.io/dockerconfigjson        1      5h7m
josh
05/20/2020, 10:22 PM
In the agent code the secret is applied like this:
# Use image pull secrets if provided
job["spec"]["template"]["spec"]["imagePullSecrets"][0]["name"] = os.getenv("IMAGE_PULL_SECRETS", "")
Which it looks like your agent has that env var set.
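(As an aside, that substitution can be exercised in isolation. The dictionary below is a minimal stand-in for the agent's job template; the real template has many more fields, and this shape is assumed for illustration only:)

```python
import os

# Minimal stand-in for the agent's job template (assumed shape, not the
# actual template the agent loads).
job = {"spec": {"template": {"spec": {"imagePullSecrets": [{}]}}}}

# Simulate the env var from the agent deployment earlier in the thread.
os.environ["IMAGE_PULL_SECRETS"] = "gitlab-secret"

# The substitution quoted from the agent code above.
job["spec"]["template"]["spec"]["imagePullSecrets"][0]["name"] = os.getenv(
    "IMAGE_PULL_SECRETS", ""
)

print(job["spec"]["template"]["spec"]["imagePullSecrets"])
# -> [{'name': 'gitlab-secret'}]
```

Note that if the variable were unset, the default "" would leave an entry of {'name': ''}; the bare {} observed earlier means the name key was never written at all.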
My last test would be to check if you can create a pod that uses your image pull secret with something like:
apiVersion: v1
kind: Pod
metadata:
  name: private-reg
spec:
  containers:
  - name: private-reg-container
    image: <your-private-image>
  imagePullSecrets:
  - name: your-secret
If that pull works then there is a bug in the agent code!
Marwan Sarieddine
05/20/2020, 10:24 PM
Let me try
- name: IMAGE_PULL_SECRETS
  value: "gitlab-secret"
instead of
- name: IMAGE_PULL_SECRETS
  value: gitlab-secret
just to be sure it’s not because of the missing quotes.
josh
05/20/2020, 10:25 PM
Marwan Sarieddine
05/20/2020, 10:27 PM
I get deployment.apps/prefect-agent unchanged when I add the quotes.
josh
05/20/2020, 10:45 PM
Marwan Sarieddine
05/20/2020, 10:45 PM
josh
05/20/2020, 10:46 PM