Marwan Sarieddine
05/22/2020, 4:36 PM
from distributed import get_client
from prefect import task

@task
def wait_for_resources():
    client = get_client()
    # Wait until we have 10 workers
    client.wait_for_workers(n_workers=10)
Jenny
05/22/2020, 4:59 PM
Marwan Sarieddine
05/22/2020, 5:00 PM
Jenny
05/22/2020, 5:11 PM
Jenny
05/22/2020, 5:12 PM
Marwan Sarieddine
05/22/2020, 5:12 PM
josh
05/22/2020, 5:24 PM
Marwan Sarieddine
05/22/2020, 5:40 PM
DaskKubernetesEnvironment
josh
05/22/2020, 5:43 PM
Jim Crist-Harif
05/22/2020, 5:50 PM
Marwan Sarieddine
05/22/2020, 5:58 PM
Marwan Sarieddine
05/22/2020, 5:58 PM
4m20s       Normal    Scheduled                 pod/prefect-job-815860a0-gzhl6                                    Successfully assigned default/prefect-job-815860a0-gzhl6 to ip-192-168-35-156.us-west-2.compute.internal
4m20s       Normal    SuccessfulCreate          job/prefect-job-815860a0                                          Created pod: prefect-job-815860a0-gzhl6
4m20s       Normal    Pulling                   pod/prefect-job-815860a0-gzhl6                                    Pulling image "registry.gitlab.com/xxxx"
4m17s       Normal    Pulled                    pod/prefect-job-815860a0-gzhl6                                    Successfully pulled image "registry.gitlab.com/xxxx"
4m17s       Normal    Created                   pod/prefect-job-815860a0-gzhl6                                    Created container flow
4m16s       Normal    SuccessfulCreate          job/prefect-dask-job-fb5921cb-d719-4962-b1ea-f0fe5539c8de         Created pod: prefect-dask-job-fb5921cb-d719-4962-b1ea-f0fe5539c8de-vs587
4m16s       Normal    Started                   pod/prefect-job-815860a0-gzhl6                                    Started container flow
4m16s       Normal    Scheduled                 pod/prefect-dask-job-fb5921cb-d719-4962-b1ea-f0fe5539c8de-vs587   Successfully assigned default/prefect-dask-job-fb5921cb-d719-4962-b1ea-f0fe5539c8de-vs587 to ip-192-168-17-18.us-west-2.compute.internal
4m15s       Normal    Pulled                    pod/prefect-dask-job-fb5921cb-d719-4962-b1ea-f0fe5539c8de-vs587   Container image "registry.gitlab.com/xxxx" already present on machine
4m15s       Normal    Created                   pod/prefect-dask-job-fb5921cb-d719-4962-b1ea-f0fe5539c8de-vs587   Created container flow
4m14s       Normal    Started                   pod/prefect-dask-job-fb5921cb-d719-4962-b1ea-f0fe5539c8de-vs587   Started container flow
4m4s        Normal    Scheduled                 pod/dask-root-7c45f7c9-art6xn                                     Successfully assigned default/dask-root-7c45f7c9-art6xn to ip-192-168-35-156.us-west-2.compute.internal
4m3s        Normal    Created                   pod/dask-root-7c45f7c9-art6xn                                     Created container dask-worker
4m3s        Normal    Pulled                    pod/dask-root-7c45f7c9-art6xn                                     Container image "registry.gitlab.com/xxxx" already present on machine
4m3s        Normal    Started                   pod/dask-root-7c45f7c9-art6xn                                     Started container dask-worker
4m          Normal    Killing                   pod/dask-root-7c45f7c9-art6xn                                     Stopping container dask-worker
3m53s       Normal    Scheduled                 pod/prefect-dask-job-fb5921cb-d719-4962-b1ea-f0fe5539c8de-f4wc9   Successfully assigned default/prefect-dask-job-fb5921cb-d719-4962-b1ea-f0fe5539c8de-f4wc9 to ip-192-168-35-156.us-west-2.compute.internal
3m53s       Normal    SuccessfulCreate          job/prefect-dask-job-fb5921cb-d719-4962-b1ea-f0fe5539c8de         Created pod: prefect-dask-job-fb5921cb-d719-4962-b1ea-f0fe5539c8de-f4wc9
3m52s       Normal    Pulled                    pod/prefect-dask-job-fb5921cb-d719-4962-b1ea-f0fe5539c8de-f4wc9   Container image "registry.gitlab.com/xxxx" already present on machine
3m52s       Normal    Created                   pod/prefect-dask-job-fb5921cb-d719-4962-b1ea-f0fe5539c8de-f4wc9   Created container flow
3m51s       Normal    Started                   pod/prefect-dask-job-fb5921cb-d719-4962-b1ea-f0fe5539c8de-f4wc9   Started container flow
3m3s        Warning   FailedScheduling          pod/dask-root-0f923f19-62vfb8                                     0/2 nodes are available: 2 node(s) didn't match node selector.
3m33s       Normal    TriggeredScaleUp          pod/dask-root-0f923f19-62vfb8                                     pod triggered scale-up: [{eksctl-prefect-eks-test-nodegroup-eks-cpu-2-NodeGroup-1PDZRBRX9TE4I 0->1 (max: 10)}]
3m33s       Normal    Killing                   pod/prefect-dask-job-fb5921cb-d719-4962-b1ea-f0fe5539c8de-f4wc9   Stopping container flow
2m52s       Normal    NodeHasNoDiskPressure     node/ip-192-168-81-183.us-west-2.compute.internal                 Node ip-192-168-81-183.us-west-2.compute.internal status is now: NodeHasNoDiskPressure
2m52s       Normal    NodeHasSufficientMemory   node/ip-192-168-81-183.us-west-2.compute.internal                 Node ip-192-168-81-183.us-west-2.compute.internal status is now: NodeHasSufficientMemory
2m52s       Normal    NodeAllocatableEnforced   node/ip-192-168-81-183.us-west-2.compute.internal                 Updated Node Allocatable limit across pods
2m52s       Normal    NodeHasSufficientPID      node/ip-192-168-81-183.us-west-2.compute.internal                 Node ip-192-168-81-183.us-west-2.compute.internal status is now: NodeHasSufficientPID
2m49s       Warning   FailedScheduling          pod/dask-root-0f923f19-62vfb8                                     0/3 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 2 node(s) didn't match node selector.
2m53s       Normal    Starting                  node/ip-192-168-81-183.us-west-2.compute.internal                 Starting kubelet.
2m51s       Normal    RegisteredNode            node/ip-192-168-81-183.us-west-2.compute.internal                 Node ip-192-168-81-183.us-west-2.compute.internal event: Registered Node ip-192-168-81-183.us-west-2.compute.internal in Controller
2m48s       Normal    Starting                  node/ip-192-168-81-183.us-west-2.compute.internal                 Starting kube-proxy.
2m33s       Warning   FailedScheduling          pod/dask-root-0f923f19-62vfb8                                     skip schedule deleting pod: default/dask-root-0f923f19-62vfb8
2m32s       Normal    NodeReady                 node/ip-192-168-81-183.us-west-2.compute.internal                 Node ip-192-168-81-183.us-west-2.compute.internal status
Marwan Sarieddine
05/22/2020, 5:58 PM
Marwan Sarieddine
05/22/2020, 6:07 PM
Marwan Sarieddine
05/22/2020, 6:09 PM
2m49s       Warning   FailedScheduling          pod/dask-root-0f923f19-62vfb8                                     0/3 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 2 node(s) didn't match node selector.
Jim Crist-Harif
05/22/2020, 6:09 PM
Marwan Sarieddine
05/22/2020, 6:09 PM
Jim Crist-Harif
05/22/2020, 6:09 PM
Marwan Sarieddine
05/22/2020, 6:11 PM
kind: Pod
metadata:
  labels:
    app: prefect-dask-worker
spec:
  replicas: 2
  restartPolicy: Never
  imagePullSecrets:
  - name: gitlab-secret
  # note I tried using both affinity and a selector
  # nodeSelector:
  #   role: supplement
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: role
            operator: In
            values:
            - supplement
  containers:
    - image: registry.gitlab.com/xxxx
      imagePullPolicy: IfNotPresent
      args: [dask-worker, --nthreads, "1", --no-bokeh, --memory-limit, 4GB]
      name: dask-worker
      env:
        - name: AWS_BUCKET
          value: xxxx
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              name: aws-secret
              key: AWS_ACCESS_KEY_ID
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              name: aws-secret
              key: AWS_SECRET_ACCESS_KEY
      resources:
        limits:
          cpu: "2000m"
          memory: 4G
        requests:
          cpu: "1000m"
          memory: 2G
Jim Crist-Harif
05/22/2020, 6:13 PM
wait_for_resources()
Marwan Sarieddine
05/22/2020, 6:13 PM
Marwan Sarieddine
05/22/2020, 6:14 PM
Jim Crist-Harif
05/22/2020, 6:16 PM
wait_for_resources
Marwan Sarieddine
05/22/2020, 6:16 PM
22 May 2020,01:54:21 	prefect.CloudFlowRunner	INFO	Beginning Flow run for 'Data Processing'
22 May 2020,01:54:21 	prefect.CloudFlowRunner	INFO	Starting flow run.
22 May 2020,01:54:21 	prefect.CloudFlowRunner	DEBUG	Flow 'Data Processing': Handling state change from Scheduled to Running
Marwan Sarieddine
05/22/2020, 6:16 PM
Marwan Sarieddine
05/22/2020, 6:17 PM
Jim Crist-Harif
05/22/2020, 6:21 PM
Jim Crist-Harif
05/22/2020, 6:21 PM
Jim Crist-Harif
05/22/2020, 6:21 PM
Marwan Sarieddine
05/22/2020, 6:22 PM
Jim Crist-Harif
05/22/2020, 6:22 PM
Marwan Sarieddine
05/22/2020, 6:22 PM
DaskKubernetesEnvironment
Flow(
    "Data Processing",
    environment=DaskKubernetesEnvironment(
        worker_spec_file="worker_spec.yaml",
        min_workers=1,
        max_workers=10,
    ),
    storage=Docker(
        registry_url=os.environ['GITLAB_REGISTRY'],
        image_name="dask-k8s-flow",
        image_tag="0.1.0",
        python_dependencies=[
            'boto3==1.13.14',
            'numpy==1.18.4'
        ]
    ),
    result=s3_result,
)
Jim Crist-Harif
05/22/2020, 6:23 PM
Marwan Sarieddine
05/22/2020, 6:23 PM
Do you set scheduler_service_wait_timeout?
As you can see, no, I don't explicitly set it.
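For reference, scheduler_service_wait_timeout is not among the arguments passed to DaskKubernetesEnvironment in the Flow snippet above. A minimal sketch of one way such a setting could be supplied without changing the flow code, assuming (not confirmed in this thread) that the installed dask-kubernetes reads it from the Dask config key kubernetes.scheduler-service-wait-timeout. Dask maps environment variables prefixed with DASK_ onto config keys, using a double underscore for each nesting level. The helper name dask_env_var and the value 120 are hypothetical.

```python
def dask_env_var(config_key: str) -> str:
    """Return the DASK_* environment variable name for a dotted config key.

    Follows Dask's documented convention: prefix with DASK_, upper-case,
    and join nesting levels with a double underscore. Hyphens become
    underscores because Dask treats the two interchangeably in keys.
    """
    parts = config_key.split(".")
    return "DASK_" + "__".join(p.replace("-", "_").upper() for p in parts)


# Assumed config key: check the kubernetes.yaml shipped with your
# dask-kubernetes version. 120 seconds is an arbitrary example value.
env = {dask_env_var("kubernetes.scheduler-service-wait-timeout"): "120"}
print(env)
```

The resulting variable would go in the env list of the pod that creates the Dask cluster. Whether DaskKubernetesEnvironment honors it is not verified here.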
Jim Crist-Harif
05/22/2020, 6:25 PM
Jim Crist-Harif
05/22/2020, 6:25 PM
Jim Crist-Harif
05/22/2020, 6:26 PM
Marwan Sarieddine
05/22/2020, 6:27 PM
Jim Crist-Harif
05/22/2020, 6:28 PM
Marwan Sarieddine
05/22/2020, 6:29 PM
Jim Crist-Harif
05/22/2020, 6:30 PM
Marwan Sarieddine
05/22/2020, 6:31 PM
kubectl get events
Marwan Sarieddine
05/22/2020, 6:32 PM
Jim Crist-Harif
05/22/2020, 6:33 PM
kubectl logs that-pod-name
Marwan Sarieddine
05/22/2020, 6:35 PM
Marwan Sarieddine
05/22/2020, 6:37 PM
Jim Crist-Harif
05/22/2020, 6:38 PM
deploy-mode="local"
Marwan Sarieddine
05/22/2020, 6:39 PM
Jim Crist-Harif
05/22/2020, 6:40 PM
Jim Crist-Harif
05/22/2020, 6:41 PM
Marwan Sarieddine
05/22/2020, 6:41 PM
Jim Crist-Harif
05/22/2020, 6:42 PM
Marwan Sarieddine
05/22/2020, 6:43 PM
Jim Crist-Harif
05/22/2020, 6:44 PM
Marwan Sarieddine
05/22/2020, 6:44 PM
Jim Crist-Harif
05/22/2020, 6:45 PM
Marwan Sarieddine
05/22/2020, 6:45 PM
Marwan Sarieddine
05/22/2020, 6:45 PM
Marwan Sarieddine
05/22/2020, 6:48 PM
Jim Crist-Harif
05/22/2020, 6:50 PM
josh
05/22/2020, 6:50 PM
Jim Crist-Harif
05/22/2020, 6:50 PM
Jim Crist-Harif
05/22/2020, 6:50 PM
josh
05/22/2020, 6:50 PM
Marwan Sarieddine
05/22/2020, 6:52 PM
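The kubectl get events output pasted earlier in this thread mixes a few Warning rows into many Normal ones. A minimal sketch of pulling out just the warnings, assuming the default column order (LAST SEEN, TYPE, REASON, OBJECT, MESSAGE) and no header row. The names warning_events and sample are hypothetical, and the two sample rows are copied from the thread with column spacing shortened.

```python
def warning_events(events_text: str):
    """Yield (reason, object, message) for each Warning row.

    Assumes whitespace-separated columns in the default
    `kubectl get events` order; the message is the fifth column
    and may itself contain spaces.
    """
    for line in events_text.splitlines():
        # Split on runs of whitespace at most 4 times so the message stays whole.
        fields = line.strip().split(None, 4)
        if len(fields) == 5 and fields[1] == "Warning":
            _, _, reason, obj, message = fields
            yield reason, obj, message


sample = """\
4m3s   Normal    Started            pod/dask-root-7c45f7c9-art6xn   Started container dask-worker
3m3s   Warning   FailedScheduling   pod/dask-root-0f923f19-62vfb8   0/2 nodes are available: 2 node(s) didn't match node selector.
"""

for reason, obj, message in warning_events(sample):
    print(f"{obj}: {reason}: {message}")
```

On a live cluster the same text could come from saving the command's output to a file and reading it back.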