Billy McMonagle

    1 year ago
    I have a flow with docker storage, running with the k8s agent. When I run my flow and look at the logs, it seems to be executing an old version of my flow. Details in thread...
    From my agent:
    Environment variables
    
        PREFECT__CLOUD__AGENT__AUTH_TOKEN={{undefined}}
        PREFECT__CLOUD__API=https://api.prefect.io
        NAMESPACE=default
        IMAGE_PULL_SECRETS=undefined
        PREFECT__CLOUD__AGENT__LABELS=['production']
        JOB_MEM_REQUEST=undefined
        JOB_MEM_LIMIT=undefined
        JOB_CPU_REQUEST=undefined
        JOB_CPU_LIMIT=undefined
        IMAGE_PULL_POLICY=Always
        SERVICE_ACCOUNT_NAME=undefined
        PREFECT__BACKEND=cloud
        PREFECT__CLOUD__AGENT__AGENT_ADDRESS=http://:8080
    My flow:
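    # Imports assume Prefect 0.14-era module paths.
    from prefect import Flow, task
    from prefect.storage import Docker

    # REGISTRY_URL, DOCKERFILE, IMAGE_NAME, IMAGE_TAG and PROJECT_NAME are defined elsewhere.

    @task
    def execute_query(query):
        print("inside execute_query task to run query ", query)
    
    storage = Docker(
        registry_url=REGISTRY_URL,
        dockerfile=DOCKERFILE,
        image_name=IMAGE_NAME,
        image_tag=IMAGE_TAG,
    )
    
    with Flow("all_accounts", storage=storage) as all_accounts_flow:
        execute_query("all_accounts")
    
    storage.build()
    
    if __name__ == "__main__":
        all_accounts_flow.register(project_name=PROJECT_NAME)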
    I'm almost totally certain I'm doing something silly, so thanks in advance!
    josh

    1 year ago
    Hmm I can’t tell what’s happening judging by the snippets you have provided. Is there a chance that when you call .register it isn’t updating the image_name:tag in your image repository?
    Michael Adkins

    1 year ago
    Is this limited to the K8s agent?
    josh

    1 year ago
    ^ Also valid! There might be a chance that k8s isn’t pulling the new image if it has the same name:tag
    Michael Adkins

    1 year ago
    There is the IMAGE_PULL_POLICY=Always env var but I’d have to consult the code to see how that’s templated in the K8s jobs
    if os.getenv("IMAGE_PULL_POLICY"):
        job["spec"]["template"]["spec"]["containers"][0][
            "imagePullPolicy"
        ] = os.getenv("IMAGE_PULL_POLICY")
    hmm
    Can you inspect the image pull policy on the flow pod with kubectl?
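A minimal sketch of what the agent snippet quoted above does, so the behavior can be checked in isolation. The function name apply_image_pull_policy and the bare job dict are hypothetical stand-ins; only the env var name and the dict path come from the snippet.

```python
import os


def apply_image_pull_policy(job: dict, environ=os.environ) -> dict:
    """Copy IMAGE_PULL_POLICY from the environment onto the first container."""
    policy = environ.get("IMAGE_PULL_POLICY")
    if policy:
        # Same path the quoted agent code writes to.
        job["spec"]["template"]["spec"]["containers"][0]["imagePullPolicy"] = policy
    return job


# Hypothetical minimal job spec, for illustration only.
job = {"spec": {"template": {"spec": {"containers": [{"name": "flow"}]}}}}
apply_image_pull_policy(job, {"IMAGE_PULL_POLICY": "Always"})
print(job["spec"]["template"]["spec"]["containers"][0])
```

If this code path is never reached for a given flow, the container keeps whatever default Kubernetes assigns, which would match the IfNotPresent value seen later in the thread.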
    Billy McMonagle

    1 year ago
    Hey @michael thanks for the ideas, sorry I stepped away...
    I think you guys are on the right track, because I've got an image:label that is not updating.
    I'm trying to understand the best/recommended way to set up my image name and tag. Currently, this is set to something like prefect_repo:appname but it sounds like I should do appname:latest or something else.
    I did check the image pull policy on the job, and it is set to IfNotPresent ...
    Michael Adkins

    1 year ago
    Well, that’s the culprit then. I’m not certain why that environment variable isn’t being respected. Can you check the environment of the agent using kubectl?
    Are you using a RunConfig?
    Billy McMonagle

    1 year ago
    I do not have a RunConfig set currently, although I do intend to use one to set some env vars, once I get over this hump.
    Agent inspection incoming...
    What do you want to know about the agent environment? Just env vars?
    Michael Adkins

    1 year ago
    If it has an IMAGE_PULL_POLICY env var
    (and the value)
    Billy McMonagle

    1 year ago
    env:
    - name: PREFECT__CLOUD__AGENT__AUTH_TOKEN
      valueFrom:
        secretKeyRef:
          key: prefect_runner_token
          name: prefect-orchestration-main
    - name: PREFECT__CLOUD__API
      value: https://api.prefect.io
    - name: NAMESPACE
      value: default
    - name: IMAGE_PULL_SECRETS
    - name: PREFECT__CLOUD__AGENT__LABELS
      value: '[''production'']'
    - name: JOB_MEM_REQUEST
    - name: JOB_MEM_LIMIT
    - name: JOB_CPU_REQUEST
    - name: JOB_CPU_LIMIT
    - name: IMAGE_PULL_POLICY
      value: Always
    Michael Adkins

    1 year ago
    Very interesting
    Billy McMonagle

    1 year ago
    perhaps I need to single quote Always
    Michael Adkins

    1 year ago
    If you use a KubernetesRunConfig it ignores those configuration env vars, so it might be easiest to set the image pull policy in that
    Although I am curious what’s going on here
    Billy McMonagle

    1 year ago
    Hm, I will try that, since I intend to use a RunConfig.
    Michael Adkins

    1 year ago
    Can you confirm that the image pull policy is IfNotPresent in both the Job template and the flow run Pod?
    I’ve forwarded this to another member of the team who’s worked with this recently.
    Billy McMonagle

    1 year ago
    Hm, I'm not sure how to distinguish the two... what I provided before was the yaml from the running Job (I'm using k9s, FWIW)
    Michael Adkins

    1 year ago
    @Billy McMonagle sorry I was wrong, I don’t see a config option in the KubernetesRun for image pull policy.
    Billy McMonagle

    1 year ago
    Gotcha, OK.
    If I end up needing to make a custom job template, that is going to be fine with me, but I bet we can get to the bottom of this
    Michael Adkins

    1 year ago
    The agent creates a Job for a flow run. The Job creates a Pod from its template to actually run the flow. If the Pod fails, the Job will create a new one until it succeeds or the Job's backoff limit is reached. (https://kubernetes.io/docs/concepts/workloads/controllers/job/)
    Billy McMonagle

    1 year ago
    that job line is the only thing I see, but I'm obviously new to k8s as well so I'm still poking around
    Michael Adkins

    1 year ago
    Can you use kubectl?
    kubectl get jobs will give you the actual Job instances. That looks like a Pod that is named after the job 🙂
    Billy McMonagle

    1 year ago
    Yes, I can. Hard to get much bc the job runs for like 1s.
    k get jobs
    NAME                   COMPLETIONS   DURATION   AGE
    prefect-job-21ec7c17   0/1           1s         1s
    If you know the command to get the yaml for the job itself...
    Michael Adkins

    1 year ago
    Then
    kubectl describe job prefect-job-21ec7c17
    Should give us the image pull policy
    If you put a sleep in your flow the job would run longer
    Billy McMonagle

    1 year ago
    perfect
    yes but I can't update my flow code bc of the image pull problem 🙃
    Michael Adkins

    1 year ago
    (ahaha)
    Billy McMonagle

    1 year ago
    ❯ k describe jobs prefect-job-1d696b93
    Name:           prefect-job-1d696b93
    Namespace:      default
    Selector:       controller-uid=b139374d-d47a-4b0f-ab98-18cead0a8eb9
    Labels:         prefect.io/flow_id=678d4e82-1e48-4d9c-9748-ade3d73867f5
                    prefect.io/flow_run_id=5e07a5e7-3e49-447d-aba7-47112061db8b
                    prefect.io/identifier=1d696b93
    Annotations:    <none>
    Parallelism:    1
    Completions:    1
    Start Time:     Tue, 19 Jan 2021 18:25:04 -0500
    Completed At:   Tue, 19 Jan 2021 18:25:12 -0500
    Duration:       8s
    Pods Statuses:  0 Running / 1 Succeeded / 0 Failed
    Pod Template:
      Labels:  controller-uid=b139374d-d47a-4b0f-ab98-18cead0a8eb9
               job-name=prefect-job-1d696b93
               prefect.io/flow_id=678d4e82-1e48-4d9c-9748-ade3d73867f5
               prefect.io/flow_run_id=5e07a5e7-3e49-447d-aba7-47112061db8b
               prefect.io/identifier=1d696b93
      Containers:
       flow:
        Image:      XXX.dkr.ecr.us-east-1.amazonaws.com/prefect-orchestration:grebe
        Port:       <none>
        Host Port:  <none>
        Args:
          prefect
          execute
          flow-run
        Environment:
          PREFECT__CLOUD__API:                          https://api.prefect.io
          PREFECT__CLOUD__AUTH_TOKEN:                   XXX
          PREFECT__CLOUD__USE_LOCAL_SECRETS:            false
          PREFECT__CONTEXT__FLOW_RUN_ID:                5e07a5e7-3e49-447d-aba7-47112061db8b
          PREFECT__CONTEXT__FLOW_ID:                    678d4e82-1e48-4d9c-9748-ade3d73867f5
          PREFECT__CONTEXT__IMAGE:                      XXX.dkr.ecr.us-east-1.amazonaws.com/prefect-orchestration:grebe
          PREFECT__LOGGING__LOG_TO_CLOUD:               true
          PREFECT__ENGINE__FLOW_RUNNER__DEFAULT_CLASS:  prefect.engine.cloud.CloudFlowRunner
          PREFECT__ENGINE__TASK_RUNNER__DEFAULT_CLASS:  prefect.engine.cloud.CloudTaskRunner
        Mounts:                                         <none>
      Volumes:                                          <none>
    Events:
      Type    Reason            Age   From            Message
      ----    ------            ----  ----            -------
      Normal  SuccessfulCreate  12s   job-controller  Created pod: prefect-job-1d696b93-gls7j
      Normal  Completed         4s    job-controller  Job complete
    I don't see a pull policy at all
    Michael Adkins

    1 year ago
    Me neither -.- I’m going to pull up a cluster one sec
    kubectl get job <HERE> -o yaml | grep imagePull
    Or leave out the grep to take a peek at the whole thing. Guess it’s not included in
    grep
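As an alternative to piping the YAML through grep, kubectl's JSONPath output can print just the pull policy. A small sketch that builds such a command; the helper name pull_policy_command is made up for illustration, and the job name is the one from this thread.

```python
import shlex


def pull_policy_command(job_name: str, namespace: str = "default") -> list:
    """Build a kubectl command that prints each container's imagePullPolicy."""
    return [
        "kubectl", "get", "job", job_name,
        "--namespace", namespace,
        # JSONPath into the Job's pod template, where the policy is stored.
        "-o", "jsonpath={.spec.template.spec.containers[*].imagePullPolicy}",
    ]


cmd = pull_policy_command("prefect-job-1d696b93")
# shlex.join quotes the arguments so the line can be pasted into a shell.
print(shlex.join(cmd))
```

The list form can also be passed directly to subprocess.run on a machine that has cluster access.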
    Billy McMonagle

    1 year ago
    imagePullPolicy: IfNotPresent
    Michael Adkins

    1 year ago
    Alrighty, so this is a pretty clear confirmation that the agent is not respecting that env variable — the code looks fine to me so we’ll have to reproduce this ourselves and debug
    Sorry about that
    For now, I’d recommend naming your images differently
    Billy McMonagle

    1 year ago
    that's OK! glad to figure it out
    I will keep poking and if I figure out why this is happening, I will flag for you. Thanks a ton.
    Michael Adkins

    1 year ago
    Sweet thanks! Wonderful to work through it with ya.
    The default tag is the timestamp to avoid overwriting old images. It’s generated with
    slugify(pendulum.now("utc").isoformat())
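To get a feel for what a timestamp tag looks like, here is a stdlib-only approximation. It is not the slugify/pendulum call quoted above, and the exact output of that call may differ in detail; timestamp_tag is a hypothetical helper.

```python
import re
from datetime import datetime, timezone


def timestamp_tag(now=None) -> str:
    """Return a tag-safe slug of a UTC ISO-8601 timestamp."""
    now = now or datetime.now(timezone.utc)
    # Lowercase, then collapse every run of non-alphanumeric characters to "-".
    return re.sub(r"[^a-z0-9]+", "-", now.isoformat().lower()).strip("-")


# Fixed timestamp so the result is reproducible.
tag = timestamp_tag(datetime(2021, 1, 19, 23, 25, 4, tzinfo=timezone.utc))
print(tag)  # 2021-01-19t23-25-04-00-00
```

Because every build gets a different tag, a node with IfNotPresent still has to pull the new image.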
    Billy McMonagle

    1 year ago
    Let me ask a somewhat related question while I have you here......
    Is the typical practice to create one ECR repository per flow, assuming you do not store multiple flows in a single docker storage instance?
    Michael Adkins

    1 year ago
    Generally, I’d recommend <flow>:<unique-id>
    If you want to keep them all in one repo, prefect_flows:<flow>-<unique-id> makes sense to me.
    Yeah basically
    I’m not sure on ECR but that’s what we do in GCS internally
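The two naming conventions above can be sketched as one helper that returns the values for the image_name and image_tag arguments of Docker storage. The function name image_name_and_tag and the id "abc123" are hypothetical.

```python
from typing import Optional, Tuple


def image_name_and_tag(flow_name: str, unique_id: str,
                       shared_repo: Optional[str] = None) -> Tuple[str, str]:
    """Return (image_name, image_tag) for one flow build."""
    if shared_repo is not None:
        # One repository for every flow: prefect_flows:<flow>-<unique-id>
        return shared_repo, f"{flow_name}-{unique_id}"
    # One repository per flow: <flow>:<unique-id>
    return flow_name, unique_id


# "abc123" stands in for any unique id (timestamp slug, git SHA, build number).
print(image_name_and_tag("all_accounts", "abc123"))
print(image_name_and_tag("all_accounts", "abc123", shared_repo="prefect_flows"))
```

Either way the tag changes on every build, so the pull policy stops mattering for picking up new flow code.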
    Billy McMonagle

    1 year ago
    Gotcha. There is no technical reason why we cannot do that, but the prefect_flows:<flow>-<unique-id> convention feels a little bit nicer to me.
    Michael Adkins

    1 year ago
    Afaik, Docker doesn’t even recommend overlapping latest tags. Basically the ‘tag’ portion of an image should always be unique and pinned.
    👍
    Billy McMonagle

    1 year ago
    I know ECR supports multiple tags per image but prefect is fairly opinionated in the format it expects to see. Which is fine, I think.
    Jim Crist-Harif

    1 year ago
    Apologies for the delayed response here. We're planning on deprecating the environment variables for configuring the k8s agent (they don't match how prefect configuration normally works, and are a bit hacky). For now they only affect legacy flows still using Environment based configuration. For flows registered with a run-config (which also includes flows without an explicit flow.environment set) only options that can be passed to the agent via CLI flags are respected (e.g. --image-pull-secrets). We didn't elevate every field in the k8s spec, only ones that are likely to be used by most users. To set image_pull_policy, you'll need to provide a custom k8s job template. This can be specified via the --job-template flag, and only needs to include the fields you want to add (prefect will set everything it requires). The default job template is:
    apiVersion: batch/v1
    kind: Job
    spec:
      template:
        spec:
          containers:
            - name: flow
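A sketch of a custom template that adds the pull policy to the default one, written to disk from Python. The file name job_template.yaml and the helper write_job_template are assumptions; imagePullPolicy is the standard Kubernetes container field.

```python
import tempfile
from pathlib import Path

# Default template from above plus one extra field on the flow container.
JOB_TEMPLATE = """\
apiVersion: batch/v1
kind: Job
spec:
  template:
    spec:
      containers:
        - name: flow
          imagePullPolicy: Always
"""


def write_job_template(directory: str) -> Path:
    """Write the template into `directory` and return the file path."""
    target = Path(directory) / "job_template.yaml"
    target.write_text(JOB_TEMPLATE)
    return target


# A temp directory is used here only to keep the example self-contained.
template_path = write_job_template(tempfile.mkdtemp())
print(template_path.read_text())
```

The resulting path is what would be handed to the agent's --job-template flag.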
    Billy McMonagle

    1 year ago
    OK, that makes sense, thank you @Jim Crist-Harif. For now I am going to use the recommended tagging strategy, which so far seems to have resolved my issue.
    Are you planning to deprecate the env vars for configuring the k8s agent itself, or the env vars for configuring how the k8s agent will launch flows?
    Jim Crist-Harif

    1 year ago
    I'm not sure I get the distinction. To clarify, the following environment variables will eventually be ignored by the k8s agent: https://github.com/PrefectHQ/prefect/blob/29ead08a96b1d003de1941753dfa017a7160bdad/src/prefect/agent/kubernetes/agent.py#L438-L455 Some of these still are respected for both environment and run_config based flows (NAMESPACE, SERVICE_ACCOUNT_NAME, and IMAGE_PULL_SECRETS), but most are already ignored for non-environment based flows.
    Billy McMonagle

    1 year ago
    got it!