# prefect-community
b
I have a flow with docker storage, running with the k8s agent. When I run my flow and look at the logs, it seems to be executing an old version of my flow. Details in thread...
From my agent:
Environment variables

    PREFECT__CLOUD__AGENT__AUTH_TOKEN={{undefined}}
    PREFECT__CLOUD__API=https://api.prefect.io
    NAMESPACE=default
    IMAGE_PULL_SECRETS=undefined
    PREFECT__CLOUD__AGENT__LABELS=['production']
    JOB_MEM_REQUEST=undefined
    JOB_MEM_LIMIT=undefined
    JOB_CPU_REQUEST=undefined
    JOB_CPU_LIMIT=undefined
    IMAGE_PULL_POLICY=Always
    SERVICE_ACCOUNT_NAME=undefined
    PREFECT__BACKEND=cloud
    PREFECT__CLOUD__AGENT__AGENT_ADDRESS=http://:8080
My flow:
from prefect import Flow, task
from prefect.storage import Docker

@task
def execute_query(query):
    print("inside execute_query task to run query ", query)

storage = Docker(
    registry_url=REGISTRY_URL,
    dockerfile=DOCKERFILE,
    image_name=IMAGE_NAME,
    image_tag=IMAGE_TAG,
)

with Flow("all_accounts", storage=storage) as all_accounts_flow:
    execute_query("all_accounts")

storage.build()

if __name__ == "__main__":
    all_accounts_flow.register(project_name=PROJECT_NAME)
I'm almost totally certain I'm doing something silly, so thanks in advance!
j
Hmm, I can’t tell what’s happening judging by the snippets you have provided. Is there a chance that when you call .register it isn’t updating the image_name:tag in your image repository?
z
Is this limited to the K8s agent?
j
^ Also valid! There might be a chance that k8s isn’t pulling the new image if it has the same name:tag
z
There is the IMAGE_PULL_POLICY=Always env var, but I’d have to consult the code to see how that’s templated in the K8s jobs
if os.getenv("IMAGE_PULL_POLICY"):
    job["spec"]["template"]["spec"]["containers"][0][
        "imagePullPolicy"
    ] = os.getenv("IMAGE_PULL_POLICY")
hmm
Can you inspect the image pull policy on the flow pod with kubectl?
b
Hey @michael thanks for the ideas, sorry I stepped away...
i think you guys are on the right track, because I've got an image:label that is not updating.
I'm trying to understand the best/recommended way to set up my image name and tag. Currently, this is set to something like prefect_repo:appname but it sounds like I should do appname:latest or something else.
I did check the image pull policy on the job, and it is set to IfNotPresent...
z
Well, that’s the culprit then. I’m not certain why that environment variable isn’t being respected. Can you check the environment of the agent using kubectl?
Are you using a RunConfig?
b
I do not have a RunConfig set currently, although I do intend to use one to set some env vars, once I get over this hump.
Agent inspection incoming...
What do you want to know about the agent environment? Just env vars?
z
If it has a IMAGE_PULL_POLICY env var
(and the value)
b
    env:
    - name: PREFECT__CLOUD__AGENT__AUTH_TOKEN
      valueFrom:
        secretKeyRef:
          key: prefect_runner_token
          name: prefect-orchestration-main
    - name: PREFECT__CLOUD__API
      value: https://api.prefect.io
    - name: NAMESPACE
      value: default
    - name: IMAGE_PULL_SECRETS
    - name: PREFECT__CLOUD__AGENT__LABELS
      value: '[''production'']'
    - name: JOB_MEM_REQUEST
    - name: JOB_MEM_LIMIT
    - name: JOB_CPU_REQUEST
    - name: JOB_CPU_LIMIT
    - name: IMAGE_PULL_POLICY
      value: Always
z
Very interesting
b
perhaps I need to single quote Always
z
If you use a KubernetesRunConfig it ignores those configuring env vars, might be easiest to set the image pull policy in that
Although I am curious what’s going on here
b
Hm, I will try that, since I intend to use a RunConfig.
z
Can you confirm that the image pull policy is IfNotPresent in both the Job template and the flow run Pod?
I’ve forwarded this to another member of the team who’s worked with this recently.
👍 1
b
Hm, I'm not sure how to distinguish the two... what I provided before was the yaml from the running Job (I'm using k9s, FWIW)
z
@Billy McMonagle sorry I was wrong, I don’t see a config option in the KubernetesRun for image pull policy.
b
Gotcha, OK.
If I end up needing to make a custom job template, that is going to be fine with me, but I bet we can get to the bottom of this
z
The agent creates a Job for a flow run. The Job creates a Pod from its template to actually run the flow. If the Pod fails, the Job will create a new one until it succeeds. (https://kubernetes.io/docs/concepts/workloads/controllers/job/)
b
that job line is the only thing I see, but I'm obviously new to k8s as well so I'm still poking around
z
Can you use kubectl?
kubectl get jobs will give you the actual Job instances. That looks like a Pod that is named after the job 🙂
b
Yes, I can. Hard to get much bc the job runs for like 1s.
❯ k get jobs
NAME                   COMPLETIONS   DURATION   AGE
prefect-job-21ec7c17   0/1           1s         1s
If you know the command to get the yaml for the job itself...
z
Then kubectl describe job prefect-job-21ec7c17 should give us the image pull policy
If you put a sleep in your flow the job would run longer
b
perfect
yes but I can't update my flow code bc of the image pull problem 🙃
z
(ahaha)
b
❯ k describe jobs prefect-job-1d696b93
Name:           prefect-job-1d696b93
Namespace:      default
Selector:       controller-uid=b139374d-d47a-4b0f-ab98-18cead0a8eb9
Labels:         prefect.io/flow_id=678d4e82-1e48-4d9c-9748-ade3d73867f5
                prefect.io/flow_run_id=5e07a5e7-3e49-447d-aba7-47112061db8b
                prefect.io/identifier=1d696b93
Annotations:    <none>
Parallelism:    1
Completions:    1
Start Time:     Tue, 19 Jan 2021 18:25:04 -0500
Completed At:   Tue, 19 Jan 2021 18:25:12 -0500
Duration:       8s
Pods Statuses:  0 Running / 1 Succeeded / 0 Failed
Pod Template:
  Labels:  controller-uid=b139374d-d47a-4b0f-ab98-18cead0a8eb9
           job-name=prefect-job-1d696b93
           prefect.io/flow_id=678d4e82-1e48-4d9c-9748-ade3d73867f5
           prefect.io/flow_run_id=5e07a5e7-3e49-447d-aba7-47112061db8b
           prefect.io/identifier=1d696b93
  Containers:
   flow:
    Image:      XXX.dkr.ecr.us-east-1.amazonaws.com/prefect-orchestration:grebe
    Port:       <none>
    Host Port:  <none>
    Args:
      prefect
      execute
      flow-run
    Environment:
      PREFECT__CLOUD__API:                          https://api.prefect.io
      PREFECT__CLOUD__AUTH_TOKEN:                   XXX
      PREFECT__CLOUD__USE_LOCAL_SECRETS:            false
      PREFECT__CONTEXT__FLOW_RUN_ID:                5e07a5e7-3e49-447d-aba7-47112061db8b
      PREFECT__CONTEXT__FLOW_ID:                    678d4e82-1e48-4d9c-9748-ade3d73867f5
      PREFECT__CONTEXT__IMAGE:                      XXX.dkr.ecr.us-east-1.amazonaws.com/prefect-orchestration:grebe
      PREFECT__LOGGING__LOG_TO_CLOUD:               true
      PREFECT__ENGINE__FLOW_RUNNER__DEFAULT_CLASS:  prefect.engine.cloud.CloudFlowRunner
      PREFECT__ENGINE__TASK_RUNNER__DEFAULT_CLASS:  prefect.engine.cloud.CloudTaskRunner
    Mounts:                                         <none>
  Volumes:                                          <none>
Events:
  Type    Reason            Age   From            Message
  ----    ------            ----  ----            -------
  Normal  SuccessfulCreate  12s   job-controller  Created pod: prefect-job-1d696b93-gls7j
  Normal  Completed         4s    job-controller  Job complete
I don't see a pull policy at all
z
Me neither -.- I’m going to pull up a cluster one sec
kubectl get job <HERE> -o yaml | grep imagePull
👍 1
Or leave out the grep to take a peek at the whole thing. Guess it’s not included in
grep
b
imagePullPolicy: IfNotPresent
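(Editor's aside.) The describe output above did not print a pull policy, while the full manifest did. A minimal Python sketch of the same check, assuming the manifest was exported with kubectl get job <name> -o json: the helper name container_pull_policies and the sample JSON are hypothetical, trimmed stand-ins and not real cluster output. For context, Kubernetes fills in a default when the field is omitted (Always for a latest or missing tag, IfNotPresent otherwise), so seeing IfNotPresent with a custom tag is consistent with the field never having been set on the Job.

```python
import json


def container_pull_policies(job_manifest):
    """Return {container name: imagePullPolicy} from a Job's Pod template."""
    containers = job_manifest["spec"]["template"]["spec"]["containers"]
    # .get() returns None when the field is absent from the manifest
    return {c["name"]: c.get("imagePullPolicy") for c in containers}


# Hypothetical, heavily trimmed stand-in for `kubectl get job <name> -o json`
SAMPLE_JOB_JSON = """
{
  "apiVersion": "batch/v1",
  "kind": "Job",
  "spec": {
    "template": {
      "spec": {
        "containers": [
          {"name": "flow", "imagePullPolicy": "IfNotPresent"}
        ]
      }
    }
  }
}
"""

policies = container_pull_policies(json.loads(SAMPLE_JOB_JSON))
print(policies)
```

Reading the Pod template rather than a running Pod means the check still works after a short-lived flow run has finished.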
z
Alrighty, so this is a pretty clear confirmation that the agent is not respecting that env variable — the code looks fine to me so we’ll have to reproduce this ourselves and debug
Sorry about that
For now, I’d recommend naming your images differently
b
that's OK! glad to figure it out
I will keep poking and if I figure out why this is happening, I will flag for you. Thanks a ton.
z
Sweet thanks! Wonderful to work through it with ya.
The default tag is the timestamp to avoid overwriting old images. It’s generated with slugify(pendulum.now("utc").isoformat())
b
Let me ask a somewhat related question while I have you here......
Is the typical practice to create one ECR repository per flow, assuming you do not store multiple flows in a single docker storage instance?
z
Generally, I’d recommend <flow>:<unique-id>. If you want to keep them all in one repo, prefect_flows:<flow>-<unique-id> makes sense to me.
Yeah basically
I’m not sure on ECR but that’s what we do in GCS internally
b
Gotcha. There is no technical reason why we cannot do that, but the prefect_flows:<flow>-<unique-id> convention feels a little bit nicer to me.
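(Editor's aside.) The <flow>-<unique-id> convention can be sketched with the standard library alone. This is only an approximation of the slugify(pendulum.now("utc").isoformat()) call mentioned above, not Prefect's implementation: timestamp_tag and image_reference are hypothetical helper names, and the regex is an assumed stand-in for slugify.

```python
import re
from datetime import datetime, timezone


def timestamp_tag(now=None):
    """Build a Docker-safe tag from a UTC timestamp (approximates slugify)."""
    now = now or datetime.now(timezone.utc)
    # Lowercase the ISO string and collapse every non-alphanumeric run to "-"
    return re.sub(r"[^a-z0-9]+", "-", now.isoformat().lower()).strip("-")


def image_reference(repo, flow_name, unique_id):
    """Combine repo, flow name, and unique id as repo:flow-unique_id."""
    return f"{repo}:{flow_name}-{unique_id}"


fixed = datetime(2021, 1, 19, 23, 25, 4, tzinfo=timezone.utc)
print(image_reference("prefect_flows", "all_accounts", timestamp_tag(fixed)))
```

Because every build gets a new tag, a node that already has an older image cached still has to pull the new one, even under an IfNotPresent pull policy.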
z
Afaik, Docker doesn’t even recommend overlapping latest tags; basically the ‘tag’ portion of an image should always be unique, and you should pin to it.
👍
b
I know ECR supports multiple tags per image but prefect is fairly opinionated in the format it expects to see. Which is fine, I think.
j
Apologies for the delayed response here. We're planning on deprecating the environment variables for configuring the k8s agent (they don't match how prefect configuration normally works, and are a bit hacky). For now they only affect legacy flows still using Environment based configuration. For flows registered with a run-config (which also includes flows without an explicit flow.environment set) only options that can be passed to the agent via CLI flags are respected (e.g. --image-pull-secrets). We didn't elevate every field in the k8s spec, only ones that are likely to be used by most users. To set image_pull_policy, you'll need to provide a custom k8s job template. This can be specified via the --job-template flag, and only needs to include the fields you want to add (prefect will set everything it requires). The default job template is:
apiVersion: batch/v1
kind: Job
spec:
  template:
    spec:
      containers:
        - name: flow
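(Editor's aside.) A minimal sketch of what a custom template with a pull policy might contain, written as a Python dict that mirrors the YAML above. The helper with_pull_policy is hypothetical; imagePullPolicy is the standard Kubernetes container field. Serializing the result to a YAML file for --job-template is left out, since it would need a YAML library.

```python
import copy

# Mirrors the default job template quoted above
DEFAULT_JOB_TEMPLATE = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "spec": {"template": {"spec": {"containers": [{"name": "flow"}]}}},
}

VALID_POLICIES = {"Always", "IfNotPresent", "Never"}


def with_pull_policy(template, policy="Always"):
    """Return a copy of the template with imagePullPolicy set on each container."""
    if policy not in VALID_POLICIES:
        raise ValueError(f"unknown imagePullPolicy: {policy!r}")
    result = copy.deepcopy(template)  # leave the input template untouched
    for container in result["spec"]["template"]["spec"]["containers"]:
        container["imagePullPolicy"] = policy
    return result


custom = with_pull_policy(DEFAULT_JOB_TEMPLATE)
print(custom["spec"]["template"]["spec"]["containers"])
```

In YAML form this amounts to one extra line, imagePullPolicy: Always, under the flow container entry.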
b
OK, that makes sense, thank you @Jim Crist-Harif. For now I am going to use the recommended tagging strategy, which so far seems to have resolved my issue.
👍 1
Are you planning to deprecate the env vars for configuring the k8s agent itself, or the env vars for configuring how the k8s agent will launch flows?
j
I'm not sure I get the distinction. To clarify, the following environment variables will eventually be ignored by the k8s agent: https://github.com/PrefectHQ/prefect/blob/29ead08a96b1d003de1941753dfa017a7160bdad/src/prefect/agent/kubernetes/agent.py#L438-L455 Some of these are still respected for both environment and run_config based flows (NAMESPACE, SERVICE_ACCOUNT_NAME, and IMAGE_PULL_SECRETS), but most are already ignored for non-environment based flows.
b
got it!