# prefect-cloud
g
Hello there! New Prefect user here. 🙂 We're trying to run Prefect on a GKE Autopilot cluster (agent on GKE + Prefect Cloud). I was able to get the agent working and to figure out how to set up the blocks (GCS, KubernetesJob). Now that I'm finally able to run my example flow, though, I'm getting the following weird error from the job pod:
```
Invalid flow run id. Recieved arguments: ['/usr/local/lib/python3.10/site-packages/prefect/engine.py']
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/prefect/engine.py", line 1594, in <module>
    flow_run_id = UUID(
  File "/usr/local/lib/python3.10/uuid.py", line 171, in __init__
    raise TypeError('one of the hex, bytes, bytes_le, fields, '
TypeError: one of the hex, bytes, bytes_le, fields, or int arguments must be given
```
It looks like it's passing command-line arguments when it shouldn't?
I'm building my own image for the task runner (not sure that's the proper name; I mean the pod that the agent launches to run the flow), but I don't think I did anything too funky with it other than installing packages:
```dockerfile
FROM prefecthq/prefect:2.6.5-python3.10

COPY pyproject.toml /opt/prefect
COPY poetry.lock /opt/prefect

RUN pip install poetry
RUN poetry config virtualenvs.create false && poetry install --only main
```
Any ideas where I might be messing things up? Anyhow, thanks in advance for the great project and for any help getting this to work. 🙂
m
For sure 😄 I don't think the image is the problem in this case. What version of Prefect is your agent running?
g
Hey! I'm using `prefecthq/prefect:2.6.5-python3.9`.
Do the Python versions have to match? 😬
m
I'm actually not sure off the top of my head. I know that the Prefect agent needs to run a Prefect version greater than or equal to the one running your flow in the deployment, but I'm not sure about the Python version. It can't hurt to try, though.
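m's rule of thumb can be checked against the two image tags in this thread. This is a minimal sketch that assumes tags follow the `prefecthq/prefect:<prefect-version>-python<python-version>` pattern seen here. `parse_image_tag` and `agent_can_run` are hypothetical helpers that encode only the rule stated above (the agent's Prefect version must be greater than or equal to the flow's), not an official compatibility policy. They ignore the Python version on purpose, because that question is still open at this point in the thread.

```python
import re

# Hypothetical helper: assumes tags shaped like "prefecthq/prefect:2.6.5-python3.10".
TAG_PATTERN = re.compile(r":(\d+(?:\.\d+)*)-python(\d+\.\d+)$")


def parse_image_tag(image: str) -> tuple[tuple[int, ...], str]:
    """Return (prefect_version, python_version) parsed from an image reference."""
    match = TAG_PATTERN.search(image)
    if match is None:
        raise ValueError(f"Unrecognized image tag: {image!r}")
    prefect_version = tuple(int(part) for part in match.group(1).split("."))
    return prefect_version, match.group(2)


def agent_can_run(agent_image: str, flow_image: str) -> bool:
    """Rule of thumb from the thread: agent Prefect version >= flow Prefect version.

    The Python versions are parsed but deliberately not compared.
    """
    agent_version, _ = parse_image_tag(agent_image)
    flow_version, _ = parse_image_tag(flow_image)
    return agent_version >= flow_version


if __name__ == "__main__":
    print(agent_can_run("prefecthq/prefect:2.6.5-python3.9",
                        "prefecthq/prefect:2.6.5-python3.10"))  # True
```

The agent runs `2.6.5-python3.9` and the flow image runs `2.6.5-python3.10`, so the Prefect versions match and this rule alone does not explain the error.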
g
OK, will try it!
For reference, this is my Terraform descriptor for the agent:
```hcl
resource "kubernetes_deployment" "prefect-agent" {
  metadata {
    name      = "prefect-agent"
    namespace = "default"
    labels    = {
      app = "prefect-agent"
    }
  }

  spec {
    replicas = 1
    selector {
      match_labels = {
        app = "prefect-agent"
      }
    }
    template {
      metadata {
        labels = {
          app = "prefect-agent"
        }
      }
      spec {
        container {
          name              = "agent"
          image             = "prefecthq/prefect:2.6.5-python3.9"
          command           = ["prefect", "agent", "start", "-q", "clarity-production"]
          image_pull_policy = "IfNotPresent"
          env {
            name  = "PREFECT_API_URL"
            value = local.prefect_cloud_api_url
          }
          env {
            name  = "PREFECT_API_KEY"
            value = data.google_secret_manager_secret_version.prefect-cloud-api-key.secret_data
          }
        }
      }
    }
  }
}
```
m
For sure. Also, side note: our senior community engineer wrote a blog post on an adjacent topic, so there might be some insight to glean from it here as well: https://medium.com/the-prefect-blog/serverless-prefect-flows-with-google-cloud-run-jobs-23edbf371175 😄
g
Yeah, that looked awesome! But I recall that Cloud Run has a 1-hour time limit for tasks, and that scared me away from using it in this case 🙂
Ugh, still getting the same error 😕
Tried using an unmodified Prefect image, same result. Will try downgrading to an older version and see what happens.
m
Hmm let me check a couple things.
๐Ÿ‘ 1
g
So this is looking really weird. 🙂 As far as I can see, it's tripping in the first few lines of code after the engine boots:
```python
if __name__ == "__main__":
    import os
    import sys

    try:
        flow_run_id = UUID(
            sys.argv[1] if len(sys.argv) > 1 else os.environ.get("PREFECT__FLOW_RUN_ID")
        )
    except Exception:
        engine_logger.error(
            f"Invalid flow run id. Recieved arguments: {sys.argv}", exc_info=True
        )
        exit(1)
```
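The excerpt's fallback order can be pulled out into a pure function and tested outside a pod. This is a minimal sketch that mirrors the quoted lines, not Prefect's code, and `resolve_flow_run_id` is a hypothetical name. A positional argument wins. Otherwise `PREFECT__FLOW_RUN_ID` is read from the environment. If neither is present, `UUID(None)` raises the same `TypeError` that shows up in the pod logs.

```python
import os
import sys
from typing import Mapping, Sequence
from uuid import UUID


def resolve_flow_run_id(argv: Sequence[str], environ: Mapping[str, str]) -> UUID:
    """Mirror the engine excerpt: argv[1] first, then PREFECT__FLOW_RUN_ID."""
    raw = argv[1] if len(argv) > 1 else environ.get("PREFECT__FLOW_RUN_ID")
    # With raw=None, UUID() raises TypeError("one of the hex, bytes, ... must be given");
    # a malformed string raises ValueError instead.
    return UUID(raw)


if __name__ == "__main__":
    try:
        print(resolve_flow_run_id(sys.argv, os.environ))
    except (TypeError, ValueError) as exc:
        print(f"Invalid flow run id ({type(exc).__name__}): {exc}")
```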
So I instrumented the startup script to print the actual command that's being run, and I get:
```
****** RUNNING COMMAND: python -m prefect.engine *****
Invalid flow run id. Recieved arguments: ['/usr/local/lib/python3.10/site-packages/prefect/engine.py']
Traceback (most recent call last):
```
hmm...
OK, so `-m` resolves the module to its full file path, which explains why I see "/usr/local/lib..." instead of just the module's name. But it should not be setting `argv[1]` to the path of the module; that's not how `python -m` behaves on my local machine.
that's the weirdness I still can't explain
will instrument the engine to see what on earth it's getting
m
Yeah, I haven't seen that before either. I'll try asking around and see what I can dig up too.
g
thanks a lot!
r
I think that's normal; I believe that when you run `python -m`, getting the module location as `argv[0]` is expected. There's no `argv[1]` in your args array, but that's not the problem.
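r's point can be checked locally. The sketch below creates a throwaway module named `show_argv` (a hypothetical name) in a temporary directory, runs it with `python -m`, and returns the `sys.argv` the child process saw. With no extra arguments, the list has a single element, the module's file path, just like the pod log. Anything passed after the module name starts at index 1.

```python
import ast
import os
import subprocess
import sys
import tempfile


def argv_under_dash_m(*extra_args: str) -> list[str]:
    """Run a throwaway module with `python -m` and return the sys.argv it saw."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical module: it only prints its own sys.argv.
        with open(os.path.join(tmp, "show_argv.py"), "w") as handle:
            handle.write("import sys\nprint(repr(sys.argv))\n")
        result = subprocess.run(
            [sys.executable, "-m", "show_argv", *extra_args],
            cwd=tmp,
            env={**os.environ, "PYTHONPATH": tmp},  # make sure the module is importable
            capture_output=True,
            text=True,
            check=True,
        )
    return ast.literal_eval(result.stdout.strip())


if __name__ == "__main__":
    print(argv_under_dash_m())           # a single element: the module's file path
    print(argv_under_dash_m("abc-123"))  # extra CLI args start at index 1
```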
g
Ah, I see, I was looking at the wrong hypothesis.
r
What you're seeing is (I think) happening because the `PREFECT__FLOW_RUN_ID` environment variable isn't present.
I get the same error message if I run `python -m prefect.engine` without that env var present, at least.
g
Hmmm, any ideas why it's not being injected?
Btw, now I realize it's already printing the whole argv in the error message. Sorry, I must be sleepy 🙂
r
I don't yet know why it's not being added. I haven't worked with `KubernetesJob` much, but I wrote another infrastructure block, so I'm decently familiar with them. I'm looking through the block's code now to see if I can find where and why this might happen.
And no problem; I only realized because I was working with the args in one of my Python scripts a couple of hours ago, so it was fresh in my mind
g
hey thanks a lot, will do the same
btw I'm messing around with the env in KubernetesJob
I have
```python
customizations=[
        {
            'op': 'add',
            'path': '/spec/template/spec/resources',
            'value': {
                'limits': {
                    'memory': '1024Mi',
                    'cpu': '500m'
                }
            }
        },
        {
            'op': 'add',
            'path': '/spec/template/spec/containers/0/env',
            'value': [
                {
                    'name': 'ENV',
                    'value': 'prod'
                },
                {
                    'name': 'COLLECTIONS_PREFIX',
                    'value': ''
                },
                {
                    'name': 'PROJECT_ID',
                    'value': 'window-finance-production'
                }
            ]
        },
        {
            'op': 'add',
            'path': '/spec/template/backoffLimit',
            'value': 3
        }
    ]
```
Wonder if I'm overriding something I shouldn't 😕
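The suspicion above can be tested on plain data. Under JSON Patch (RFC 6902), an `add` whose target is an existing object member replaces that member's value, while a path ending in `/-` appends to an array. This is a minimal, simplified `add` (no `~0`/`~1` unescaping, no root path) applied to a hypothetical job manifest. The manifest assumes the block's own `PREFECT__FLOW_RUN_ID` is already in the container env when the user's patch runs. That ordering is an assumption, not something verified against Prefect's code.

```python
import copy
from typing import Any

# Hypothetical manifest: assumes the block's variable is already present (not verified).
BASE_JOB: dict[str, Any] = {
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "example-container",
                        "env": [{"name": "PREFECT__FLOW_RUN_ID", "value": "<flow-run-id>"}],
                    }
                ]
            }
        }
    }
}
ENV_PATH = "/spec/template/spec/containers/0/env"


def apply_add(document: dict[str, Any], path: str, value: Any) -> dict[str, Any]:
    """Apply one RFC 6902 "add" op to a deep copy of `document` (simplified)."""
    result = copy.deepcopy(document)
    *parents, last = path.split("/")[1:]
    target: Any = result
    for token in parents:
        target = target[int(token)] if isinstance(target, list) else target[token]
    if isinstance(target, list):
        if last == "-":
            target.append(value)  # "-" means "append to the end of the array"
        else:
            target.insert(int(last), value)
    else:
        target[last] = value  # an existing member is replaced wholesale
    return result


def env_names(job: dict[str, Any]) -> list[str]:
    return [item["name"] for item in job["spec"]["template"]["spec"]["containers"][0]["env"]]


# The op from the thread: "add" on the env list itself swaps the list out.
replaced = apply_add(BASE_JOB, ENV_PATH, [{"name": "ENV", "value": "prod"}])
# Appending with "/-" keeps whatever was already there.
appended = apply_add(BASE_JOB, ENV_PATH + "/-", {"name": "ENV", "value": "prod"})

print(env_names(replaced))  # ['ENV']
print(env_names(appended))  # ['PREFECT__FLOW_RUN_ID', 'ENV']
```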
r
That might be overwriting the default environment variables the infrastructure block is setting.
I think `KubernetesJob` has an extra `env` attribute where you can put environment variables.
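r's suggestion amounts to moving the variables out of the patch list and into the block's `env` mapping. This is a minimal sketch of that refactor on plain data. `split_env_patch` is a hypothetical helper, and it only recognizes the exact `add` op on `/spec/template/spec/containers/0/env` shown in the thread. The returned dict is what would be passed as the block's `env`.

```python
from typing import Any

ENV_PATH = "/spec/template/spec/containers/0/env"


def split_env_patch(
    customizations: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], dict[str, str]]:
    """Separate env-list "add" ops from the other JSON Patch operations."""
    remaining: list[dict[str, Any]] = []
    env: dict[str, str] = {}
    for op in customizations:
        if op.get("op") == "add" and op.get("path") == ENV_PATH:
            # Each item is a Kubernetes EnvVar: {"name": ..., "value": ...}.
            for item in op["value"]:
                env[item["name"]] = item.get("value", "")
        else:
            remaining.append(op)
    return remaining, env


customizations = [
    {
        "op": "add",
        "path": ENV_PATH,
        "value": [
            {"name": "ENV", "value": "prod"},
            {"name": "COLLECTIONS_PREFIX", "value": ""},
            {"name": "PROJECT_ID", "value": "window-finance-production"},
        ],
    },
    {"op": "add", "path": "/spec/backoffLimit", "value": 3},
]

remaining, env = split_env_patch(customizations)
print(env)        # {'ENV': 'prod', 'COLLECTIONS_PREFIX': '', 'PROJECT_ID': 'window-finance-production'}
print(remaining)  # [{'op': 'add', 'path': '/spec/backoffLimit', 'value': 3}]
```

The two results would then go to the block as `KubernetesJob(env=env, customizations=remaining, ...)`, which leaves the block free to add its own variables. `backoffLimit` is a field of the Job spec itself, so the sketch patches `/spec/backoffLimit` rather than a path under `/spec/template`.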
g
oh man
you're right 🤩
lemme try that
r
I think adding them in customizations instead gets rid of the flow run ID. It's not your fault; I think the block should check for that, so I will open a GitHub issue.
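A check like the one r proposes could look roughly like this. It is a hypothetical sketch, not taken from Prefect's codebase, and the function names are illustrative. It flags JSON Patch ops that replace or delete the whole container env list, which would drop any variables the block set itself (assumed here to include `PREFECT__FLOW_RUN_ID`). Appending with `/-` is left alone.

```python
import warnings
from typing import Any

ENV_PATH = "/spec/template/spec/containers/0/env"


def find_env_overwrites(customizations: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return JSON Patch ops that replace or delete the whole container env list."""
    return [
        op
        for op in customizations
        # "add"/"replace"/"remove" aimed at the list itself discard existing
        # entries; "add" on ENV_PATH + "/-" only appends, so it is not flagged.
        if op.get("op") in {"add", "replace", "remove"} and op.get("path") == ENV_PATH
    ]


def warn_on_env_overwrites(customizations: list[dict[str, Any]]) -> None:
    """Emit one UserWarning per risky op (hypothetical validation hook)."""
    for op in find_env_overwrites(customizations):
        warnings.warn(
            f"customization {op['op']!r} on {op['path']!r} replaces env vars the "
            "block may set itself (such as PREFECT__FLOW_RUN_ID); "
            "consider the block's env field instead",
            UserWarning,
            stacklevel=2,
        )
```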
g
awesome, thanks a lot! Will let you guys know if that solves it in a min
๐Ÿ‘ 1
yay, it worked! 😅
Thanks a lot for the help, it would've taken me a lot of time to figure this out on my own ❤️
m
Awesome, I'm glad you got it working :) Thanks @Ryan Peden
r
You're welcome, I'm happy to hear it worked 😄