j

John Horn

03/13/2023, 8:54 PM
I'm trying to deploy an agent in Kubernetes locally and set up the Prefect Cloud URL and API key as shown in the video instructions. I'm also confused about why, in the 3rd video, we modify a k8s manifest created for Orion instead of the agent, given that the 3rd video works with Prefect Cloud, which should take the place of Orion. Am I wrong that Prefect Cloud takes the place of Orion? After creating the k8s manifest for the agent and making the modifications from the 2nd and 3rd videos, I'm getting a JSON decode error. I don't think the agent is able to reach the queue (which I verified does exist).
2023-03-13 13:48:43   ___ ___ ___ ___ ___ ___ _____     _   ___ ___ _  _ _____
2023-03-13 13:48:43  | _ \ _ \ __| __| __/ __|_   _|   /_\ / __| __| \| |_   _|
2023-03-13 13:48:43  |  _/   / _|| _|| _| (__  | |    / _ \ (_ | _|| .` | | |
2023-03-13 13:48:43  |_| |_|_\___|_| |___\___| |_|   /_/ \_\___|___|_|\_| |_|
2023-03-13 13:48:43 
2023-03-13 13:48:43 
2023-03-13 13:48:43 Agent started! Looking for work from queue(s): kubernetes...
2023-03-13 13:48:43 An exception occurred.
2023-03-13 13:48:43 Traceback (most recent call last):
2023-03-13 13:48:43   File "/usr/local/lib/python3.8/site-packages/prefect/cli/_utilities.py", line 41, in wrapper
2023-03-13 13:48:43     return fn(*args, **kwargs)
2023-03-13 13:48:43   File "/usr/local/lib/python3.8/site-packages/prefect/utilities/asyncutils.py", line 230, in coroutine_wrapper
2023-03-13 13:48:43     return run_async_in_new_loop(async_fn, *args, **kwargs)
2023-03-13 13:48:43   File "/usr/local/lib/python3.8/site-packages/prefect/utilities/asyncutils.py", line 181, in run_async_in_new_loop
2023-03-13 13:48:43     return anyio.run(partial(__fn, *args, **kwargs))
2023-03-13 13:48:43   File "/usr/local/lib/python3.8/site-packages/anyio/_core/_eventloop.py", line 70, in run
2023-03-13 13:48:43     return asynclib.run(func, *args, **backend_options)
2023-03-13 13:48:43   File "/usr/local/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 292, in run
2023-03-13 13:48:43     return native_run(wrapper(), debug=debug)
2023-03-13 13:48:43   File "/usr/local/lib/python3.8/asyncio/runners.py", line 44, in run
2023-03-13 13:48:43     return loop.run_until_complete(main)
2023-03-13 13:48:43   File "/usr/local/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
2023-03-13 13:48:43     return future.result()
2023-03-13 13:48:43   File "/usr/local/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 287, in wrapper
2023-03-13 13:48:43     return await func(*args)
2023-03-13 13:48:43   File "/usr/local/lib/python3.8/site-packages/prefect/cli/agent.py", line 201, in start
2023-03-13 13:48:43     tg.start_soon(
2023-03-13 13:48:43   File "/usr/local/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 662, in __aexit__
2023-03-13 13:48:43     raise exceptions[0]
2023-03-13 13:48:43   File "/usr/local/lib/python3.8/site-packages/prefect/utilities/services.py", line 46, in critical_service_loop
2023-03-13 13:48:43     await workload()
2023-03-13 13:48:43   File "/usr/local/lib/python3.8/site-packages/prefect/agent.py", line 203, in get_and_submit_flow_runs
2023-03-13 13:48:43     async for work_queue in self.get_work_queues():
2023-03-13 13:48:43   File "/usr/local/lib/python3.8/site-packages/prefect/agent.py", line 144, in get_work_queues
2023-03-13 13:48:43     work_queue = await self.client.read_work_queue_by_name(
2023-03-13 13:48:43   File "/usr/local/lib/python3.8/site-packages/prefect/client/orchestration.py", line 853, in read_work_queue_by_name
2023-03-13 13:48:43     return schemas.core.WorkQueue.parse_obj(response.json())
2023-03-13 13:48:43   File "/usr/local/lib/python3.8/site-packages/httpx/_models.py", line 756, in json
2023-03-13 13:48:43     return jsonlib.loads(self.text, **kwargs)
2023-03-13 13:48:43   File "/usr/local/lib/python3.8/json/__init__.py", line 357, in loads
2023-03-13 13:48:43     return _default_decoder.decode(s)
2023-03-13 13:48:43   File "/usr/local/lib/python3.8/json/decoder.py", line 337, in decode
2023-03-13 13:48:43     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
2023-03-13 13:48:43   File "/usr/local/lib/python3.8/json/decoder.py", line 355, in raw_decode
2023-03-13 13:48:43     raise JSONDecodeError("Expecting value", s, err.value) from None
2023-03-13 13:48:43 json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
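A note on reading this traceback: the last frames show `response.json()` failing at line 1, column 1, which usually means the response body was not JSON at all (for example an HTML page or an empty body). The sketch below reproduces the same message with the standard library only; the HTML string is a made-up stand-in, not the actual response the agent received.

```python
import json


def try_parse(body):
    """Return (parsed, None) on success or (None, error message) on failure."""
    try:
        return json.loads(body), None
    except json.JSONDecodeError as exc:
        return None, str(exc)


# Hypothetical non-JSON body, e.g. what a web page would return
_, html_error = try_parse("<!doctype html><html></html>")
print(html_error)

# A real JSON body parses without error
parsed, json_error = try_parse('{"name": "kubernetes"}')
print(parsed, json_error)
```

If the error position is always "line 1 column 1 (char 0)", the endpoint is the first thing to check, before the credentials.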
r

redsquare

03/13/2023, 9:17 PM
have you set the API URL + key environment variables on the agent?
j

John Horn

03/13/2023, 9:17 PM
I did, I used a k8s secret
I wonder if I'm misunderstanding something. Do we need to have Orion and the agent running locally at the same time?
Is Orion listening to the queue on Prefect Cloud
and then starting agents in k8s?
r

redsquare

03/13/2023, 9:18 PM
No, if you're using Prefect Cloud you just need an agent with the correct URL + key env vars
j

John Horn

03/13/2023, 9:18 PM
ok that was my initial guess
maybe my k8s API secret isn't working. I wonder if I can open a shell in the pod and run printenv
hmm, maybe I'll just hardcode it in the manifest for now to make things easy
r

redsquare

03/13/2023, 9:21 PM
yeah, it definitely looks like the agent can't connect, so check the URL + key
j

John Horn

03/13/2023, 9:21 PM
since there is no Orion server locally, I'm assuming I don't need to do any port forwarding, e.g. 4200:4200
r

redsquare

03/13/2023, 9:22 PM
nope, as long as outbound connectivity is allowed
j

John Horn

03/13/2023, 9:23 PM
same issue even on hardcoding api key
fails within seconds
r

redsquare

03/13/2023, 9:24 PM
it looks for environment variables
j

John Horn

03/13/2023, 9:27 PM
I changed the start command on the manifest to 'printenv' and I am seeing the API KEY and URL
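The `printenv` check above dumps every variable, including the API key, into the pod logs. A minimal sketch of a narrower check follows, assuming only the two variable names used in this thread (`PREFECT_API_URL` and `PREFECT_API_KEY`); the helper name `summarize_prefect_env` is hypothetical and not part of Prefect.

```python
import os

# The two settings discussed in this thread
REQUIRED = ("PREFECT_API_URL", "PREFECT_API_KEY")


def summarize_prefect_env(environ=None):
    """Report whether each required variable is set, without echoing secrets."""
    environ = os.environ if environ is None else environ
    summary = {}
    for name in REQUIRED:
        value = environ.get(name)
        if not value:
            summary[name] = "MISSING"
        elif name.endswith("_KEY"):
            # Show only the length so the key never lands in the logs
            summary[name] = f"set ({len(value)} chars)"
        else:
            summary[name] = value
    return summary


if __name__ == "__main__":
    for name, status in summarize_prefect_env().items():
        print(f"{name}: {status}")
```

Seeing both variables present only confirms they reach the container; it does not confirm that the URL points at the right endpoint.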
r

redsquare

03/13/2023, 9:29 PM
which prefect version?
those vids are quite old
j

John Horn

03/13/2023, 9:29 PM
image: prefecthq/prefect:2.8.5-python3.8
r

redsquare

03/13/2023, 9:30 PM
can you share your agent manifest
j

John Horn

03/13/2023, 9:32 PM
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prefect-agent
  namespace: uptime-1-0-0
  labels:
    app: prefect-agent
spec:
  selector:
    matchLabels:
      app: prefect-agent
  replicas: 1
  template:
    metadata:
      labels:
        app: prefect-agent
    spec:
      containers:
        - name: agent
          image: prefecthq/prefect:2.8.5-python3.8
          #command: ["printenv"]
          command: ["prefect", "agent", "start", "-q", "kubernetes"]
          imagePullPolicy: "IfNotPresent"
          env:
            - name: PREFECT_API_URL
              value: https://app.prefect.cloud/account/MY_ACCOUNT_OBFUSCATED/workspace/MY_WORKSPACE_OBFUSCATED
            - name: PREFECT_API_KEY
              value: SECRET_KEY
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prefect-agent
  namespace: uptime-1-0-0
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log", "pods/status"]
    verbs: ["get", "watch", "list"]
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: [ "get", "list", "watch", "create", "update", "patch", "delete" ]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prefect-agent-role-binding
  namespace: uptime-1-0-0
subjects:
  - kind: ServiceAccount
    name: default
    namespace: uptime-1-0-0
roleRef:
  kind: Role
  name: prefect-agent
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prefect-agent
rules:
  - apiGroups: [""]
    resources: ["namespaces"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prefect-agent-cluster-role-binding
subjects:
  - kind: ServiceAccount
    name: default
    namespace: uptime-1-0-0
roleRef:
  kind: ClusterRole
  name: prefect-agent
  apiGroup: rbac.authorization.k8s.io
and i'm running like this:
kubectl apply -f uptime-1-0-0.yaml -n uptime-1-0-0
r

redsquare

your URL is wrong
it needs api/accounts in the path
not sure why they have an api part too
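To make the URL fix concrete: the manifest above used the browser-style address (`app.prefect.cloud/account/.../workspace/...`), while the agent needs the API address. The sketch below converts one form to the other. It assumes the Prefect Cloud API URL has the shape `https://api.prefect.cloud/api/accounts/<ACCOUNT_ID>/workspaces/<WORKSPACE_ID>`; the helper `to_api_url` and the placeholder IDs are hypothetical, so check the result against the URL shown in your own Prefect Cloud profile.

```python
import re

# Assumed shape of the API URL the agent expects (IDs are placeholders)
API_URL = re.compile(
    r"^https://api\.prefect\.cloud/api/accounts/([^/]+)/workspaces/([^/]+)/?$"
)
# Shape of the browser-style URL that was pasted into the manifest
UI_URL = re.compile(
    r"^https://app\.prefect\.cloud/account/([^/]+)/workspace/([^/]+)/?$"
)


def to_api_url(url):
    """Return an API-style URL, converting a browser-style URL if needed."""
    url = url.strip()
    if API_URL.match(url):
        return url.rstrip("/")
    match = UI_URL.match(url)
    if match:
        account, workspace = match.groups()
        return (
            "https://api.prefect.cloud/api/accounts/"
            f"{account}/workspaces/{workspace}"
        )
    raise ValueError(f"Unrecognized Prefect Cloud URL: {url!r}")


if __name__ == "__main__":
    print(to_api_url("https://app.prefect.cloud/account/ACCOUNT_ID/workspace/WORKSPACE_ID"))
```

The converted value is what would go into `PREFECT_API_URL` in the agent manifest.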
j

John Horn

03/13/2023, 9:49 PM
Thanks that was it! haha I have a different issue now:
2023-03-13 14:45:01 Agent started! Looking for work from queue(s): kubernetes...
2023-03-13 14:45:02 21:45:02.084 | INFO    | prefect.agent - Submitting flow run 'b2539fe3-512a-4793-bb4b-8a442b31f5f7'
2023-03-13 14:45:03 21:45:03.268 | INFO    | prefect.infrastructure.kubernetes-job - Job 'abiding-anaconda-svrhv': Pod has status 'Pending'.
2023-03-13 14:45:03 21:45:03.879 | INFO    | prefect.agent - Completed submission of flow run 'b2539fe3-512a-4793-bb4b-8a442b31f5f7'
2023-03-13 14:45:13 21:45:13.371 | INFO    | prefect.agent - Found 2 flow runs awaiting cancellation.
2023-03-13 14:45:13 21:45:13.372 | ERROR   | prefect.agent - Flow run 'fb444554-c2ed-40c2-97b5-97d7ab868006' does not have an infrastructure pid attached. Cancellation cannot be guaranteed.
2023-03-13 14:45:13 21:45:13.372 | ERROR   | prefect.agent - Flow run 'df11019e-b67d-4e9c-aa8d-efc0a50b1f11' does not have an infrastructure pid attached. Cancellation cannot be guaranteed.
2023-03-13 14:46:03 21:46:03.265 | ERROR   | prefect.infrastructure.kubernetes-job - Job 'abiding-anaconda-svrhv': Pod never started.
2023-03-13 14:46:03 21:46:03.482 | INFO    | prefect.agent - Reported flow run 'b2539fe3-512a-4793-bb4b-8a442b31f5f7' as crashed: Flow run infrastructure exited with non-zero status code -1.
r

redsquare

03/13/2023, 9:50 PM
now to check your deployment
j

John Horn

03/13/2023, 9:55 PM
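from prefect.deployments import Deployment
from prefect.filesystems import GCS
from prefect.infrastructure import KubernetesJob
from prefect.infrastructure.kubernetes import KubernetesImagePullPolicy

# foo_flow and deployment_name are defined elsewhere in the project
gcs_block = GCS.load("gcs-foo")
deployment = Deployment.build_from_flow(
    name=deployment_name,
    flow=foo_flow,
    work_queue_name="kubernetes",
    parameters={"foo": "bar"},
    storage=gcs_block,
    infrastructure=KubernetesJob(
        namespace="uptime-1-0-0",
        image="prefect-orion:2.8.5",
        image_pull_policy=KubernetesImagePullPolicy.IF_NOT_PRESENT,
        env={
            "EXTRA_PIP_PACKAGES": "gcsfs",
            "USE_SSL": "False",  # environment variable values are strings
        },
    ),
)
deployment.apply()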
r

redsquare

03/13/2023, 9:57 PM
your image name doesn't look right
it should be something like prefecthq/prefect:2.8.5-python3.10
👀 1
Orion has also been dropped as a name now, it's just "server"
btw it's easier to use a k8s job block so that you can set defaults for use across flows
j

John Horn

03/13/2023, 10:28 PM
thanks, yeah it was the image that was the problem! I'll look into the k8 job block as well.
r

redsquare

03/13/2023, 10:29 PM
cool
j

John Horn

03/13/2023, 10:30 PM
seriously saved me a lot of time, thanks so much!
👍 1
quick question: If we have to install custom libraries like pandas, etc. should we just create our own pre-built image on a registry and pull from there to speed up flow runs? Rather than including them in
env = {
    "EXTRA_PIP_PACKAGES": "gcsfs"
},
r

redsquare

03/13/2023, 10:36 PM
it's give and take. I would do that once you have a decent idea of the commonality across flows and CI/CD speed becomes an issue. Prefect releases once a week, so it's a balancing act
I started off like that but binned it for now, deployment/flow speed isn't such an issue for me
j

John Horn

03/13/2023, 10:40 PM
I ask because my 3rd party libraries look like:
pip==22.2.2
pandas==1.4.3
prefect-gcp==0.2.1
requests==2.28.1
google-cloud-bigquery==3.3.2
google-cloud-bigquery-storage==2.14.2
google-cloud-storage==2.5.0
pandas-gbq==0.17.8
and I'm not sure if that will be really taxing, as in will each flow run have to re-install these every time? Or does each pod only have to install them once?
r

redsquare

03/13/2023, 10:46 PM
yeah each pod will have to download but k8s can cache images
packages not images....e.g https://link.medium.com/IT9AvrT68xb
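For anyone staying with the `EXTRA_PIP_PACKAGES` route for now, here is a minimal sketch of turning a pinned requirements list like the one above into a value for that variable. It assumes the official image passes the variable's contents to `pip install` at container start, so a space-separated string of pinned specifiers should work; the helper name `extra_pip_packages` is hypothetical.

```python
def extra_pip_packages(requirements_text):
    """Join non-empty requirement lines into one space-separated string."""
    packages = []
    for line in requirements_text.splitlines():
        # Drop trailing comments and surrounding whitespace
        line = line.split("#", 1)[0].strip()
        if line:
            packages.append(line)
    return " ".join(packages)


if __name__ == "__main__":
    requirements = """
    pandas==1.4.3
    prefect-gcp==0.2.1
    google-cloud-storage==2.5.0
    """
    print(extra_pip_packages(requirements))
```

Since each job gets a fresh pod, these packages would be installed again on every flow run, which is the cost a pre-built image avoids.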