# ask-community
j
I'm trying to deploy an agent in Kubernetes locally and set up the Prefect Cloud URL and API key as shown in the video instructions. I'm also confused about the 3rd video: why are we modifying a k8s manifest created for Orion instead of the agent, given that we're working with Prefect Cloud in the 3rd video, which should take the place of Orion? Am I wrong that Prefect Cloud should be taking the place of Orion? After creating the k8s manifest for the agent and making the modifications from the 2nd and 3rd videos, I'm getting a JSON decode error. I don't think the agent is able to hit the queue (which I verified does exist).
2023-03-13 13:48:43   ___ ___ ___ ___ ___ ___ _____     _   ___ ___ _  _ _____
2023-03-13 13:48:43  | _ \ _ \ __| __| __/ __|_   _|   /_\ / __| __| \| |_   _|
2023-03-13 13:48:43  |  _/   / _|| _|| _| (__  | |    / _ \ (_ | _|| .` | | |
2023-03-13 13:48:43  |_| |_|_\___|_| |___\___| |_|   /_/ \_\___|___|_|\_| |_|
2023-03-13 13:48:43 
2023-03-13 13:48:43 
2023-03-13 13:48:43 Agent started! Looking for work from queue(s): kubernetes...
2023-03-13 13:48:43 An exception occurred.
2023-03-13 13:48:43 Traceback (most recent call last):
2023-03-13 13:48:43   File "/usr/local/lib/python3.8/site-packages/prefect/cli/_utilities.py", line 41, in wrapper
2023-03-13 13:48:43     return fn(*args, **kwargs)
2023-03-13 13:48:43   File "/usr/local/lib/python3.8/site-packages/prefect/utilities/asyncutils.py", line 230, in coroutine_wrapper
2023-03-13 13:48:43     return run_async_in_new_loop(async_fn, *args, **kwargs)
2023-03-13 13:48:43   File "/usr/local/lib/python3.8/site-packages/prefect/utilities/asyncutils.py", line 181, in run_async_in_new_loop
2023-03-13 13:48:43     return anyio.run(partial(__fn, *args, **kwargs))
2023-03-13 13:48:43   File "/usr/local/lib/python3.8/site-packages/anyio/_core/_eventloop.py", line 70, in run
2023-03-13 13:48:43     return asynclib.run(func, *args, **backend_options)
2023-03-13 13:48:43   File "/usr/local/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 292, in run
2023-03-13 13:48:43     return native_run(wrapper(), debug=debug)
2023-03-13 13:48:43   File "/usr/local/lib/python3.8/asyncio/runners.py", line 44, in run
2023-03-13 13:48:43     return loop.run_until_complete(main)
2023-03-13 13:48:43   File "/usr/local/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
2023-03-13 13:48:43     return future.result()
2023-03-13 13:48:43   File "/usr/local/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 287, in wrapper
2023-03-13 13:48:43     return await func(*args)
2023-03-13 13:48:43   File "/usr/local/lib/python3.8/site-packages/prefect/cli/agent.py", line 201, in start
2023-03-13 13:48:43     tg.start_soon(
2023-03-13 13:48:43   File "/usr/local/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 662, in __aexit__
2023-03-13 13:48:43     raise exceptions[0]
2023-03-13 13:48:43   File "/usr/local/lib/python3.8/site-packages/prefect/utilities/services.py", line 46, in critical_service_loop
2023-03-13 13:48:43     await workload()
2023-03-13 13:48:43   File "/usr/local/lib/python3.8/site-packages/prefect/agent.py", line 203, in get_and_submit_flow_runs
2023-03-13 13:48:43     async for work_queue in self.get_work_queues():
2023-03-13 13:48:43   File "/usr/local/lib/python3.8/site-packages/prefect/agent.py", line 144, in get_work_queues
2023-03-13 13:48:43     work_queue = await self.client.read_work_queue_by_name(
2023-03-13 13:48:43   File "/usr/local/lib/python3.8/site-packages/prefect/client/orchestration.py", line 853, in read_work_queue_by_name
2023-03-13 13:48:43     return schemas.core.WorkQueue.parse_obj(response.json())
2023-03-13 13:48:43   File "/usr/local/lib/python3.8/site-packages/httpx/_models.py", line 756, in json
2023-03-13 13:48:43     return jsonlib.loads(self.text, **kwargs)
2023-03-13 13:48:43   File "/usr/local/lib/python3.8/json/__init__.py", line 357, in loads
2023-03-13 13:48:43     return _default_decoder.decode(s)
2023-03-13 13:48:43   File "/usr/local/lib/python3.8/json/decoder.py", line 337, in decode
2023-03-13 13:48:43     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
2023-03-13 13:48:43   File "/usr/local/lib/python3.8/json/decoder.py", line 355, in raw_decode
2023-03-13 13:48:43     raise JSONDecodeError("Expecting value", s, err.value) from None
2023-03-13 13:48:43 json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
r
have you set the api+key environment variables on the agent
j
I did, I used a k8s secret
I wonder if I'm misunderstanding something. Do we need to have Orion and the agent running locally at the same time?
Is Orion listening to the queue on Prefect Cloud
and then starting agents in k8s?
r
No, if you're using Prefect Cloud you just need an agent with the correct url+key env vars
j
ok that was my initial guess
maybe my k8s API secret isn't working. I wonder if I can bash into a k8s instance and run printenv
hmm maybe i'll just hardcode it for now in the manifest to make things easy
r
yeah it definitely looks like the agent can't connect, so check url+key
j
since there is no orion server locally I don't need to do any port forwarding i'm assuming e.g. 4200:4200
r
nope as long as outbound connectivity is allowed
j
same issue even on hardcoding api key
fails within seconds
r
it looks for environmental variables
j
I changed the start command on the manifest to 'printenv' and I am seeing the API KEY and URL
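Seeing both variables in `printenv` output confirms they are set, but not that they are clean. Values that come from a Kubernetes secret can pick up a trailing newline if the secret was created from `echo` output without `-n`. A minimal sketch of a stricter check that could be run inside the agent container; the helper name `check_prefect_env` is hypothetical, and only the two variable names come from this thread.

```python
import os
from typing import List, Mapping


def check_prefect_env(env: Mapping[str, str]) -> List[str]:
    """Return a list of problems; an empty list means both settings look usable."""
    problems = []
    for name in ("PREFECT_API_URL", "PREFECT_API_KEY"):
        value = env.get(name, "")
        if not value.strip():
            problems.append(f"{name} is missing or empty")
        elif value != value.strip():
            # Typical symptom of a secret created with a stray newline.
            problems.append(f"{name} has leading or trailing whitespace")
    return problems


if __name__ == "__main__":
    # Inside the pod this inspects the real environment of the container.
    for problem in check_prefect_env(os.environ):
        print(problem)
```

This only checks that the values are present and well formed; it does not verify that the URL or key is actually accepted by Prefect Cloud.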
r
which prefect version?
those vids are quite old
j
image: prefecthq/prefect:2.8.5-python3.8
r
can you share your agent manifest
j
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prefect-agent
  namespace: uptime-1-0-0
  labels:
    app: prefect-agent
spec:
  selector:
    matchLabels:
      app: prefect-agent
  replicas: 1
  template:
    metadata:
      labels:
        app: prefect-agent
    spec:
      containers:
        - name: agent
          image: prefecthq/prefect:2.8.5-python3.8
          #command: ["printenv"]
          command: ["prefect", "agent", "start", "-q", "kubernetes"]
          imagePullPolicy: "IfNotPresent"
          env:
            - name: PREFECT_API_URL
              value: https://app.prefect.cloud/account/MY_ACCOUNT_OBFUSCATED/workspace/MY_WORKSPACE_OBFUSCATED
            - name: PREFECT_API_KEY
              value: SECRET_KEY
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prefect-agent
  namespace: uptime-1-0-0
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log", "pods/status"]
    verbs: ["get", "watch", "list"]
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: [ "get", "list", "watch", "create", "update", "patch", "delete" ]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prefect-agent-role-binding
  namespace: uptime-1-0-0
subjects:
  - kind: ServiceAccount
    name: default
    namespace: uptime-1-0-0
roleRef:
  kind: Role
  name: prefect-agent
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prefect-agent
rules:
  - apiGroups: [""]
    resources: ["namespaces"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prefect-agent-cluster-role-binding
subjects:
  - kind: ServiceAccount
    name: default
    namespace: uptime-1-0-0
roleRef:
  kind: ClusterRole
  name: prefect-agent
  apiGroup: rbac.authorization.k8s.io
and i'm running like this:
kubectl apply -f uptime-1-0-0.yaml -n uptime-1-0-0
r
your url is wrong
it needs api/accounts
not sure why they have an api part too
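The manifest uses the browser (UI) address on `app.prefect.cloud`. That address most likely answers with something other than JSON, which would explain the `JSONDecodeError` in the traceback. The agent expects the API form, `https://api.prefect.cloud/api/accounts/<ACCOUNT_ID>/workspaces/<WORKSPACE_ID>`. A minimal sketch of the difference; the helper name `ui_url_to_api_url` is hypothetical, and the regex assumes the UI URL has exactly the `account/.../workspace/...` shape shown in the manifest.

```python
import re

# Assumed shape of the browser URL, taken from the manifest in this thread.
_UI_URL = re.compile(
    r"^https://app\.prefect\.cloud/account/(?P<account>[^/]+)"
    r"/workspace/(?P<workspace>[^/]+)/?$"
)


def ui_url_to_api_url(ui_url: str) -> str:
    """Convert a Prefect Cloud UI URL into the form expected in PREFECT_API_URL."""
    match = _UI_URL.match(ui_url)
    if match is None:
        raise ValueError(f"not a recognised Prefect Cloud UI URL: {ui_url!r}")
    # Note the differences: api. host, /api prefix, and plural path segments.
    return (
        "https://api.prefect.cloud/api"
        f"/accounts/{match['account']}/workspaces/{match['workspace']}"
    )
```

For example, `ui_url_to_api_url("https://app.prefect.cloud/account/abc/workspace/def")` gives `https://api.prefect.cloud/api/accounts/abc/workspaces/def`, which is the value to put under `PREFECT_API_URL`.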
j
Thanks that was it! haha I have a different issue now:
2023-03-13 14:45:01 Agent started! Looking for work from queue(s): kubernetes...
2023-03-13 14:45:02 21:45:02.084 | INFO    | prefect.agent - Submitting flow run 'b2539fe3-512a-4793-bb4b-8a442b31f5f7'
2023-03-13 14:45:03 21:45:03.268 | INFO    | prefect.infrastructure.kubernetes-job - Job 'abiding-anaconda-svrhv': Pod has status 'Pending'.
2023-03-13 14:45:03 21:45:03.879 | INFO    | prefect.agent - Completed submission of flow run 'b2539fe3-512a-4793-bb4b-8a442b31f5f7'
2023-03-13 14:45:13 21:45:13.371 | INFO    | prefect.agent - Found 2 flow runs awaiting cancellation.
2023-03-13 14:45:13 21:45:13.372 | ERROR   | prefect.agent - Flow run 'fb444554-c2ed-40c2-97b5-97d7ab868006' does not have an infrastructure pid attached. Cancellation cannot be guaranteed.
2023-03-13 14:45:13 21:45:13.372 | ERROR   | prefect.agent - Flow run 'df11019e-b67d-4e9c-aa8d-efc0a50b1f11' does not have an infrastructure pid attached. Cancellation cannot be guaranteed.
2023-03-13 14:46:03 21:46:03.265 | ERROR   | prefect.infrastructure.kubernetes-job - Job 'abiding-anaconda-svrhv': Pod never started.
2023-03-13 14:46:03 21:46:03.482 | INFO    | prefect.agent - Reported flow run 'b2539fe3-512a-4793-bb4b-8a442b31f5f7' as crashed: Flow run infrastructure exited with non-zero status code -1.
r
now to check your deployment
j
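# Imports assumed for this snippet (Prefect 2.x)
from prefect.deployments import Deployment
from prefect.filesystems import GCS
from prefect.infrastructure import KubernetesImagePullPolicy, KubernetesJob

# deployment_name and foo_flow are defined elsewhere in the project
gcs_block = GCS.load("gcs-foo")
deployment = Deployment.build_from_flow(
    name=deployment_name,
    flow=foo_flow,
    work_queue_name="kubernetes",
    parameters={"foo": "bar"},
    storage=gcs_block,
    infrastructure=KubernetesJob(
        namespace="uptime-1-0-0",
        image="prefect-orion:2.8.5",
        image_pull_policy=KubernetesImagePullPolicy.IF_NOT_PRESENT,
        env={
            "EXTRA_PIP_PACKAGES": "gcsfs",
            "USE_SSL": False,
        },
    ),
)
deployment.apply()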
r
your image name doesn't look right
something like prefecthq/prefect:2.8.5-python3.10
orion has been dropped as a name now also, just server
btw easier to use a k8s job block so that you can set defaults for use across flows
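A minimal sketch of the block approach, assuming Prefect 2.x, where `KubernetesJob` is a block that can be saved once and loaded by name from any deployment. The block name `k8s-default` and both helper names are hypothetical; the namespace, image, and env values are the ones used in this thread.

```python
from typing import Any, Dict


def kubernetes_job_defaults() -> Dict[str, Any]:
    """Shared settings for every flow run; values are taken from this thread."""
    return {
        "namespace": "uptime-1-0-0",
        "image": "prefecthq/prefect:2.8.5-python3.10",
        "image_pull_policy": "IfNotPresent",
        "env": {"EXTRA_PIP_PACKAGES": "gcsfs"},
    }


def save_kubernetes_job_block(block_name: str = "k8s-default") -> None:
    """Save the defaults as a reusable block (needs Prefect 2.x and a reachable API)."""
    # Imported here so the defaults above can be inspected without Prefect installed.
    from prefect.infrastructure import KubernetesJob

    KubernetesJob(**kubernetes_job_defaults()).save(block_name, overwrite=True)
```

After `save_kubernetes_job_block()` has run once, a deployment can pass `infrastructure=KubernetesJob.load("k8s-default")` to `Deployment.build_from_flow` instead of repeating the settings inline.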
j
thanks, yeah it was the image that was the problem! I'll look into the k8 job block as well.
r
cool
j
seriously saved me a lot of time, thanks so much!
quick question: If we have to install custom libraries like pandas, etc. should we just create our own pre-built image on a registry and pull from there to speed up flow runs? Rather than including them in
env = {
    "EXTRA_PIP_PACKAGES": "gcsfs"
},
r
it's give and take. I would do that once you have a decent idea of the commonality across flows and ci/cd speed becomes an issue. prefect releases once a week, so it's a balancing act
I started off like that but binned it for now, deployment/flow speed not such an issue for me
j
I ask because my 3rd party libraries look like:
pip==22.2.2
pandas==1.4.3
prefect-gcp==0.2.1
requests==2.28.1
google-cloud-bigquery==3.3.2
google-cloud-bigquery-storage==2.14.2
google-cloud-storage==2.5.0
pandas-gbq==0.17.8
and I'm not sure if that will be really taxing, as in will each flow have to re-install these on each run? Or does each pod only have to install these once?
r
yeah each pod will have to download but k8s can cache images
packages not images....e.g https://link.medium.com/IT9AvrT68xb
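For the `EXTRA_PIP_PACKAGES` route, the value is a single space-separated string that the official image hands to `pip install` when the container starts, so each new pod repeats the install. A minimal sketch of building that string from pinned requirements like the list above; the helper name `extra_pip_packages` is hypothetical, and splitting on `#` assumes plain `name==version` lines rather than URLs with fragments.

```python
from typing import Iterable


def extra_pip_packages(requirement_lines: Iterable[str]) -> str:
    """Join requirement lines into one space-separated string, skipping blanks and comments."""
    packages = []
    for line in requirement_lines:
        spec = line.split("#", 1)[0].strip()  # drop inline comments
        if spec:
            packages.append(spec)
    return " ".join(packages)


# A few of the pins from this thread, as they would appear in requirements.txt.
requirements = """
pandas==1.4.3
prefect-gcp==0.2.1
pandas-gbq==0.17.8
""".splitlines()

env = {"EXTRA_PIP_PACKAGES": extra_pip_packages(requirements)}
print(env["EXTRA_PIP_PACKAGES"])
```

The longer this string gets, the more time every pod spends installing before the flow starts, which is the point at which a pre-built image with the packages baked in tends to pay off.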