# ask-marvin
t
@Marvin What does it mean when my deployments have “not ready” as the status?
m
The status "not ready" for deployments in Prefect could mean a few things, and the exact specifics might differ depending on your context. However, no explicit explanation was directly found in the documentation. The GitHub issues linked seem to relate to deployment status changes when operations, like updates, or changes to the related work pool, are applied. Status changes to "not ready" could be an indication that the deployment isn't healthy, can't start, or some conditions to be considered "ready" haven't been met yet. It could also be related to issues with the work pools or workers it depends on. For more details, you might need to explore more of the Prefect documentation here: - Deployment Mechanics - Tutorial on Deployments Here's a summary of the relevant GitHub issues which discuss deployment statuses: - Add deployment status events on deployment edit and creation: This issue brings up the idea that there should be events emitted when a deployment's status changes from "ready" to "not ready," which suggests that this change of state is significant and should be tracked. - Deployment status persistence and events: This pull request adds the persistence of deployment status and the emission of related events when the status changes to "ready" or "not ready," indicating that these statuses are part of the deployment lifecycle. - Implementing work pool status events: This pull request introduces events related to work pool statuses, which might affect your deployment status as well. If further exploration in the documentation or issues does not clarify your specific scenario, it could be beneficial to consider reaching out directly to the Prefect community or support for more targeted assistance.
t
@Marvin The Work pool that these deployments are using is showing as “ready” but the deployments using the work pool are shown as “not ready”. What are the criteria for a deployment to be shown as “Ready”?
m
The search for specific criteria that determine a deployment's "Ready" status did not yield a direct answer in the Prefect documentation, and no related GitHub issues seem to address your exact concern. From what is documented, work pools have three statuses: "READY," "NOT_READY," and "PAUSED." A work pool is considered "READY" if it has at least one online worker sending heartbeats. However, a work pool being "READY" does not necessarily mean that deployments using that work pool will automatically be "READY."

To better understand why deployments associated with a "ready" work pool are showing as "not ready," consider the following points, although they might not fully explain your situation:
- Each deployment may have dependencies or configuration aspects, separate from the work pool, that affect its readiness.
- A deployment's "Ready" status might also be influenced by the health and availability of resources required by the flow or the deployment environment.
- There could be version mismatches, misconfigurations, or other issues in the deployment lifecycle keeping it from being "READY."

Without direct documentation on the exact criteria, it's hard to give a precise answer, so here are some recommendations:
- Make sure there are no underlying environment issues or configuration problems with the deployments.
- Consult the latest version of the Prefect documentation on Deployments and Work Pools for any updates or detailed information.
- As a last resort, reach out to the Prefect community or support channels with the specifics of your configuration for more targeted help diagnosing why the deployments show as "not ready."
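Since a work pool only reports READY while at least one worker is online and heart-beating against it, a few CLI checks can confirm what the server sees (a sketch assuming a Prefect 2.x CLI; the pool name is a placeholder):

```bash
# Show each work pool and its reported status (READY / NOT_READY / PAUSED).
prefect work-pool ls

# Dump the full configuration of a single pool -- "kubernetes-pool" is a placeholder name.
prefect work-pool inspect "kubernetes-pool"

# A pool only stays READY while a worker like this is running and heart-beating.
prefect worker start --pool "kubernetes-pool"
```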
b
Hi Tom! Would you mind sharing how you created your deployments? Did you use .serve(), .deploy(), or the prefect deploy CLI?
t
I am using prefect.yaml and prefect --no-prompt deploy --all
Here is my YAML
deployments:
  - name: MyFlow
    version: 0.0.1
    description: My deployment
    entrypoint: fastpass/flows/llm_idk_flow//myflow.py:the_flow
    path: /opt/prefect/fastpass/
    tags: *default_tags
    build: null
    push: null
    schedule:
    parameters: { }
    work_pool: *default_kubernetes_pool
work_pools:
    default_kubernetes_pool: &default_kubernetes_pool
      name: kubernetes-pool
      work_queue_name: default
      job_variables:
        image: 'docker-private-release.corp.adobe.com/prefect_flows:{{ $IMAGE_TAG }}'
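For readers untangling the YAML anchors, the *default_kubernetes_pool alias means each deployment effectively carries the work_pool block below (an illustrative expansion, not extra config from the thread; the {{ $IMAGE_TAG }} template is resolved from an environment variable in the shell where prefect deploy runs):

```yaml
# Illustrative expansion of the alias used above; values come from the
# work_pools block, and {{ $IMAGE_TAG }} is filled in from the environment
# where `prefect deploy` is executed.
work_pool:
  name: kubernetes-pool
  work_queue_name: default
  job_variables:
    image: 'docker-private-release.corp.adobe.com/prefect_flows:{{ $IMAGE_TAG }}'
```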
When I try to run the deployment, I get (after 25 minutes):
06:31:08 PM | prefect.flow_runs.worker | Worker 'KubernetesWorker 4ce68390-b1a7-49b9-b284-1541e40b3488' submitting flow run 'e634bd63-ae61-4bbd-8127-7f805680a2c2'
06:31:08 PM | prefect.flow_runs.worker | Creating Kubernetes job...
Failed to submit flow run 'e634bd63-ae61-4bbd-8127-7f805680a2c2' to infrastructure.
Traceback (most recent call last):
...
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='172.20.0.1', port=443): Max retries exceeded with url: /apis/batch/v1/namespaces/ns-team-dcai-eval-fastpass/jobs (Caused by ConnectTimeoutError(, 'Connection to 172.20.0.1 timed out. (connect timeout=None)'))
Which is weird. I have this same flow running in another prefect server installation with no problems.
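One way to narrow down a timeout like this is to try the same connection from inside the worker pod, which separates Prefect problems from cluster networking problems (a sketch; the namespace matches the error above, but deploy/prefect-worker is a placeholder for however the worker is actually deployed, and it assumes the worker image has Python available, as the stock Prefect images do):

```bash
# Find the worker pod, then test a raw TCP connection to the in-cluster
# Kubernetes API (172.20.0.1:443 in the error above) from inside it.
kubectl -n ns-team-dcai-eval-fastpass get pods

kubectl -n ns-team-dcai-eval-fastpass exec deploy/prefect-worker -- \
  python -c "import socket; socket.create_connection(('172.20.0.1', 443), timeout=5); print('connected')"
```

If this call also times out, the issue sits in cluster networking rather than in the worker or the deployment itself, which is where this thread eventually lands.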
b
hmmm... we'll take a look to see if someone else has run into a similar error. Just to confirm, did you follow along with the k8s worker guide? And does the namespace your worker is in match the namespace set in the namespace field in your work pool config?
t
yes, the namespace configured for my pool is the same namespace where the worker is running (I double checked)
I did follow that k8s guide, but I am running my own deployed prefect-server.
@Bianca Hoch I turned on the debug switch for the prefect worker and when I submit a flow to be run I get the following errors over and over:
17:04:27.123 | DEBUG   | APILogWorkerThread | prefect._internal.concurrency - Running call get(timeout=1.9999852501787245) in thread 'APILogWorkerThread'
17:04:27.124 | DEBUG   | APILogWorkerThread | prefect._internal.concurrency - <WatcherThreadCancelScope, name='get' RUNNING, runtime=0.00> entered
17:04:29.124 | DEBUG   | APILogWorkerThread | prefect._internal.concurrency - <WatcherThreadCancelScope, name='get' COMPLETED, runtime=2.00> exited
17:04:29.124 | DEBUG   | APILogWorkerThread | prefect._internal.concurrency - Encountered exception in call get(<dropped>)
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/prefect/_internal/concurrency/calls.py", line 352, in _run_sync
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/queue.py", line 179, in get
    raise Empty
_queue.Empty
?
t
Hi @Tom Jordahl we wonder if it might have something to do with the environment where you are starting the k8s worker. Would you have 15 min to hop on a quick debugging call? @Kevin Grismore and I have availability. Here is a calendly link.
t
Hey Taylor - I set something up for 5pm ET today.
c
Hi @Tom Jordahl & @Taylor Curran, I get the same errors as pointed out in the comment above. Did you fix it? Why is this happening?
t
I did not address that message directly, but my ultimate problem was that the default network policy in my K8s namespace disallowed all egress traffic from pods, including the prefect-worker. Once I installed a network policy to allow egress on ports 443 (and 80), my worker could do what it needed to do.
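For anyone hitting the same wall, an egress rule along these lines is one way to express that fix (a minimal sketch, not Tom's actual policy; the policy name, pod labels, and the DNS rule are assumptions to adapt to your cluster):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-prefect-worker-egress     # illustrative name
  namespace: ns-team-dcai-eval-fastpass
spec:
  podSelector:
    matchLabels:
      app: prefect-worker               # adjust to match your worker pods' labels
  policyTypes:
    - Egress
  egress:
    # Allow outbound HTTPS/HTTP (Kubernetes API, Prefect API, image registry, ...).
    - ports:
        - protocol: TCP
          port: 443
        - protocol: TCP
          port: 80
    # DNS is usually needed as well so the worker can resolve hostnames.
    - ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```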
c
I'll check it, thanks!