# prefect-server
Hi, I've got a problem running flows from Prefect Server in Kubernetes. It was working at one point, but now the flows get stuck in a "Submitted for execution" state. I found these docs and tried debugging/restarting services with them, but no luck:
I also added these flags to the Kubernetes Prefect agent:
--log-level DEBUG --disable-job-deletion
That gave me more detail but still no indication of what the problem is. Any help would be appreciated, thanks!
This is the log from the agent after submitting a flow:
[2022-06-22 21:00:01,562] DEBUG - agent | Querying for ready flow runs...
DEBUG:agent:Found 1 ready flow run(s): {'26c959b4-9edd-47c9-98e7-e3997f2dc186'}
[2022-06-22 21:00:01,601] DEBUG - agent | Found 1 ready flow run(s): {'26c959b4-9edd-47c9-98e7-e3997f2dc186'}
[2022-06-22 21:00:01,601] DEBUG - agent | Retrieving metadata for 1 flow run(s)...
DEBUG:agent:Retrieving metadata for 1 flow run(s)...
DEBUG:agent:Submitting flow run 26c959b4-9edd-47c9-98e7-e3997f2dc186 for deployment...
[2022-06-22 21:00:01,629] DEBUG - agent | Submitting flow run 26c959b4-9edd-47c9-98e7-e3997f2dc186 for deployment...
[2022-06-22 21:00:01,629] DEBUG - agent | Sleeping flow run poller for 0.25 seconds...
DEBUG:agent:Sleeping flow run poller for 0.25 seconds...
[2022-06-22 21:00:01,631] INFO - agent | Deploying flow run 26c959b4-9edd-47c9-98e7-e3997f2dc186 to execution environment...
INFO:agent:Deploying flow run 26c959b4-9edd-47c9-98e7-e3997f2dc186 to execution environment...
[2022-06-22 21:00:01,631] DEBUG - agent | Updating flow run 26c959b4-9edd-47c9-98e7-e3997f2dc186 state from Scheduled -> Submitted...
DEBUG:agent:Updating flow run 26c959b4-9edd-47c9-98e7-e3997f2dc186 state from Scheduled -> Submitted...
DEBUG:agent:Creating namespaced job prefect-job-c7b44f14
[2022-06-22 21:00:01,815] DEBUG - agent | Creating namespaced job prefect-job-c7b44f14
DEBUG:agent:Job prefect-job-c7b44f14 created
INFO:agent:Completed deployment of flow run 26c959b4-9edd-47c9-98e7-e3997f2dc186
[2022-06-22 21:00:01,851] DEBUG - agent | Job prefect-job-c7b44f14 created
[2022-06-22 21:00:01,851] INFO - agent | Completed deployment of flow run 26c959b4-9edd-47c9-98e7-e3997f2dc186
[2022-06-22 21:00:01,892] DEBUG - agent | Querying for ready flow runs...
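For anyone following along: the agent log above names the Kubernetes job it created for the flow run, and that job name is what you need to find the flow pod. Here is a minimal Python sketch that pulls the pairing out of agent log lines. The function name `jobs_by_flow_run` is made up for illustration, and it assumes the "Creating namespaced job" line follows the "Submitting flow run" line for the same run, which may not hold when several runs are deployed at once.

```python
import re

# Patterns match both log formats that appear in the agent output.
SUBMIT_RE = re.compile(r"Submitting flow run (?P<run>[0-9a-f-]{36}) for deployment")
JOB_RE = re.compile(r"Creating namespaced job (?P<job>\S+)")


def jobs_by_flow_run(log_lines):
    """Map each flow run ID to the job created after it was submitted.

    Hypothetical helper: assumes submissions are not interleaved.
    Duplicate log lines (one per log handler) simply overwrite the
    same entry with the same value.
    """
    mapping = {}
    current_run = None
    for line in log_lines:
        submitted = SUBMIT_RE.search(line)
        if submitted:
            current_run = submitted.group("run")
            continue
        created = JOB_RE.search(line)
        if created and current_run is not None:
            mapping[current_run] = created.group("job")
    return mapping


# Lines copied from the agent log in this thread.
sample = [
    "DEBUG:agent:Submitting flow run 26c959b4-9edd-47c9-98e7-e3997f2dc186 for deployment...",
    "[2022-06-22 21:00:01,629] DEBUG - agent | Submitting flow run 26c959b4-9edd-47c9-98e7-e3997f2dc186 for deployment...",
    "DEBUG:agent:Creating namespaced job prefect-job-c7b44f14",
    "[2022-06-22 21:00:01,815] DEBUG - agent | Creating namespaced job prefect-job-c7b44f14",
]
print(jobs_by_flow_run(sample))
```

You could feed it the output of the `kubectl logs` command for the agent pod, split into lines.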
reading that log - it almost seems like the flow is working correctly and completes but it never gets 'marked' as complete in the UI?
Completed deployment of flow
Wondering if it’s pointing at the right endpoint then
which endpoint should it be pointing at?
this is from the agent startup:
dflake@dflake-thinkpad:~/$ kubectl logs arte-prefect-agent-7d94bfd6b4-tsxql -n arte-prefect 

[2022-06-22 20:57:48,996] DEBUG - agent | Environment variables: []
[2022-06-22 20:57:48,996] DEBUG - agent | Max polls: None
[2022-06-22 20:57:48,996] DEBUG - agent | Agent address: <>
[2022-06-22 20:57:48,996] DEBUG - agent | Log to Cloud: True
[2022-06-22 20:57:48,996] DEBUG - agent | Prefect backend: server
[2022-06-22 20:57:48,998] DEBUG - agent | Namespace: arte-prefect
[2022-06-22 20:57:48,998] INFO - agent | Registering agent...
[2022-06-22 20:57:49,213] INFO - agent | Registration successful!
[2022-06-22 20:57:49,213] DEBUG - agent | Assigned agent id: c419c65a-a229-49eb-91ee-6b17b22ac252
[2022-06-22 20:57:49,213] DEBUG - agent | Sending test query to API at '<http://arte-prefect-apollo.arte-prefect:4200/graphql>'...
[2022-06-22 20:57:49,227] DEBUG - agent | Test query successful!

 ____            __           _        _                    _
|  _ \ _ __ ___ / _| ___  ___| |_     / \   __ _  ___ _ __ | |_
| |_) | '__/ _ \ |_ / _ \/ __| __|   / _ \ / _` |/ _ \ '_ \| __|
|  __/| | |  __/  _|  __/ (__| |_   / ___ \ (_| |  __/ | | | |_
|_|   |_|  \___|_|  \___|\___|\__| /_/   \_\__, |\___|_| |_|\__|

[2022-06-22 20:57:49,227] INFO - agent | Starting KubernetesAgent with labels []
[2022-06-22 20:57:49,227] INFO - agent | Agent documentation can be found at <https://docs.prefect.io/orchestration/>
[2022-06-22 20:57:49,227] INFO - agent | Waiting for flow runs...
[2022-06-22 20:57:49,227] DEBUG - agent | Sending agent heartbeat...
[2022-06-22 20:57:49,231] DEBUG - agent | Retrieving information of jobs that are currently in the cluster...
[2022-06-22 20:57:49,290] DEBUG - agent | Running thread pool with 6 workers to handle flow deployment
[2022-06-22 20:57:49,290] DEBUG - agent | Querying for ready flow runs...
[2022-06-22 20:57:49,294] DEBUG - agent | Agent API server listening on port <>
[2022-06-22 20:57:49,391] DEBUG - agent | Heartbeat succesful! Sleeping for 60.0 seconds...
[2022-06-22 20:57:49,393] DEBUG - agent | No ready flow runs found.
[2022-06-22 20:57:49,393] DEBUG - agent | Sleeping flow run poller for 0.5 seconds...
I believe the agent is right since it picked up the Flow. But the Flow must have an endpoint configured also right? Is it?
I'm not sure, is that defined in the flow itself? It's a pretty basic hello_world flow that I'm trying to run:
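from prefect import Flow, task
from prefect.storage import Azure
from prefect.run_configs import KubernetesRun

FLOW_NAME = "azure_k8s"
# The Azure storage arguments were cut off in the original message.
STORAGE = Azure(
    ...
)


# Assumption: the @task decorator was dropped from the pasted snippet;
# `task` is imported above and hello_world() is called inside the Flow block.
@task
def hello_world():
    text = f"hello from {FLOW_NAME}"
    return text


with Flow(
    FLOW_NAME,
    storage=STORAGE,
    run_config=KubernetesRun(
        job_template_path="/root/arte-tasks/k8s/prefect-job-template.yaml"
    ),
) as flow:
    hw = hello_world()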
More like in the image or with an env var
ah - checking
I added in the
now the k8s agent looks like this:
      - command:
        - bash
        - -c
        - prefect agent kubernetes start --log-level DEBUG --disable-job-deletion
        - name: PREFECT__CLOUD__API
          value: <http://arte-prefect-apollo.arte-prefect:4200/graphql>
          value: <https://prefect-api.arte.adobe.net/graphql>
        - name: NAMESPACE
          value: arte-prefect
        - name: IMAGE_PULL_SECRETS
          value: '[]'
        - name: JOB_MEM_REQUEST
        - name: JOB_MEM_LIMIT
        - name: JOB_CPU_REQUEST
        - name: JOB_CPU_LIMIT
        - name: IMAGE_PULL_POLICY
        - name: SERVICE_ACCOUNT_NAME
          value: arte-prefect-serviceaccount
        - name: PREFECT__BACKEND
          value: server
          value: <>
        image: prefecthq/prefect:1.1.0-python3.8
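One way to check the endpoint question from inside a container (the agent pod, or the flow image if you can get a shell in it) is to send a tiny GraphQL query to whatever `PREFECT__CLOUD__API` is set to. Below is a rough stdlib-only Python sketch. The helper names `build_test_query` and `check_api` are hypothetical, and the `{ __typename }` query is just a generic GraphQL query that any schema should answer, not necessarily the test query the agent itself sends.

```python
import json
import os
import urllib.request


def build_test_query(api_url):
    """Build a minimal GraphQL POST request for the given endpoint."""
    body = json.dumps({"query": "{ __typename }"}).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def check_api(api_url, timeout=5.0):
    """Send the test query and return the decoded JSON response."""
    request = build_test_query(api_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    url = os.environ.get("PREFECT__CLOUD__API")
    if not url:
        print("PREFECT__CLOUD__API is not set in this container")
    else:
        # Raises urllib.error.URLError if the endpoint is unreachable.
        print(url, check_api(url))
```

If this fails from the flow container but works from the agent pod, that would point at the image or its environment rather than at the agent.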
running another flow still has no luck though 😞
That looks pretty simple already. Do you have access to pod logs?
yes - I'm not seeing any change from before in the agent logs:
dflake@dflake-thinkpad:~/adobe/git_repos/arte-tasks$ kubectl logs pod/arte-prefect-agent-5dcd9b5df9-pmkcq -n arte-prefect 

[2022-06-22 22:17:27,198] DEBUG - agent | Environment variables: []
[2022-06-22 22:17:27,199] DEBUG - agent | Max polls: None
[2022-06-22 22:17:27,199] DEBUG - agent | Agent address: <>
[2022-06-22 22:17:27,199] DEBUG - agent | Log to Cloud: True
[2022-06-22 22:17:27,199] DEBUG - agent | Prefect backend: server
[2022-06-22 22:17:27,291] DEBUG - agent | Namespace: arte-prefect
[2022-06-22 22:17:27,292] INFO - agent | Registering agent...
[2022-06-22 22:17:27,416] INFO - agent | Registration successful!
[2022-06-22 22:17:27,416] DEBUG - agent | Assigned agent id: c419c65a-a229-49eb-91ee-6b17b22ac252
[2022-06-22 22:17:27,417] DEBUG - agent | Sending test query to API at '<http://arte-prefect-apollo.arte-prefect:4200/graphql>'...
[2022-06-22 22:17:27,431] DEBUG - agent | Test query successful!

 ____            __           _        _                    _
|  _ \ _ __ ___ / _| ___  ___| |_     / \   __ _  ___ _ __ | |_
| |_) | '__/ _ \ |_ / _ \/ __| __|   / _ \ / _` |/ _ \ '_ \| __|
|  __/| | |  __/  _|  __/ (__| |_   / ___ \ (_| |  __/ | | | |_
|_|   |_|  \___|_|  \___|\___|\__| /_/   \_\__, |\___|_| |_|\__|

[2022-06-22 22:17:27,431] INFO - agent | Starting KubernetesAgent with labels []
[2022-06-22 22:17:27,431] INFO - agent | Agent documentation can be found at <https://docs.prefect.io/orchestration/>
[2022-06-22 22:17:27,431] INFO - agent | Waiting for flow runs...
[2022-06-22 22:17:27,433] DEBUG - agent | Sending agent heartbeat...
[2022-06-22 22:17:27,491] DEBUG - agent | Retrieving information of jobs that are currently in the cluster...
[2022-06-22 22:17:27,492] DEBUG - agent | Running thread pool with 6 workers to handle flow deployment
[2022-06-22 22:17:27,493] DEBUG - agent | Querying for ready flow runs...
[2022-06-22 22:17:27,496] DEBUG - agent | Agent API server listening on port <>
[2022-06-22 22:17:27,591] DEBUG - agent | Heartbeat succesful! Sleeping for 60.0 seconds...
[2022-06-22 22:17:27,592] DEBUG - agent | No ready flow runs found.
[2022-06-22 22:17:27,593] DEBUG - agent | Sleeping flow run poller for 0.5 seconds...
Can you try adding
to the agent start maybe?
I tried to do that but it didn't work, seems like that flag is only for a local agent not k8s
Ah, that makes sense. It would be hard for k8s. But yeah, you need to catch the logs of the Flow pod here, not the agent pod. I don’t think the agent logs will be helpful
yeah that seems right and it's also what I've been struggling with 🙂 Do you have any ideas/tips for creating/getting the logs from the flow pods in k8s?
anything in the flow itself I could add? or maybe in the flow image?
This should be right, but you gotta do it while the pod is alive
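For reference, a small Python sketch of the `kubectl` calls that usually help when looking at a flow pod. It relies on the `job-name` label that Kubernetes adds to pods created by a Job. The function name `flow_pod_debug_commands` is made up for illustration, and the job name and namespace are the ones from the agent log earlier in this thread. The `describe` output includes pod events, so image pull or container start problems tend to show up there even when there are no flow logs yet.

```python
import shlex


def flow_pod_debug_commands(job_name, namespace):
    """Return kubectl argument lists for inspecting the pods of one job.

    Hypothetical helper: it only builds the commands, it does not run them.
    """
    selector = f"job-name={job_name}"
    return [
        # Pod status (Pending, ImagePullBackOff, Running, ...).
        ["kubectl", "get", "pods", "-n", namespace, "-l", selector, "-o", "wide"],
        # Events and container state for each matching pod.
        ["kubectl", "describe", "pods", "-n", namespace, "-l", selector],
        # Full logs from every container in the matching pods.
        ["kubectl", "logs", "-n", namespace, "-l", selector,
         "--all-containers", "--tail=-1"],
    ]


if __name__ == "__main__":
    for command in flow_pod_debug_commands("prefect-job-c7b44f14", "arte-prefect"):
        print(shlex.join(command))
```

Each list can be passed to `subprocess.run` as-is, or printed and pasted into a shell while the pod still exists.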
thanks I'll give it a try
@Kevin Kho, fyi - the problem turned out to be the custom image we were using for our job template
thanks so much for helping track this down!
Thanks but I didn’t do anything lol