    Devin Flake

    3 months ago
    Hi, I've got a problem running flows from Prefect Server in Kubernetes. It was working at one point, but now the flows get stuck in a "Submitted for execution" state. I found these docs and tried debugging/restarting services with them, but no luck:
    https://docs.prefect.io/orchestration/faq/debug.html#my-flow-is-stuck-in-a-submitted-state
    https://discourse.prefect.io/t/why-is-my-flow-stuck-in-a-submitted-state/201
    I also added these flags to the kubernetes prefect agent:
    --log-level DEBUG --disable-job-deletion
    That gave me more detail but still no indication of what the problem is. Any help would be appreciated, thanks!
    This is the log from the agent after submitting a flow:
    [2022-06-22 21:00:01,562] DEBUG - agent | Querying for ready flow runs...
    DEBUG:agent:Found 1 ready flow run(s): {'26c959b4-9edd-47c9-98e7-e3997f2dc186'}
    [2022-06-22 21:00:01,601] DEBUG - agent | Found 1 ready flow run(s): {'26c959b4-9edd-47c9-98e7-e3997f2dc186'}
    [2022-06-22 21:00:01,601] DEBUG - agent | Retrieving metadata for 1 flow run(s)...
    DEBUG:agent:Retrieving metadata for 1 flow run(s)...
    DEBUG:agent:Submitting flow run 26c959b4-9edd-47c9-98e7-e3997f2dc186 for deployment...
    [2022-06-22 21:00:01,629] DEBUG - agent | Submitting flow run 26c959b4-9edd-47c9-98e7-e3997f2dc186 for deployment...
    [2022-06-22 21:00:01,629] DEBUG - agent | Sleeping flow run poller for 0.25 seconds...
    DEBUG:agent:Sleeping flow run poller for 0.25 seconds...
    [2022-06-22 21:00:01,631] INFO - agent | Deploying flow run 26c959b4-9edd-47c9-98e7-e3997f2dc186 to execution environment...
    INFO:agent:Deploying flow run 26c959b4-9edd-47c9-98e7-e3997f2dc186 to execution environment...
    [2022-06-22 21:00:01,631] DEBUG - agent | Updating flow run 26c959b4-9edd-47c9-98e7-e3997f2dc186 state from Scheduled -> Submitted...
    DEBUG:agent:Updating flow run 26c959b4-9edd-47c9-98e7-e3997f2dc186 state from Scheduled -> Submitted...
    DEBUG:agent:Creating namespaced job prefect-job-c7b44f14
    [2022-06-22 21:00:01,815] DEBUG - agent | Creating namespaced job prefect-job-c7b44f14
    DEBUG:agent:Job prefect-job-c7b44f14 created
    INFO:agent:Completed deployment of flow run 26c959b4-9edd-47c9-98e7-e3997f2dc186
    [2022-06-22 21:00:01,851] DEBUG - agent | Job prefect-job-c7b44f14 created
    [2022-06-22 21:00:01,851] INFO - agent | Completed deployment of flow run 26c959b4-9edd-47c9-98e7-e3997f2dc186
    [2022-06-22 21:00:01,892] DEBUG - agent | Querying for ready flow runs...
    Reading that log, it almost seems like the flow is working correctly and completes ("Completed deployment of flow"), but it never gets 'marked' as complete in the UI?
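Note: in the log above, "Completed deployment" comes right after "Job prefect-job-c7b44f14 created", so it appears to report that the Kubernetes job was created rather than that the flow finished. A minimal Python sketch for pulling the job names out of an agent log so the job can be inspected afterwards; the helper name `job_names` is hypothetical, and the regex assumes the "Creating namespaced job ..." wording shown in the log:

```python
import re
from typing import List

# Matches agent lines such as "... | Creating namespaced job prefect-job-c7b44f14".
JOB_LINE = re.compile(r"Creating namespaced job (?P<job>\S+)")


def job_names(agent_log: str) -> List[str]:
    """Return the job names mentioned in an agent log, in order, without duplicates."""
    seen: List[str] = []
    for line in agent_log.splitlines():
        match = JOB_LINE.search(line)
        if match and match.group("job") not in seen:
            seen.append(match.group("job"))
    return seen


sample = """\
DEBUG:agent:Creating namespaced job prefect-job-c7b44f14
[2022-06-22 21:00:01,815] DEBUG - agent | Creating namespaced job prefect-job-c7b44f14
[2022-06-22 21:00:01,851] INFO - agent | Completed deployment of flow run 26c959b4-9edd-47c9-98e7-e3997f2dc186
"""
print(job_names(sample))
```

The job name is the handle for looking at the job and its pod with kubectl, which is where any failure of the flow itself would show up.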
    Kevin Kho

    3 months ago
    Wondering if it’s pointing at the right endpoint then
    Devin Flake

    3 months ago
    which endpoint should it be pointing at?
    this is from the agent startup:
    dflake@dflake-thinkpad:~/$ kubectl logs arte-prefect-agent-7d94bfd6b4-tsxql -n arte-prefect 
    
    [2022-06-22 20:57:48,996] DEBUG - agent | Environment variables: []
    [2022-06-22 20:57:48,996] DEBUG - agent | Max polls: None
    [2022-06-22 20:57:48,996] DEBUG - agent | Agent address: http://0.0.0.0:8080
    [2022-06-22 20:57:48,996] DEBUG - agent | Log to Cloud: True
    [2022-06-22 20:57:48,996] DEBUG - agent | Prefect backend: server
    [2022-06-22 20:57:48,998] DEBUG - agent | Namespace: arte-prefect
    [2022-06-22 20:57:48,998] INFO - agent | Registering agent...
    [2022-06-22 20:57:49,213] INFO - agent | Registration successful!
    [2022-06-22 20:57:49,213] DEBUG - agent | Assigned agent id: c419c65a-a229-49eb-91ee-6b17b22ac252
    [2022-06-22 20:57:49,213] DEBUG - agent | Sending test query to API at 'http://arte-prefect-apollo.arte-prefect:4200/graphql'...
    [2022-06-22 20:57:49,227] DEBUG - agent | Test query successful!
    
     ____            __           _        _                    _
    |  _ \ _ __ ___ / _| ___  ___| |_     / \   __ _  ___ _ __ | |_
    | |_) | '__/ _ \ |_ / _ \/ __| __|   / _ \ / _` |/ _ \ '_ \| __|
    |  __/| | |  __/  _|  __/ (__| |_   / ___ \ (_| |  __/ | | | |_
    |_|   |_|  \___|_|  \___|\___|\__| /_/   \_\__, |\___|_| |_|\__|
                                               |___/
    
    [2022-06-22 20:57:49,227] INFO - agent | Starting KubernetesAgent with labels []
    [2022-06-22 20:57:49,227] INFO - agent | Agent documentation can be found at https://docs.prefect.io/orchestration/
    [2022-06-22 20:57:49,227] INFO - agent | Waiting for flow runs...
    [2022-06-22 20:57:49,227] DEBUG - agent | Sending agent heartbeat...
    [2022-06-22 20:57:49,231] DEBUG - agent | Retrieving information of jobs that are currently in the cluster...
    [2022-06-22 20:57:49,290] DEBUG - agent | Running thread pool with 6 workers to handle flow deployment
    [2022-06-22 20:57:49,290] DEBUG - agent | Querying for ready flow runs...
    [2022-06-22 20:57:49,294] DEBUG - agent | Agent API server listening on port http://0.0.0.0:8080
    [2022-06-22 20:57:49,391] DEBUG - agent | Heartbeat succesful! Sleeping for 60.0 seconds...
    [2022-06-22 20:57:49,393] DEBUG - agent | No ready flow runs found.
    [2022-06-22 20:57:49,393] DEBUG - agent | Sleeping flow run poller for 0.5 seconds...
    Kevin Kho

    3 months ago
    I believe the agent is right since it picked up the Flow. But the Flow must have an endpoint configured too, right? Does it?
    Devin Flake

    3 months ago
    I'm not sure, is that defined in the flow itself? It's a pretty basic hello_world flow that I'm trying to run:
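    from prefect import Flow, task
    from prefect.run_configs import KubernetesRun
    from prefect.storage import Azure

    FLOW_NAME = "azure_k8s"

    # The flow is stored in the "arte-prefect" Azure Blob container; the connection
    # string is looked up from the secret named AZURE_STORAGE_CONNECTION_STRING.
    STORAGE = Azure(
        container="arte-prefect",
        connection_string_secret="AZURE_STORAGE_CONNECTION_STRING",
    )


    @task(log_stdout=True)
    def hello_world():
        text = f"hello from {FLOW_NAME}"
        print(text)  # log_stdout=True sends stdout to the Prefect logger
        return text


    # Each flow run is launched as a Kubernetes job built from a custom job template.
    with Flow(
        FLOW_NAME,
        storage=STORAGE,
        run_config=KubernetesRun(
            job_template_path="/root/arte-tasks/k8s/prefect-job-template.yaml"
        ),
    ) as flow:
        hw = hello_world()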
    Kevin Kho

    3 months ago
    More likely in the image, or set with an env var:
    PREFECT__SERVER__ENDPOINT=...
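Note: Prefect 1.x reads environment variables prefixed with `PREFECT__` as config overrides, with double underscores separating nested keys, so `PREFECT__SERVER__ENDPOINT` targets `server.endpoint`. A rough Python sketch of that naming convention; it approximates the mapping and is not Prefect's own loader, the helper names `config_path` and `nested_config` are hypothetical, and the URL is a placeholder:

```python
from typing import Dict, List

PREFIX = "PREFECT__"


def config_path(env_name: str) -> List[str]:
    """Translate e.g. PREFECT__SERVER__ENDPOINT into ['server', 'endpoint']."""
    if not env_name.startswith(PREFIX):
        raise ValueError(f"{env_name} is not a PREFECT__ variable")
    return env_name[len(PREFIX):].lower().split("__")


def nested_config(env: Dict[str, str]) -> dict:
    """Build a nested dict from the PREFECT__ variables in env; other names are ignored."""
    config: dict = {}
    for name, value in env.items():
        if not name.startswith(PREFIX):
            continue
        *parents, key = config_path(name)
        node = config
        for part in parents:
            node = node.setdefault(part, {})
        node[key] = value
    return config


example_env = {
    "PREFECT__BACKEND": "server",
    "PREFECT__SERVER__ENDPOINT": "http://example.invalid:4200/graphql",  # placeholder URL
    "NAMESPACE": "arte-prefect",  # no PREFECT__ prefix, so it is skipped
}
print(nested_config(example_env))
```

The point of the sketch is only that the variable has to be present in the environment of the process that reads it, which for a flow run is the flow pod.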
    Devin Flake

    3 months ago
    ah - checking
    I added in the PREFECT__SERVER__ENDPOINT; now the k8s agent looks like this:
    spec:
      containers:
      - command:
        - bash
        - -c
        - prefect agent kubernetes start --log-level DEBUG --disable-job-deletion
        env:
        - name: PREFECT__CLOUD__API
          value: http://arte-prefect-apollo.arte-prefect:4200/graphql
        - name: PREFECT__SERVER__ENDPOINT
          value: https://prefect-api.arte.adobe.net/graphql
        - name: NAMESPACE
          value: arte-prefect
        - name: IMAGE_PULL_SECRETS
        - name: PREFECT__CLOUD__AGENT__LABELS
          value: '[]'
        - name: JOB_MEM_REQUEST
        - name: JOB_MEM_LIMIT
        - name: JOB_CPU_REQUEST
        - name: JOB_CPU_LIMIT
        - name: IMAGE_PULL_POLICY
        - name: SERVICE_ACCOUNT_NAME
          value: arte-prefect-serviceaccount
        - name: PREFECT__BACKEND
          value: server
        - name: PREFECT__CLOUD__AGENT__AGENT_ADDRESS
          value: http://0.0.0.0:8080
        image: prefecthq/prefect:1.1.0-python3.8
    running another flow still has no luck though 😞
    Kevin Kho

    3 months ago
    That looks pretty simple already. Do you have access to pod logs?
    Devin Flake

    3 months ago
    yes - I'm not seeing any change from before in the agent logs:
    dflake@dflake-thinkpad:~/adobe/git_repos/arte-tasks$ kubectl logs pod/arte-prefect-agent-5dcd9b5df9-pmkcq -n arte-prefect 
    
    [2022-06-22 22:17:27,198] DEBUG - agent | Environment variables: []
    [2022-06-22 22:17:27,199] DEBUG - agent | Max polls: None
    [2022-06-22 22:17:27,199] DEBUG - agent | Agent address: http://0.0.0.0:8080
    [2022-06-22 22:17:27,199] DEBUG - agent | Log to Cloud: True
    [2022-06-22 22:17:27,199] DEBUG - agent | Prefect backend: server
    [2022-06-22 22:17:27,291] DEBUG - agent | Namespace: arte-prefect
    [2022-06-22 22:17:27,292] INFO - agent | Registering agent...
    [2022-06-22 22:17:27,416] INFO - agent | Registration successful!
    [2022-06-22 22:17:27,416] DEBUG - agent | Assigned agent id: c419c65a-a229-49eb-91ee-6b17b22ac252
    [2022-06-22 22:17:27,417] DEBUG - agent | Sending test query to API at 'http://arte-prefect-apollo.arte-prefect:4200/graphql'...
    [2022-06-22 22:17:27,431] DEBUG - agent | Test query successful!
    
     ____            __           _        _                    _
    |  _ \ _ __ ___ / _| ___  ___| |_     / \   __ _  ___ _ __ | |_
    | |_) | '__/ _ \ |_ / _ \/ __| __|   / _ \ / _` |/ _ \ '_ \| __|
    |  __/| | |  __/  _|  __/ (__| |_   / ___ \ (_| |  __/ | | | |_
    |_|   |_|  \___|_|  \___|\___|\__| /_/   \_\__, |\___|_| |_|\__|
                                               |___/
    
    [2022-06-22 22:17:27,431] INFO - agent | Starting KubernetesAgent with labels []
    [2022-06-22 22:17:27,431] INFO - agent | Agent documentation can be found at https://docs.prefect.io/orchestration/
    [2022-06-22 22:17:27,431] INFO - agent | Waiting for flow runs...
    [2022-06-22 22:17:27,433] DEBUG - agent | Sending agent heartbeat...
    [2022-06-22 22:17:27,491] DEBUG - agent | Retrieving information of jobs that are currently in the cluster...
    [2022-06-22 22:17:27,492] DEBUG - agent | Running thread pool with 6 workers to handle flow deployment
    [2022-06-22 22:17:27,493] DEBUG - agent | Querying for ready flow runs...
    [2022-06-22 22:17:27,496] DEBUG - agent | Agent API server listening on port http://0.0.0.0:8080
    [2022-06-22 22:17:27,591] DEBUG - agent | Heartbeat succesful! Sleeping for 60.0 seconds...
    [2022-06-22 22:17:27,592] DEBUG - agent | No ready flow runs found.
    [2022-06-22 22:17:27,593] DEBUG - agent | Sleeping flow run poller for 0.5 seconds...
    Kevin Kho

    3 months ago
    Can you try adding --show-flow-logs to the agent start maybe?
    Devin Flake

    3 months ago
    I tried to do that but it didn't work, seems like that flag is only for a local agent not k8s
    Kevin Kho

    3 months ago
    Ah, that makes sense. It would be hard for k8s. But yeah, you need to catch the logs of the Flow pod here, not the agent pod; I don’t think the agent logs will be helpful.
    Devin Flake

    3 months ago
    yeah that seems right and it's also what I've been struggling with 🙂 Do you have any ideas/tips for creating/getting the logs from the flow pods in k8s?
    anything in the flow itself I could add? or maybe in the flow image?
    Kevin Kho

    3 months ago
    This should be right, but you gotta do it while the pod is alive
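Note: one way to catch the flow pod's logs while it is alive is to go through the job the agent created. Kubernetes labels a Job's pods with `job-name=<job>`, and `kubectl logs` accepts a `job/<name>` target. A minimal Python sketch that builds the kubectl commands; the helper names are hypothetical, and the job name and namespace are the ones from the agent log earlier in the thread:

```python
from typing import List


def pod_list_command(job_name: str, namespace: str) -> List[str]:
    """List the pods that belong to a job, via the job-name label."""
    return ["kubectl", "get", "pods", "-n", namespace,
            "--selector", f"job-name={job_name}"]


def job_logs_command(job_name: str, namespace: str, follow: bool = True) -> List[str]:
    """Read (and optionally stream) the logs of a job's pod."""
    cmd = ["kubectl", "logs", f"job/{job_name}", "-n", namespace]
    if follow:
        cmd.append("--follow")
    return cmd


def pod_describe_command(job_name: str, namespace: str) -> List[str]:
    """Show events for the job's pods, useful when the container never starts."""
    return ["kubectl", "describe", "pods", "-n", namespace,
            "--selector", f"job-name={job_name}"]


for build in (pod_list_command, job_logs_command, pod_describe_command):
    print(" ".join(build("prefect-job-c7b44f14", "arte-prefect")))
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a machine with cluster access. With `--disable-job-deletion` already set on the agent, the job should stay around long enough to be inspected.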
    Devin Flake

    3 months ago
    thanks I'll give it a try
    @Kevin Kho, fyi - the problem turned out to be the custom image we were using for our job template
    thanks so much for helping track this down!
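Note: the thread does not say what was wrong with the custom image. As one hedged illustration of how an image problem can surface, the Python sketch below scans a pod object (as returned by `kubectl get pod <name> -o json`) for containers stuck in a waiting state. The helper name `waiting_reasons` and the sample pod, including its image name, are hypothetical:

```python
from typing import Dict, List


def waiting_reasons(pod: Dict) -> List[str]:
    """Summarise containers that are stuck waiting, with the image and reason."""
    problems: List[str] = []
    statuses = pod.get("status", {}).get("containerStatuses") or []
    for status in statuses:
        waiting = status.get("state", {}).get("waiting")
        if waiting:
            image = status.get("image", "?")
            reason = waiting.get("reason", "unknown")
            problems.append(f"{status['name']} ({image}): {reason}")
    return problems


# Hypothetical pod status; a real one comes from kubectl or the Kubernetes API.
sample_pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "flow",
                "image": "example.invalid/custom-flow:latest",
                "state": {"waiting": {"reason": "ImagePullBackOff"}},
            }
        ]
    }
}
print(waiting_reasons(sample_pod))
```

A pod whose containers are running or terminated yields an empty list, in which case the container logs are the next place to look.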
    Kevin Kho

    3 months ago
    Thanks but I didn’t do anything lol