Thread
#prefect-server
    Alex Furrier

    1 year ago
    On a K8s-deployed Prefect Server, the graphql pod is stuck in CrashLoopBackOff. I'm not sure why it exited as Completed.
    I deleted the pod, and the new one that was spun up came up successfully. Not sure what caused the issue, though.
    Kevin Kho

    1 year ago
    Hey @Alex Furrier, in my experience Server startup does hiccup sometimes, and if it's successful after the restart you should be fine. It's also weird because it looks fine from these logs. Could you move the logs to the thread when you get a chance so we don't crowd the main channel?
    Alex Furrier

    1 year ago
    These are the pod logs:
    {"severity": "INFO", "name": "prefect-server.GraphQL Server", "message": "Using uvicorn log level = 'debug'"}
    INFO:     Started server process [1]
    INFO:     Waiting for application startup.
    INFO:     Application startup complete.
    INFO:     Uvicorn running on http://0.0.0.0:4201 (Press CTRL+C to quit)
    INFO:     10.244.19.1:55684 - "GET /health HTTP/1.1" 200 OK
    INFO:     10.244.19.1:55692 - "GET /health HTTP/1.1" 200 OK
    INFO:     10.244.16.156:41468 - "POST /graphql/ HTTP/1.1" 200 OK
    INFO:     10.244.16.156:41482 - "POST /graphql/ HTTP/1.1" 200 OK
    INFO:     10.244.16.156:41144 - "POST /graphql/ HTTP/1.1" 200 OK
    INFO:     10.244.16.156:41504 - "POST /graphql/ HTTP/1.1" 200 OK
    INFO:     10.244.16.156:41508 - "POST /graphql/ HTTP/1.1" 200 OK
    INFO:     10.244.16.156:41356 - "POST /graphql/ HTTP/1.1" 200 OK
    INFO:     10.244.16.156:41514 - "POST /graphql/ HTTP/1.1" 200 OK
    INFO:     10.244.16.156:41526 - "POST /graphql/ HTTP/1.1" 200 OK
    INFO:     10.244.16.156:41592 - "POST /graphql/ HTTP/1.1" 200 OK
    INFO:     10.244.16.156:41590 - "POST /graphql/ HTTP/1.1" 200 OK
    INFO:     10.244.16.156:41380 - "POST /graphql/ HTTP/1.1" 200 OK
    INFO:     10.244.16.156:41236 - "POST /graphql/ HTTP/1.1" 200 OK
    INFO:     10.244.16.156:41596 - "POST /graphql/ HTTP/1.1" 200 OK
    INFO:     10.244.16.156:41378 - "POST /graphql/ HTTP/1.1" 200 OK
    INFO:     10.244.16.156:41234 - "POST /graphql/ HTTP/1.1" 200 OK
    INFO:     10.244.16.156:41600 - "POST /graphql/ HTTP/1.1" 200 OK
    INFO:     10.244.16.156:41238 - "POST /graphql/ HTTP/1.1" 200 OK
    INFO:     10.244.16.156:41688 - "POST /graphql/ HTTP/1.1" 200 OK
    INFO:     10.244.16.156:41694 - "POST /graphql/ HTTP/1.1" 200 OK
    INFO:     10.244.16.156:41700 - "POST /graphql/ HTTP/1.1" 200 OK
    INFO:     10.244.16.156:41704 - "POST /graphql/ HTTP/1.1" 200 OK
    INFO:     10.244.16.156:41702 - "POST /graphql/ HTTP/1.1" 200 OK
    INFO:     10.244.16.156:41706 - "POST /graphql/ HTTP/1.1" 200 OK
    INFO:     10.244.16.156:41712 - "POST /graphql/ HTTP/1.1" 200 OK
    INFO:     10.244.16.156:41318 - "POST /graphql/ HTTP/1.1" 200 OK
    INFO:     10.244.16.156:41320 - "POST /graphql/ HTTP/1.1" 200 OK
    INFO:     10.244.16.156:41718 - "POST /graphql/ HTTP/1.1" 200 OK
    INFO:     10.244.16.156:41720 - "POST /graphql/ HTTP/1.1" 200 OK
    INFO:     10.244.16.156:41722 - "POST /graphql/ HTTP/1.1" 200 OK
    INFO:     Shutting down
    INFO:     Waiting for connections to close. (CTRL+C to force quit)
    INFO:     10.244.16.156:41862 - "POST /graphql/ HTTP/1.1" 200 OK
    INFO:     10.244.16.156:41012 - "POST /graphql/ HTTP/1.1" 200 OK
    INFO:     10.244.16.156:41014 - "POST /graphql/ HTTP/1.1" 200 OK
    INFO:     10.244.16.156:41868 - "POST /graphql/ HTTP/1.1" 200 OK
    INFO:     10.244.16.156:41886 - "POST /graphql/ HTTP/1.1" 200 OK
    INFO:     10.244.16.156:40986 - "POST /graphql/ HTTP/1.1" 200 OK
    INFO:     10.244.16.156:41072 - "POST /graphql/ HTTP/1.1" 200 OK
    INFO:     10.244.16.156:41878 - "POST /graphql/ HTTP/1.1" 200 OK
    INFO:     10.244.16.156:41010 - "POST /graphql/ HTTP/1.1" 200 OK
    INFO:     10.244.16.156:40958 - "POST /graphql/ HTTP/1.1" 200 OK
    INFO:     10.244.16.156:41724 - "POST /graphql/ HTTP/1.1" 200 OK
    INFO:     10.244.16.156:42090 - "POST /graphql/ HTTP/1.1" 200 OK
    INFO:     10.244.16.156:41726 - "POST /graphql/ HTTP/1.1" 200 OK
    INFO:     Waiting for application shutdown.
    INFO:     Application shutdown complete.
    INFO:     Finished server process [1]
    And this is the last termination message:
    lastState:
      terminated:
        containerID: containerd://f24f7214318a9605d579ea96b19b5fdbe8ba829ecbf1f4849d4ff0bd3aca5a27
        exitCode: 0
        finishedAt: "2021-08-19T23:08:36Z"
        reason: Completed
        startedAt: "2021-08-19T23:07:40Z"
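For anyone triaging a similar report, the log and the termination message above can be read together: uvicorn logs `Shutting down` and `Application shutdown complete.` when it is asked to stop (typically on SIGINT or SIGTERM), and exit code 0 is what Kubernetes reports with reason `Completed`. The sketch below is a hypothetical helper (not part of Prefect or kubectl); it assumes you have copied the `lastState.terminated` fields into a dict and the pod log into a list of lines, and it only checks those shutdown markers, the exit code, and the container's runtime.

```python
from datetime import datetime


def parse_k8s_time(value: str) -> datetime:
    # Kubernetes status timestamps look like "2021-08-19T23:08:36Z" (UTC, second precision).
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def summarize_termination(terminated: dict, log_lines: list[str]) -> dict:
    """Hypothetical triage helper: combine a container's lastState.terminated
    fields with its uvicorn log to separate a graceful stop from a crash."""
    started = parse_k8s_time(terminated["startedAt"])
    finished = parse_k8s_time(terminated["finishedAt"])

    # uvicorn logs both of these lines while shutting down cleanly.
    markers = ("Shutting down", "Application shutdown complete.")
    saw_markers = all(any(m in line for line in log_lines) for m in markers)

    # Count GraphQL requests still served after "Shutting down" (connections being drained).
    drained = 0
    shutting_down = False
    for line in log_lines:
        if "Shutting down" in line:
            shutting_down = True
        elif shutting_down and '"POST /graphql/' in line:
            drained += 1

    return {
        "runtime_seconds": int((finished - started).total_seconds()),
        "exit_code": terminated["exitCode"],
        "reason": terminated["reason"],
        # Exit code 0 is what Kubernetes reports with reason "Completed".
        "graceful_shutdown": saw_markers and terminated["exitCode"] == 0,
        "requests_drained_after_shutdown": drained,
    }
```

Fed the `lastState.terminated` values and the full log from this thread, it reports a 56-second runtime, a graceful shutdown, and 13 requests drained after `Shutting down`. A clean exit does not say who asked the process to stop. Pods owned by a Deployment use `restartPolicy: Always`, so even an exit code of 0 is restarted, and repeated restarts are what surface as `CrashLoopBackOff`; the `kubectl describe pod` events (for example, failed liveness probes) are the usual next place to look.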
    Kevin Kho

    1 year ago
    Thank you!
    Alex Furrier

    1 year ago
    Not sure if it's relevant, but there were 2 flows running at the time the graphql pod went down. After the restart, 1 failed and 1 kept running. Probably some difference in retry settings on the running tasks, but it may be related to what caused the crash in the first place.
    Kevin Kho

    1 year ago
    Oh, I thought this was on spin-up. That's weird. I guess bring it up if you see it again? It's a bit hard to tell what happened.
    Alex Furrier

    1 year ago
    Could it be related to large log messages? I turned on some logging for debugging purposes that produced a fairly large amount of text. Could the large logs somehow be causing crashes?
    Kevin Kho

    1 year ago
    You would get an API error saying the request was rejected because the request entity was too large, which we have seen some of lately.
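If you want verbose debug logging without risking oversized log payloads, one option is to cap message length on the client side before records reach any handler. This is a minimal sketch using only the standard library's `logging.Filter`; the `TruncateLongMessages` name and the 10,000-character default are hypothetical placeholders, not a documented Prefect limit or API, and it assumes your task logs go through a standard `logging.Logger` you can attach a filter to.

```python
import logging


class TruncateLongMessages(logging.Filter):
    """Hypothetical filter: shorten any log message longer than max_chars."""

    def __init__(self, max_chars: int = 10_000) -> None:
        super().__init__()
        self.max_chars = max_chars

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # applies %-style args, if any
        if len(message) > self.max_chars:
            omitted = len(message) - self.max_chars
            record.msg = f"{message[: self.max_chars]}... [truncated {omitted} chars]"
            record.args = ()  # message is already formatted; drop the args
        return True  # never drop the record, only shorten it
```

Attach it with `logger.addFilter(TruncateLongMessages(max_chars=...))` on whichever logger your debug output goes through. Truncating rather than dropping keeps the start of each message visible while bounding the size of every record.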
    Sam Cook

    1 year ago
    All of the Prefect pods have to come up in a very specific order, or else they fail into unrecoverable states. The Docker-based deployment handles this with depends_on statements in the docker-compose file (https://github.com/PrefectHQ/prefect/blob/master/src/prefect/cli/docker-compose.yml), but there's no analogous construct on k8s. The best you can do to get things in the right order is to add init containers to the config that loop until the required services are up.
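A minimal sketch of the init-container approach Sam describes: a small Python script that retries a TCP connection to each dependency until it accepts connections or a deadline passes, and exits non-zero on timeout so the kubelet restarts the init container. The service names in the comment (`postgres:5432`, `hasura:3000`) are hypothetical placeholders for your own Service names and ports. Note that it only proves a port is open, not that the service behind it is healthy.

```python
import socket
import sys
import time


def wait_for_port(host: str, port: int, timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Return True once host:port accepts a TCP connection, False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only proves something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Covers DNS failures (Service not created yet) and refused or timed-out connects.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


def main(targets: list[str]) -> int:
    """Wait for each "host:port" target in order; return a process exit code."""
    for target in targets:
        host, _, port = target.rpartition(":")
        print(f"waiting for {host}:{port} ...", flush=True)
        if not wait_for_port(host, int(port)):
            print(f"timed out waiting for {host}:{port}", file=sys.stderr)
            return 1
    return 0


# Only act when targets are given, e.g.:
#   python wait_for.py postgres:5432 hasura:3000   (hypothetical service names)
if __name__ == "__main__" and sys.argv[1:]:
    sys.exit(main(sys.argv[1:]))
```

Each Prefect component's pod would get one such init container listing the services it depends on, which approximates the ordering that `depends_on` gives you in docker-compose.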
    Alex Furrier

    1 year ago
    @Sam Cook The issue doesn't appear to happen on container init but in the middle of a flow run. More on it in the other thread just below: https://prefect-community.slack.com/archives/C014Z8DPDSR/p1629471848376900?thread_ts=1629432352.371900&cid=C014Z8DPDSR