On a K8s deployed Prefect server, the graphql pod ...
# prefect-server
a
On a K8s-deployed Prefect server, the graphql pod is stuck in CrashLoopBackOff. Not sure why it exited as Completed.
I deleted the pod, and when a new one was spun up it was successful. Not sure what caused this issue, though.
k
Hey @Alex Furrier, from experience Server startup does hiccup sometimes, and if it comes up successfully after the restart you should be fine. It’s also weird because everything looks good in these logs. Could you move the logs to the thread when you get a chance so we don’t crowd the main channel?
a
These are the pod logs:
{"severity": "INFO", "name": "prefect-server.GraphQL Server", "message": "Using uvicorn log level = 'debug'"}
INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:4201 (Press CTRL+C to quit)
INFO:     10.244.19.1:55684 - "GET /health HTTP/1.1" 200 OK
INFO:     10.244.19.1:55692 - "GET /health HTTP/1.1" 200 OK
INFO:     10.244.16.156:41468 - "POST /graphql/ HTTP/1.1" 200 OK
INFO:     10.244.16.156:41482 - "POST /graphql/ HTTP/1.1" 200 OK
INFO:     10.244.16.156:41144 - "POST /graphql/ HTTP/1.1" 200 OK
INFO:     10.244.16.156:41504 - "POST /graphql/ HTTP/1.1" 200 OK
INFO:     10.244.16.156:41508 - "POST /graphql/ HTTP/1.1" 200 OK
INFO:     10.244.16.156:41356 - "POST /graphql/ HTTP/1.1" 200 OK
INFO:     10.244.16.156:41514 - "POST /graphql/ HTTP/1.1" 200 OK
INFO:     10.244.16.156:41526 - "POST /graphql/ HTTP/1.1" 200 OK
INFO:     10.244.16.156:41592 - "POST /graphql/ HTTP/1.1" 200 OK
INFO:     10.244.16.156:41590 - "POST /graphql/ HTTP/1.1" 200 OK
INFO:     10.244.16.156:41380 - "POST /graphql/ HTTP/1.1" 200 OK
INFO:     10.244.16.156:41236 - "POST /graphql/ HTTP/1.1" 200 OK
INFO:     10.244.16.156:41596 - "POST /graphql/ HTTP/1.1" 200 OK
INFO:     10.244.16.156:41378 - "POST /graphql/ HTTP/1.1" 200 OK
INFO:     10.244.16.156:41234 - "POST /graphql/ HTTP/1.1" 200 OK
INFO:     10.244.16.156:41600 - "POST /graphql/ HTTP/1.1" 200 OK
INFO:     10.244.16.156:41238 - "POST /graphql/ HTTP/1.1" 200 OK
INFO:     10.244.16.156:41688 - "POST /graphql/ HTTP/1.1" 200 OK
INFO:     10.244.16.156:41694 - "POST /graphql/ HTTP/1.1" 200 OK
INFO:     10.244.16.156:41700 - "POST /graphql/ HTTP/1.1" 200 OK
INFO:     10.244.16.156:41704 - "POST /graphql/ HTTP/1.1" 200 OK
INFO:     10.244.16.156:41702 - "POST /graphql/ HTTP/1.1" 200 OK
INFO:     10.244.16.156:41706 - "POST /graphql/ HTTP/1.1" 200 OK
INFO:     10.244.16.156:41712 - "POST /graphql/ HTTP/1.1" 200 OK
INFO:     10.244.16.156:41318 - "POST /graphql/ HTTP/1.1" 200 OK
INFO:     10.244.16.156:41320 - "POST /graphql/ HTTP/1.1" 200 OK
INFO:     10.244.16.156:41718 - "POST /graphql/ HTTP/1.1" 200 OK
INFO:     10.244.16.156:41720 - "POST /graphql/ HTTP/1.1" 200 OK
INFO:     10.244.16.156:41722 - "POST /graphql/ HTTP/1.1" 200 OK
INFO:     Shutting down
INFO:     Waiting for connections to close. (CTRL+C to force quit)
INFO:     10.244.16.156:41862 - "POST /graphql/ HTTP/1.1" 200 OK
INFO:     10.244.16.156:41012 - "POST /graphql/ HTTP/1.1" 200 OK
INFO:     10.244.16.156:41014 - "POST /graphql/ HTTP/1.1" 200 OK
INFO:     10.244.16.156:41868 - "POST /graphql/ HTTP/1.1" 200 OK
INFO:     10.244.16.156:41886 - "POST /graphql/ HTTP/1.1" 200 OK
INFO:     10.244.16.156:40986 - "POST /graphql/ HTTP/1.1" 200 OK
INFO:     10.244.16.156:41072 - "POST /graphql/ HTTP/1.1" 200 OK
INFO:     10.244.16.156:41878 - "POST /graphql/ HTTP/1.1" 200 OK
INFO:     10.244.16.156:41010 - "POST /graphql/ HTTP/1.1" 200 OK
INFO:     10.244.16.156:40958 - "POST /graphql/ HTTP/1.1" 200 OK
INFO:     10.244.16.156:41724 - "POST /graphql/ HTTP/1.1" 200 OK
INFO:     10.244.16.156:42090 - "POST /graphql/ HTTP/1.1" 200 OK
INFO:     10.244.16.156:41726 - "POST /graphql/ HTTP/1.1" 200 OK
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [1]
And this is the last termination message:
lastState:
  terminated:
    containerID: containerd://f24f7214318a9605d579ea96b19b5fdbe8ba829ecbf1f4849d4ff0bd3aca5a27
    exitCode: 0
    finishedAt: "2021-08-19T23:08:36Z"
    reason: Completed
    startedAt: "2021-08-19T23:07:40Z"
k
Thank you!
a
Not sure if it's relevant, but there were 2 flows running at the time the graphql pod went down. After the restart, 1 failed and 1 kept running. That's probably down to different retry settings on the running tasks, but it may be related to whatever caused the crash in the first place.
k
Oh I thought this was on spinup. That’s weird. I guess bring it up if you see it again? A bit hard to tell what happened.
a
Could it be related to large log messages? I turned on some logging for debugging purposes that involved a fairly large amount of text. Could the large logs somehow be causing crashes?
k
In that case you would get an API error saying the request was rejected because the request entity was too large. We have seen a few of those lately.
s
All of the Prefect pods have to come up in a very specific order or else they fail into unrecoverable states. The docker-based deployment handles this with depends_on statements in the docker-compose file (https://github.com/PrefectHQ/prefect/blob/master/src/prefect/cli/docker-compose.yml), but there's no analogous construct on k8s. The best you can do to get things in the right order is to add init containers to the config that loop until the required services are up.
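For anyone finding this later, here is a minimal sketch of the kind of wait loop an init container could run. It is not taken from the Prefect deployment itself. It assumes the service being waited on exposes the `GET /health` endpoint on port 4201 seen in the pod logs above. The hostname `prefect-graphql`, the retry count, and the delay are hypothetical placeholders, and which services need to wait on which is not covered here.

```python
import http.client
import sys
import time
import urllib.request
from typing import Callable


def is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET request to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, timeouts and refused connections are all
        # OSError subclasses; any of them means "not up yet".
        return False


def wait_for(
    check: Callable[[], bool],
    retries: int = 60,
    delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `check` up to `retries` times, sleeping `delay` seconds between tries."""
    for attempt in range(retries):
        if check():
            return True
        if attempt < retries - 1:
            sleep(delay)
    return False


def main() -> None:
    # Hypothetical in-cluster service name; replace with the real one.
    url = "http://prefect-graphql:4201/health"
    ok = wait_for(lambda: is_healthy(url))
    # A non-zero exit makes the init container fail, so Kubernetes retries
    # it and the main container does not start yet.
    sys.exit(0 if ok else 1)
```

The init container's entrypoint would call `main()`. The `sleep` parameter is only there so the loop can be exercised without real waiting.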
a
@Sam Cook The issue doesn't appear to happen on container init but in the middle of a flow run. More on it in the other thread just below: https://prefect-community.slack.com/archives/C014Z8DPDSR/p1629471848376900?thread_ts=1629432352.371900&cid=C014Z8DPDSR