# prefect-community
b
Hi, I'm new to Prefect and installed a local server. Last week everything seemed to work and I started to write my first flows. They were executed as expected, and the UI worked great too. This morning, however, when I tried to continue working on my flows, the UI was frozen, so I just restarted the containers through docker-compose (down and up again). After that I saw curious/weird output in the Postgres logfile (I will explain below). Even though the UI was working fine again, the Postgres logfile was quite confusing. So I thought that something had gotten messed up with my agent, or that there were other communication issues between the Prefect server and the database. I decided to make a new clean setup in a new clean virtual environment. I also set up a path to a new database and deleted the .prefect folder in the home directory. But even with the clean setup, the weird Postgres behavior started right away: ==> More details in thread
k
Hi @Bernd Grolig, will read through this but could we move more of the details to the thread to not dominate the main channel? You could do something like “more details in thread” and then paste the details here
What version were you on before upgrading, and how are you starting it? prefect server start?
b
What's happening: after starting with docker-compose, the ASCII art shows up, after that there are some normal logs from Apollo and GraphQL, and then Postgres prints an error (sorry for the misformatted layout, but that's the way it was piped to my logfile):
apollo_1    | > @ serve /apollo
apollo_1    | > node dist/index.js
apollo_1    |
apollo_1    | Building schema...
graphql_1   | INFO:     172.27.0.5:36484 - "POST /graphql/ HTTP/1.1" 200 OK
apollo_1    | Building schema complete!
apollo_1    | Server ready at http://0.0.0.0:4200 :rocket: (version: 2022.01.25)
apollo_1    | Sending telemetry to Prefect Technologies, Inc.: {"source":"prefect_server","type":"startup","payload":{"id":"c22d6050-de0c-4f88-b0eb-bec859f43ab6","prefect_server_version":"2022.01.25","api_version":"0.2.0"}}
graphql_1   | INFO:     172.27.0.5:36512 - "POST /graphql/ HTTP/1.1" 200 OK
postgres_1  | 2022-04-04 14:14:24.586 UTC [41] ERROR:  insert or update on table "log" violates foreign key constraint "log_flow_run_id_fkey"
postgres_1  | 2022-04-04 14:14:24.586 UTC [41] DETAIL:  Key (flow_run_id)=(97484f4c-5c8c-4aa0-bac1-b31dcac0a9a9) is not present in table "flow_run".
postgres_1  | 2022-04-04 14:14:24.586 UTC [41] STATEMENT:  WITH "log__mutation_result_alias" AS (INSERT INTO "public"."log" ( "created", "tenant_id", "is_loaded_from_archive", "task_run_id", "name", "id", "updated", "message", "timestamp", "level", "flow_run_id", "info" ) VALUES (DEFAULT, (NULL)::uuid, DEFAULT, (NULL)::uuid, ('prefect.subprocess_heartbeat')::varchar, ('76e218e5-71b2-4622-983c-7fa452b303ff')::uuid, DEFAULT, ('Failed to send heartbeat with exception: ReadTimeout(ReadTimeoutError("HTTPConnectionPool(host=''localhost'', port=4200): Read timed out. (read timeout=15)"))
After that statement it continues to spam the log with errors that are somehow connected, and it won't stop. So this continues basically forever; within 5 minutes the logfile already has 1.6 million lines! This is the error message:
postgres_1  | 2022-04-04 14:14:24.586 UTC [41] ERROR:  insert or update on table "log" violates foreign key constraint "log_flow_run_id_fkey"
postgres_1  | 2022-04-04 14:14:24.586 UTC [41] DETAIL:  Key (flow_run_id)=(97484f4c-5c8c-4aa0-bac1-b31dcac0a9a9) is not present in table "flow_run".
postgres_1  | 2022-04-04 14:14:24.586 UTC [41] STATEMENT:  WITH "log__mutation_result_alias" AS (INSERT INTO "public"."log" ( "created", "tenant_id", "is_loaded_from_archive", "task_run_id", "name", "id", "updated", "message", "timestamp", "level", "flow_run_id", "info" ) VALUES (DEFAULT, (NULL)::uuid, DEFAULT, (NULL)::uuid, ('prefect.subprocess_heartbeat')::varchar, ('76e218e5-71b2-4622-983c-7fa452b303ff')::uuid, DEFAULT, ('Failed to send heartbeat with exception: ReadTimeout(ReadTimeoutError("HTTPConnectionPool(host=''localhost'', port=4200): Read timed out. (read timeout=15)"))
postgres_1  |       File "<string>", line 3, in raise_from
postgres_1  |       File "/var/opt/datawarehouse/prefect/.venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 444, in _make_request
postgres_1  |         httplib_response = conn.getresponse()
postgres_1  |       File "/usr/lib/python3.8/http/client.py", line 1348, in getresponse
postgres_1  |         response.begin()
postgres_1  |       File "/usr/lib/python3.8/http/client.py", line 316, in begin
postgres_1  |         version, status, reason = self._read_status()
postgres_1  |       File "/usr/lib/python3.8/http/client.py", line 277, in _read_status
postgres_1  |         line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
postgres_1  |       File "/usr/lib/python3.8/socket.py", line 669, in readinto
postgres_1  |         return self._sock.recv_into(b)
postgres_1  |     socket.timeout: timed out
postgres_1  |
postgres_1  |     During handling of the above exception, another exception occurred:
postgres_1  |
postgres_1  |     Traceback (most recent call last):
postgres_1  |       File "/var/opt/datawarehouse/prefect/.venv/lib/python3.8/site-packages/requests/adapters.py", line 440, in send
postgres_1  |         resp = conn.urlopen(
postgres_1  |       File "/var/opt/datawarehouse/prefect/.venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 785, in urlopen
postgres_1  |         retries = retries.increment(
postgres_1  |       File "/var/opt/datawarehouse/prefect/.venv/lib/python3.8/site-packages/urllib3/util/retry.py", line 550, in increment
postgres_1  |         raise six.reraise(type(error), error, _stacktrace)
postgres_1  |       File "/var/opt/datawarehouse/prefect/.venv/lib/python3.8/site-packages/urllib3/packages/six.py", line 770, in reraise
postgres_1  |         raise value
postgres_1  |       File "/var/opt/datawarehouse/prefect/.venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 703, in urlopen
postgres_1  |         httplib_response = self._make_request(
postgres_1  |       File "/var/opt/datawarehouse/prefect/.venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 451, in _make_request
postgres_1  |         self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
postgres_1  |       File "/var/opt/datawarehouse/prefect/.venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 340, in _raise_timeout
postgres_1  |         raise ReadTimeoutError(
postgres_1  |     urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host=''localhost'', port=4200): Read timed out. (read timeout=15)
postgres_1  |
postgres_1  |     During handling of the above exception, another exception occurred:
postgres_1  |
postgres_1  |     Traceback (most recent call last):
postgres_1  |       File "/var/opt/datawarehouse/prefect/.venv/lib/python3.8/site-packages/prefect/cli/heartbeat.py", line 68, in flow_run
postgres_1  |         client.update_flow_run_heartbeat(id)
postgres_1  |       File "/var/opt/datawarehouse/prefect/.venv/lib/python3.8/site-packages/prefect/client/client.py", line 1288, in update_flow_run_heartbeat
postgres_1  |         self.graphql(
postgres_1  |       File "/var/opt/datawarehouse/prefect/.venv/lib/python3.8/site-packages/prefect/client/client.py", line 452, in graphql
postgres_1  |         result = self.post(
postgres_1  |       File "/var/opt/datawarehouse/prefect/.venv/lib/python3.8/site-packages/prefect/client/client.py", line 407, in post
postgres_1  |         response = self._request(
postgres_1  |       File "/var/opt/datawarehouse/prefect/.venv/lib/python3.8/site-packages/prefect/client/client.py", line 641, in _request
postgres_1  |         response = self._send_request(
postgres_1  |       File "/var/opt/datawarehouse/prefect/.venv/lib/python3.8/site-packages/prefect/client/client.py", line 506, in _send_request
postgres_1  |         response = session.post(
postgres_1  |       File "/var/opt/datawarehouse/prefect/.venv/lib/python3.8/site-packages/requests/sessions.py", line 577, in post
postgres_1  |         return self.request(''POST'', url, data=data, json=json, **kwargs)
postgres_1  |       File "/var/opt/datawarehouse/prefect/.venv/lib/python3.8/site-packages/requests/sessions.py", line 529, in request
postgres_1  |         resp = self.send(prep, **send_kwargs)
postgres_1  |       File "/var/opt/datawarehouse/prefect/.venv/lib/python3.8/site-packages/requests/sessions.py", line 645, in send
postgres_1  |         r = adapter.send(request, **kwargs)
postgres_1  |       File "/var/opt/datawarehouse/prefect/.venv/lib/python3.8/site-packages/requests/adapters.py", line 532, in send
postgres_1  |         raise ReadTimeout(e, request=request)
postgres_1  |     requests.exceptions.ReadTimeout: HTTPConnectionPool(host=''localhost'', port=4200): Read timed out. (read timeout=15)')::varchar, ('2022-04-01T09:16:17.761276Z')::timestamptz, ('ERROR')::varchar, ('97484f4c-5c8c-4aa0-bac1-b31dcac0a9a9')::uuid, (NULL)::jsonb), (DEFAULT, (NULL)::uuid, DEFAULT, (NULL)::uuid, ('prefect.subprocess_heartbeat')::varchar, ('cc83da54-2f81-4ea3-afcd-506a6af4b644')::uuid, DEFAULT, ('Failed to send heartbeat with exception: ReadTimeout(ReadTimeoutError("HTTPConnectionPool(host=''localhost'', port=4200): Read timed out. (read timeout=15)"))
postgres_1  |     Traceback (most recent call last):
... to be continued rapidly ... (1.6 million lines within a few minutes)
Some basics about my setup:
• I'm running Prefect Server on a VPS with Ubuntu 20.04, 6 cores, 16 GB RAM.
• I have an NGINX reverse proxy to access my containers.
• I also updated to the latest version 1.1 today.
• The Prefect local server and all the Docker containers run on the same machine.
• The Apollo server is exposed to localhost; the UI is mapped via NGINX and accessible through a subdomain.
As I mentioned, the system worked wonderfully last week. Maybe the problem with the logs was already present and I was just not aware of it, as I didn't see the output on the terminal screen. Looking at the first Postgres error, it seems that Postgres wants to insert into the "log" table and gets this error: ==>
"ERROR:  insert or update on table "log" violates foreign key constraint "log_flow_run_id_fkey"
The detail message shows:
DETAIL:  Key (flow_run_id)=(97484f4c-5c8c-4aa0-bac1-b31dcac0a9a9) is not present in table "flow_run".
So this would mean that a certain flow run cannot be updated because it is not present. As I made a clean installation with a new database, I would not expect any flow run ID to be in the database at all... It would be great if somebody could help me with this issue. It seems that something is messed up, but I just don't know what... Thanks everybody!
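One way to check that interpretation directly is to ask the database whether the flow run from the DETAIL line exists at all. A minimal sketch; the container name placeholder and the credentials (user prefect, database prefect_server) are assumptions based on the defaults in the generated docker-compose file, so adjust both to your setup:

# find the Postgres container started by docker-compose / prefect server start
docker ps --filter "name=postgres"
# look up the flow run referenced by the failing log insert
docker exec -it <postgres_container> psql -U prefect -d prefect_server \
  -c "SELECT id FROM flow_run WHERE id = '97484f4c-5c8c-4aa0-bac1-b31dcac0a9a9';"
# an empty result confirms the log writer is referencing a flow run that does not exist in this fresh database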
I was on 1.0.0 before, and I started it through a local docker-compose.yml file that was exported with the prefect server command. The clean 1.1.0 installation was started with the prefect server start command.
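If it helps to untangle the piped-together output, the server's compose configuration can also be exported and each service's logs followed separately. A rough sketch, assuming the Prefect 1.x prefect server config command (which prints the generated docker-compose configuration):

prefect server config > docker-compose.yml    # export the compose file the server would use
docker-compose up -d                          # start the services in the background
docker-compose logs -f postgres               # follow only the Postgres service instead of one merged logfile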
k
Just confirming clean installation failed? That was without trying to use the database from earlier?
b
Yes. After I detected the error, I set everything up from scratch. Of course I saved the old database folder somewhere. But for the test, everything was in a clean new virtual environment with a new database.
k
Not sure if related, but what is your Hasura version?
b
I use this image: hasura/graphql-engine:v2.1.1
k
That all looks good. I am confused about why there are still requests happening. Do you by chance have an agent or flow run that is still polling the server when you do the start, which is why it's looking for a flow run ID?
b
I will try and report in a second
Oh, sorry, just realized that I misunderstood your question.
I just registered a new Hello World flow and connected a new agent
How can I check if there are more agents running? The UI showed only the one I set up myself
k
Just confirming, you started the server, and then registered, and then connected the agent, and then you got all of those log table errors? Did you run the flow? I think the Agents tab is a good check. I was more of asking if you may have some flow run that is still running and it's hitting the API, and then you restart the server and the database, but that flow run is still hitting the API to log stuff, and now the API can't find that flow in the database. Does that make sense to you?
b
> I was more of asking if you may have some flow run that is still running and it's hitting the API, and then you restart the server and the database, but that flow run is still hitting the API to log stuff, and now the API can't find that flow in the database. Does that make sense to you?
I'm pretty sure that I stopped all flows within the UI on Friday, because they are still buggy and I didn't want them spamming the system over the weekend.
From the local Prefect side: how would I check if a flow is still running?
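One quick check on the local machine is to look for Prefect processes that are still alive; lingering flow runs and their heartbeats show up there. A minimal sketch, assuming the runs were started locally (e.g. by a local agent) and that heartbeats carry "heartbeat" in their command line, as the prefect/cli/heartbeat.py traceback above suggests:

ps aux | grep -i "[p]refect"      # anything Prefect-related still running (the [p] keeps grep from matching itself)
pgrep -af "prefect heartbeat"     # list just the heartbeat subprocesses, if any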
> Just confirming, you started the server, and then registered, and then connected the agent, and then you got all of those log table errors? Did you run the flow?
Not really: I get the errors immediately when I run prefect server start.
After the containers are up and the ASCII art appears, Postgres starts to throw these errors rapidly.
The funny thing is that the system is still up. That's also why I'm not sure whether the error already occurred last week and I just didn't realize it, because everything was suppressed.
k
Ah I see. We wouldn’t have an easy way on our side to check if a Flow is still running. What agent were you on? If Local, killing the local agent will kill the flow runs
b
OK, then it's pretty certain that all flows are canceled: last week I ran the local agent with supervisord, and I quit the agent this morning.
In case I do another clean setup, what steps do I have to take in order to clean everything from the server and install from scratch?
k
I think prefect server start should be clean because it's containers. It would be more like clearing the Docker cache.
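For a genuinely from-scratch restart, the things to clear are the containers/volumes plus the local Prefect state. A rough sketch, not an official procedure; the volume path is whatever you configured, and removing ~/.prefect wipes local config and results:

# stop the server first (Ctrl+C on prefect server start, or docker-compose down if it was launched from the exported compose file)
docker ps -a | grep -E "apollo|graphql|hasura|postgres|towel|ui"   # anything left over?
docker system prune --volumes     # remove stopped containers, networks, dangling images and unused volumes (asks for confirmation)
# if Postgres was pointed at a host directory, move that directory aside as well
rm -rf ~/.prefect                 # local Prefect config/state; recreated on the next run
prefect server start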
a
@Bernd Grolig if you are new to Prefect, perhaps it makes sense to start directly with Prefect 2.0? Check out this thread for more info
b
... I was busy the last few days and only found time today to investigate further: I searched for active processes with the name "prefect" on localhost and got this output. What does it mean? To me it looks like there are heartbeat checks from registered flows that query the server constantly. As the flows were not found on the server, they kept producing error messages over and over again. I also checked the size of my old database folder, which had grown to 100 GB! If I am right, I guess I have to kill all these processes and start over again. However, I still have some questions about how things work together: if I run a scheduled flow on the local host that is also registered with the server, will this start a process on the local machine that constantly checks for the registered flow on the server? If so, as soon as the flow is deleted via the UI on the server, the process on localhost will still query for it and constantly get errors. If this understanding is correct, I would suggest calling this a bug, because the UI says "OK, I deleted this flow", but in the end it is still active on the local machine. Am I right here? Or is there another interpretation of the behavior?
@Kevin Kho Just mentioning you so that you are aware of my answer. I'd appreciate your opinion on this (see the message above). Thanks 🙂
@Anna Geller Thanks for your hint on Orion. I will check this. 🙂
k
Yes, I guess you have to kill them. The heartbeat is a process that signals the flow is alive. Without them, we wouldn't be able to tell if a flow crashed, and it would permanently be shown as running in the UI. It is weird that your heartbeats persist; it's the first time I have seen that behavior. These are still running, right?
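If they are still there, killing them by command-line pattern is usually enough. A minimal sketch, assuming the processes match "prefect heartbeat" as in the prefect/cli/heartbeat.py traceback; check the pgrep output before running pkill:

pgrep -af "prefect heartbeat"     # confirm which processes would be hit
pkill -f "prefect heartbeat"      # terminate the lingering heartbeat subprocesses
ps aux | grep -i "[p]refect"      # verify nothing Prefect-related is left polling the API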
b
So, a short update: I finally managed to stop the requests.
I'm still not sure about the root cause. What I tested today:
• I restarted Prefect, and the constant logging appeared immediately, as shown on the screenshot.
• I killed those "hanging" heartbeat processes, but that had no effect on the logging; Postgres was still heavily reporting these error messages.
• After that I created a local Hello World flow, just to check whether this was working (which it did).
• Postgres was still heavily logging errors.
• Next I spun up the local agent, and the next time I checked, the logging had suddenly stopped.
So in the end it's not entirely clear what got messed up, but things finally seem back to normal. I will keep an eye on those "forgotten" heartbeat processes next time. @Kevin Kho Thanks a lot for your support.
k
Hard to tell what is going on. The heartbeat logs are not normal. When you say logging stopped, is that the flow logs or the heartbeat logs? I assume heartbeat?
b
I mean those error logs from the Postgres database. Log messages from within the flows seem to be working as expected.
k
Ah gotcha. Glad you got it back to normal
b
Just to really understand the concept of how things work together: the local Prefect I installed with pip install prefect would run perfectly well without any server or UI on a local machine, right?
k
Yes, that is referred to as Prefect Core. Core interacts with other services that form the backend (db, api, etc.), but it has some functionality to run on its own.
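To make that concrete: with nothing but pip install prefect, a flow can be executed entirely in-process with flow.run(), no API, agent, or UI involved. A minimal sketch using the Prefect 1.x Core API:

pip install prefect
python -c "
from prefect import task, Flow

@task
def say_hello():
    print('hello from Prefect Core')

with Flow('standalone-example') as flow:
    say_hello()

flow.run()   # runs locally; no server, agent, or UI required
"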
b
Yes, I got that now. It was just a little confusing with local Prefect (Core), client, agent, server, UI, and GraphQL. But I think I've learned a lot now by diving deeper, thanks to the errors.