jeff n
02/08/2021, 11:28 PM
@task settings, and just sits waiting for an hour until the heartbeat dies. We can run it on the same worker server manually with no problems at all. There don't seem to be any logs indicating it is out of memory. This seems pretty squarely on the Prefect side, but I am lost as to how to debug it since only the server has the issue.
Process 27956: /opt/prefect/.venv/bin/python3.8 -m prefect heartbeat flow-run -i 14303ee3-e96a-49c9-87e6-6577fcace7d3
Python v3.8.0 (/usr/bin/python3.8)
Thread 27956 (idle): "MainThread"
flow_run (prefect/cli/heartbeat.py:98)
invoke (click/core.py:610)
invoke (click/core.py:1066)
invoke (click/core.py:1259)
invoke (click/core.py:1259)
main (click/core.py:782)
__call__ (click/core.py:829)
<module> (prefect/__main__.py:4)
_run_code (runpy.py:85)
_run_module_as_main (runpy.py:192)
Chris White
There are a few differences between a local run (via flow.run) and a server managed run that hopefully provide a place to begin looking:
• firstly, a server managed run will be run by an agent. If you are using a LocalAgent, then that means the flow run will take place in a subprocess managed by the agent
• secondly, a server managed run will additionally spawn its own subprocess to send regular heartbeats to the server API (which is what the process ID you’ve listed here looks to be - that CLI command does very little other than send a POST request and sleep: https://github.com/PrefectHQ/prefect/blob/master/src/prefect/cli/heartbeat.py#L91-L98)
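Roughly, that heartbeat subprocess amounts to a "notify, then sleep" loop like the sketch below. This is a simplified illustration, not Prefect's actual implementation: `run_heartbeat` and `send_heartbeat` are hypothetical names, and the transport is injected as a callable so the loop itself can run standalone.

```python
import time

# Simplified sketch of what the heartbeat CLI subprocess does: repeatedly
# tell the API that the flow run is still alive, then sleep. In the real
# command (prefect heartbeat flow-run -i <id>) each beat is a POST to the
# server API; here `send_heartbeat` is any callable delivering one beat.
def run_heartbeat(flow_run_id, send_heartbeat, num_beats=5, interval=0.0):
    """Send `num_beats` heartbeats for `flow_run_id`, sleeping `interval`
    seconds between them."""
    for _ in range(num_beats):
        send_heartbeat(flow_run_id)
        time.sleep(interval)

beats = []
run_heartbeat("14303ee3-e96a-49c9-87e6-6577fcace7d3", beats.append, num_beats=3)
print(len(beats))  # 3
```

The py-spy dump above is consistent with this shape: between beats the main thread is idle inside the sleep, which is expected and not itself a sign of a hang.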
It’s really hard to say for certain, but my guess would be a resource limitation of some kind caused by these subprocesses, or possibly some locked state caused by that particular task running in a subprocess as opposed to the main process. We’ve seen that deployments without a process cleanup service (e.g., tini: https://github.com/krallin/tini) can quickly end up clogged, so I’d start by investigating that.
jeff n
02/08/2021, 11:38 PM
Feb 08 23:25:25 ip-10-3-14-187 prefect[7845]: graphql_1 | INFO: 172.18.0.3:57846 - "POST /graphql/ HTTP/1.1" 200 OK
Feb 08 23:25:29 ip-10-3-14-187 prefect[7845]: apollo_1 | PayloadTooLargeError: request entity too large
Feb 08 23:25:29 ip-10-3-14-187 prefect[7845]: apollo_1 | at readStream (/apollo/node_modules/raw-body/index.js:155:17)
Feb 08 23:25:29 ip-10-3-14-187 prefect[7845]: apollo_1 | at getRawBody (/apollo/node_modules/raw-body/index.js:108:12)
Feb 08 23:25:29 ip-10-3-14-187 prefect[7845]: apollo_1 | at read (/apollo/node_modules/body-parser/lib/read.js:77:3)
Feb 08 23:25:29 ip-10-3-14-187 prefect[7845]: apollo_1 | at jsonParser (/apollo/node_modules/body-parser/lib/types/json.js:135:5)
Feb 08 23:25:29 ip-10-3-14-187 prefect[7845]: apollo_1 | at Layer.handle [as handle_request] (/apollo/node_modules/express/lib/router/layer.js:95:5)
Feb 08 23:25:29 ip-10-3-14-187 prefect[7845]: apollo_1 | at trim_prefix (/apollo/node_modules/express/lib/router/index.js:317:13)
Feb 08 23:25:29 ip-10-3-14-187 prefect[7845]: apollo_1 | at /apollo/node_modules/express/lib/router/index.js:284:7
Feb 08 23:25:29 ip-10-3-14-187 prefect[7845]: apollo_1 | at Function.process_params (/apollo/node_modules/express/lib/router/index.js:335:12)
Feb 08 23:25:29 ip-10-3-14-187 prefect[7845]: apollo_1 | at next (/apollo/node_modules/express/lib/router/index.js:275:10)
Feb 08 23:25:29 ip-10-3-14-187 prefect[7845]: apollo_1 | at cors (/apollo/node_modules/cors/lib/index.js:188:7)
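A `PayloadTooLargeError` like the one above means a request body exceeded the API's size cap. One way to sanity-check whether a flow could be tripping such a cap is to measure the serialized size of what is being shipped. The sketch below uses `pickle` purely as a stand-in (the real payload is a JSON GraphQL request, so exact sizes differ), but it flags order-of-magnitude problems such as a multi-megabyte result or log blob.

```python
import pickle

# Rough illustration: estimate whether a serialized object would exceed a
# request-body cap such as a 1 MB default. pickle sizes are not identical to
# the JSON the API actually receives, but they catch gross offenders.
LIMIT_BYTES = 1 * 1024 * 1024  # 1mb

def payload_size(obj):
    return len(pickle.dumps(obj))

def fits_under_limit(obj, limit=LIMIT_BYTES):
    return payload_size(obj) <= limit

small = {"state": "Running", "message": "heartbeat"}
big = {"rows": list(range(500_000))}  # roughly a few MB pickled

print(fits_under_limit(small))  # True
print(fits_under_limit(big))    # False
```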
Chris White
That error means a request to the API exceeded the body-size cap - this can happen if large task results are being shipped through the API (e.g., via a PrefectResult instead of a LocalResult).
The limit in Server is currently configured to 1mb (https://github.com/PrefectHQ/server/blob/master/services/apollo/src/index.js#L103)
jeff n
02/09/2021, 1:41 AM
Chris White
jeff n
02/09/2021, 4:14 AM