# ask-community
d
Hi! I'm running prefect server and my flows keep getting stuck on `submitted`. I originally started with a kubernetes agent and thought it was a configuration issue between prefect and kubernetes, but I get the same behavior with a docker agent. The agents pull the correct docker image and submit it, but the flows don't enter a running state. I'd be grateful for any ideas or pointers! Cheers
a
Hi @deltikron. Do the flow and agent have the same labels? The agent picks up flows from the server only if the flow has at least one label matching the agent labels.
d
Hey @ale, thanks for the quick reply! Yes, the labels match, I can see the agent picking up the run in the agent log:
[2020-11-09 09:53:40,686] DEBUG - agent | Querying for flow runs
[2020-11-09 09:53:40,705] DEBUG - agent | No flow runs found
[2020-11-09 09:53:40,705] DEBUG - agent | Next query for flow runs in 1.0 seconds
[2020-11-09 09:53:41,346] DEBUG - agent | {'status': 'Pulling from deltikron/docker-storage', 'id': '2020-11-09t09-31-50-846671-00-00'}
[2020-11-09 09:53:41,347] DEBUG - agent | {'status': 'Digest: sha256:1dfb55881e8f8237ccecfba2a8e574911e072b40d9e49118b2f373aaf000604c'}
[2020-11-09 09:53:41,347] DEBUG - agent | {'status': 'Status: Image is up to date for deltikron/docker-storage:2020-11-09t09-31-50-846671-00-00'}
[2020-11-09 09:53:41,347] INFO - agent | Successfully pulled image deltikron/docker-storage:2020-11-09t09-31-50-846671-00-00...
[2020-11-09 09:53:41,347] DEBUG - agent | Creating Docker container deltikron/docker-storage:2020-11-09t09-31-50-846671-00-00
[2020-11-09 09:53:41,379] DEBUG - agent | Starting Docker container with ID 37ef363b100e3a1524fda30af683bf6eb5c68fae3e1adc8316d55ffafe4ccd39
[2020-11-09 09:53:41,706] DEBUG - agent | Querying for flow runs
[2020-11-09 09:53:41,727] DEBUG - agent | No flow runs found
[2020-11-09 09:53:41,727] DEBUG - agent | Next query for flow runs in 2.0 seconds
[2020-11-09 09:53:41,771] DEBUG - agent | Docker container 37ef363b100e3a1524fda30af683bf6eb5c68fae3e1adc8316d55ffafe4ccd39 started
[2020-11-09 09:53:41,785] DEBUG - agent | Completed flow run submission (id: ef9ac46e-9f6c-4722-87aa-ff3830ff15f3)
[2020-11-09 09:53:43,727] DEBUG - agent | Querying for flow runs
The flows just stay in the submitted state until they are automatically marked as failed after a while.
a
Mmmh…very strange… It seems that Prefect server is not able to check if the flow is still alive or not. Can you confirm that once the flow is submitted the corresponding container is started?
d
Aaaahh, that approach helped! Not solved yet but closer to what's happening: I ran the DockerAgent with the -f flag for flow output and got the following error message:
____            __           _        _                    _
|  _ \ _ __ ___ / _| ___  ___| |_     / \   __ _  ___ _ __ | |_
| |_) | '__/ _ \ |_ / _ \/ __| __|   / _ \ / _` |/ _ \ '_ \| __|
|  __/| | |  __/  _|  __/ (__| |_   / ___ \ (_| |  __/ | | | |_
|_|   |_|  \___|_|  \___|\___|\__| /_/   \_\__, |\___|_| |_|\__|
                                           |___/

[2020-11-10 08:57:28,847] INFO - agent | Starting DockerAgent with labels ['docker']
[2020-11-10 08:57:28,847] INFO - agent | Agent documentation can be found at https://docs.prefect.io/orchestration/
[2020-11-10 08:57:28,847] INFO - agent | Agent connecting to the Prefect API at http://localhost:4200
[2020-11-10 08:57:28,854] INFO - agent | Waiting for flow runs...
[2020-11-10 08:58:04,539] INFO - agent | Found 1 flow run(s) to submit for execution.
[2020-11-10 08:58:04,573] INFO - agent | Deploying flow run 386cf83f-284f-4f7f-99d6-7369e4abd685
[2020-11-10 08:58:04,574] INFO - agent | Pulling image deltikron/docker-storage:2020-11-09t09-31-50-846671-00-00...
[2020-11-10 08:58:05,869] INFO - agent | Successfully pulled image deltikron/docker-storage:2020-11-09t09-31-50-846671-00-00...
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 160, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw
  File "/usr/local/lib/python3.7/site-packages/urllib3/util/connection.py", line 84, in create_connection
    raise err
  File "/usr/local/lib/python3.7/site-packages/urllib3/util/connection.py", line 74, in create_connection
    sock.connect(sa)
OSError: [Errno 113] No route to host

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 677, in urlopen
    chunked=chunked,
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 392, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/local/lib/python3.7/http/client.py", line 1277, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/local/lib/python3.7/http/client.py", line 1323, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.7/http/client.py", line 1272, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.7/http/client.py", line 1032, in _send_output
    self.send(msg)
  File "/usr/local/lib/python3.7/http/client.py", line 972, in send
    self.connect()
  File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 187, in connect
    conn = self._new_conn()
  File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 172, in _new_conn
    self, "Failed to establish a new connection: %s" % e
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f24234bf710>: Failed to establish a new connection: [Errno 113] No route to host

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 727, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/usr/local/lib/python3.7/site-packages/urllib3/util/retry.py", line 446, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='host.docker.internal', port=4200): Max retries exceeded with url: /graphql (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f24234bf710>: Failed to establish a new connection: [Errno 113] No route to host'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/prefect", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/prefect/cli/execute.py", line 34, in flow_run
    return _execute_flow_run()
  File "/usr/local/lib/python3.7/site-packages/prefect/cli/execute.py", line 66, in _execute_flow_run
    result = client.graphql(query)
  File "/usr/local/lib/python3.7/site-packages/prefect/client/client.py", line 281, in graphql
    retry_on_api_error=retry_on_api_error,
  File "/usr/local/lib/python3.7/site-packages/prefect/client/client.py", line 237, in post
    retry_on_api_error=retry_on_api_error,
  File "/usr/local/lib/python3.7/site-packages/prefect/client/client.py", line 401, in _request
    session=session, method=method, url=url, params=params, headers=headers
  File "/usr/local/lib/python3.7/site-packages/prefect/client/client.py", line 319, in _send_request
    response = session.post(url, headers=headers, json=params, timeout=30)
  File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 578, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='host.docker.internal', port=4200): Max retries exceeded with url: /graphql (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f24234bf710>: Failed to establish a new connection: [Errno 113] No route to host'))
Is this more of a docker error than a prefect error? I'm not 100% sure how to continue with it, but at least I've got a new lead for now.
a
I think it’s a Docker error. But since the container is not running, Lazarus is not able to check whether the flow run is alive or not.
d
🤔 All right, that's strange: Just checked all of my versions and so on, both prefect and docker are perfectly up to date (prefect v.0.13.12 and docker v.19.03.13). I'm also pretty sure that I haven't done any exotic configuration of the network on my host since it's just a dedicated development VM that has nothing else on it. In any case this couldn't be the cause of the problem as the error is occurring within the container. I have to admit I'm a bit lost... I'm only trying to run the DockerStorage sample flow from the documentation which should just work, right? Edit: Link formatting
a
From the error log it seems that the Docker container is not able to connect to GraphQL…
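A quick way to confirm this from inside the flow's container is a plain TCP connection test against the address the traceback shows (`host.docker.internal`, port 4200). The snippet below is only a minimal sketch using the Python standard library; the helper name `can_reach` is made up for illustration and is not part of Prefect.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # OSError covers "No route to host", refused connections,
        # timeouts, and name-resolution failures.
        return False


# Example call, using the host and port from the traceback above:
# can_reach("host.docker.internal", 4200)
```

If this returns `False` inside the container but `True` for `localhost` on the host itself, the problem is most likely somewhere between the container and the host (for example a firewall or routing rule) rather than in the flow.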
d
Yes, you were right. I just had a short session with our admin, and there was in fact a firewall in between the container and GraphQL; opening that up fixed the issue.
Thanks for your time!
In case anyone stumbles across this thread in the future, there is even an open issue for this problem: https://github.com/PrefectHQ/server/issues/25 If I'd found it earlier that would have saved a lot of time 🙈
r
Hello @deltikron @ale - I'm encountering the same issue too! I'm not much of a networking guru, but I understand that the fix would be to open up the firewall in between the container and GraphQL. How do I go about doing that?
d
Hey @Riley Hun I'm afraid I can't really help you there, sorry. I'm way out of my depth with the intricacies of network config. What I did was to completely turn off the `firewalld` service because I was only working on a local development machine. In a production-type setting this is probably way too dangerous, and once we get past the proof-of-concept stage we'll be fine-tuning this with the help of our experts.