Thread
#prefect-community
    deltikron

    1 year ago
    Hi! I'm running prefect server and my flows keep getting stuck on submitted. I originally started with a kubernetes agent and thought it was a configuration issue between prefect and kubernetes, but I get the same behavior with a docker agent. The agents pull the correct docker image and submit it, but the flows don't enter a running state. I'd be grateful for any ideas or pointers! Cheers
    ale

    1 year ago
    Hi @deltikron. Do the flow and the agent have the same labels? An agent only picks up a flow run from the server if every label on the flow is also set on the agent.
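A minimal sketch of the label check, assuming the subset rule from the Prefect agent docs (every label on the flow must also be present on the agent). The function name `agent_can_run` is hypothetical and not part of Prefect; it only illustrates the comparison.

```python
def agent_can_run(flow_labels, agent_labels):
    """Return True if every flow label is also present on the agent.

    Hypothetical helper for illustration only, not a Prefect API.
    """
    return set(flow_labels).issubset(agent_labels)


# Flow labelled "docker", agent started with "docker" and "dev": picked up.
print(agent_can_run(["docker"], ["docker", "dev"]))  # True

# Flow also needs "gpu", which the agent does not have: not picked up.
print(agent_can_run(["docker", "gpu"], ["docker"]))  # False
```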
    deltikron

    1 year ago
    Hey @ale, thanks for the quick reply! Yes, the labels match. I can see the agent picking up the run in the agent log:
    [2020-11-09 09:53:40,686] DEBUG - agent | Querying for flow runs
    [2020-11-09 09:53:40,705] DEBUG - agent | No flow runs found
    [2020-11-09 09:53:40,705] DEBUG - agent | Next query for flow runs in 1.0 seconds
    [2020-11-09 09:53:41,346] DEBUG - agent | {'status': 'Pulling from deltikron/docker-storage', 'id': '2020-11-09t09-31-50-846671-00-00'}
    [2020-11-09 09:53:41,347] DEBUG - agent | {'status': 'Digest: sha256:1dfb55881e8f8237ccecfba2a8e574911e072b40d9e49118b2f373aaf000604c'}
    [2020-11-09 09:53:41,347] DEBUG - agent | {'status': 'Status: Image is up to date for deltikron/docker-storage:2020-11-09t09-31-50-846671-00-00'}
    [2020-11-09 09:53:41,347] INFO - agent | Successfully pulled image deltikron/docker-storage:2020-11-09t09-31-50-846671-00-00...
    [2020-11-09 09:53:41,347] DEBUG - agent | Creating Docker container deltikron/docker-storage:2020-11-09t09-31-50-846671-00-00
    [2020-11-09 09:53:41,379] DEBUG - agent | Starting Docker container with ID 37ef363b100e3a1524fda30af683bf6eb5c68fae3e1adc8316d55ffafe4ccd39
    [2020-11-09 09:53:41,706] DEBUG - agent | Querying for flow runs
    [2020-11-09 09:53:41,727] DEBUG - agent | No flow runs found
    [2020-11-09 09:53:41,727] DEBUG - agent | Next query for flow runs in 2.0 seconds
    [2020-11-09 09:53:41,771] DEBUG - agent | Docker container 37ef363b100e3a1524fda30af683bf6eb5c68fae3e1adc8316d55ffafe4ccd39 started
    [2020-11-09 09:53:41,785] DEBUG - agent | Completed flow run submission (id: ef9ac46e-9f6c-4722-87aa-ff3830ff15f3)
    [2020-11-09 09:53:43,727] DEBUG - agent | Querying for flow runs
    The flows just stay in the submitted state until they are automatically marked as failed after a while.
    ale

    1 year ago
    Mmmh…very strange… It seems that Prefect server is not able to check if the flow is still alive or not. Can you confirm that once the flow is submitted the corresponding container is started?
    deltikron

    1 year ago
    Aaaahh, that approach helped! Not solved yet but closer to what's happening: I ran the DockerAgent with the -f flag for flow output and got the following error message:
     ____            __           _        _                    _
    |  _ \ _ __ ___ / _| ___  ___| |_     / \   __ _  ___ _ __ | |_
    | |_) | '__/ _ \ |_ / _ \/ __| __|   / _ \ / _` |/ _ \ '_ \| __|
    |  __/| | |  __/  _|  __/ (__| |_   / ___ \ (_| |  __/ | | | |_
    |_|   |_|  \___|_|  \___|\___|\__| /_/   \_\__, |\___|_| |_|\__|
                                               |___/
    
    [2020-11-10 08:57:28,847] INFO - agent | Starting DockerAgent with labels ['docker']
    [2020-11-10 08:57:28,847] INFO - agent | Agent documentation can be found at https://docs.prefect.io/orchestration/
    [2020-11-10 08:57:28,847] INFO - agent | Agent connecting to the Prefect API at http://localhost:4200
    [2020-11-10 08:57:28,854] INFO - agent | Waiting for flow runs...
    [2020-11-10 08:58:04,539] INFO - agent | Found 1 flow run(s) to submit for execution.
    [2020-11-10 08:58:04,573] INFO - agent | Deploying flow run 386cf83f-284f-4f7f-99d6-7369e4abd685
    [2020-11-10 08:58:04,574] INFO - agent | Pulling image deltikron/docker-storage:2020-11-09t09-31-50-846671-00-00...
    [2020-11-10 08:58:05,869] INFO - agent | Successfully pulled image deltikron/docker-storage:2020-11-09t09-31-50-846671-00-00...
    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 160, in _new_conn
        (self._dns_host, self.port), self.timeout, **extra_kw
      File "/usr/local/lib/python3.7/site-packages/urllib3/util/connection.py", line 84, in create_connection
        raise err
      File "/usr/local/lib/python3.7/site-packages/urllib3/util/connection.py", line 74, in create_connection
        sock.connect(sa)
    OSError: [Errno 113] No route to host
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 677, in urlopen
        chunked=chunked,
      File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 392, in _make_request
        conn.request(method, url, **httplib_request_kw)
      File "/usr/local/lib/python3.7/http/client.py", line 1277, in request
        self._send_request(method, url, body, headers, encode_chunked)
      File "/usr/local/lib/python3.7/http/client.py", line 1323, in _send_request
        self.endheaders(body, encode_chunked=encode_chunked)
      File "/usr/local/lib/python3.7/http/client.py", line 1272, in endheaders
        self._send_output(message_body, encode_chunked=encode_chunked)
      File "/usr/local/lib/python3.7/http/client.py", line 1032, in _send_output
        self.send(msg)
      File "/usr/local/lib/python3.7/http/client.py", line 972, in send
        self.connect()
      File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 187, in connect
        conn = self._new_conn()
      File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 172, in _new_conn
        self, "Failed to establish a new connection: %s" % e
    urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f24234bf710>: Failed to establish a new connection: [Errno 113] No route to host
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
        timeout=timeout
      File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 727, in urlopen
        method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
      File "/usr/local/lib/python3.7/site-packages/urllib3/util/retry.py", line 446, in increment
        raise MaxRetryError(_pool, url, error or ResponseError(cause))
    urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='host.docker.internal', port=4200): Max retries exceeded with url: /graphql (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f24234bf710>: Failed to establish a new connection: [Errno 113] No route to host'))
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/usr/local/bin/prefect", line 8, in <module>
        sys.exit(cli())
      File "/usr/local/lib/python3.7/site-packages/click/core.py", line 829, in __call__
        return self.main(*args, **kwargs)
      File "/usr/local/lib/python3.7/site-packages/click/core.py", line 782, in main
        rv = self.invoke(ctx)
      File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/usr/local/lib/python3.7/site-packages/click/core.py", line 610, in invoke
        return callback(*args, **kwargs)
      File "/usr/local/lib/python3.7/site-packages/prefect/cli/execute.py", line 34, in flow_run
        return _execute_flow_run()
      File "/usr/local/lib/python3.7/site-packages/prefect/cli/execute.py", line 66, in _execute_flow_run
        result = client.graphql(query)
      File "/usr/local/lib/python3.7/site-packages/prefect/client/client.py", line 281, in graphql
        retry_on_api_error=retry_on_api_error,
      File "/usr/local/lib/python3.7/site-packages/prefect/client/client.py", line 237, in post
        retry_on_api_error=retry_on_api_error,
      File "/usr/local/lib/python3.7/site-packages/prefect/client/client.py", line 401, in _request
        session=session, method=method, url=url, params=params, headers=headers
      File "/usr/local/lib/python3.7/site-packages/prefect/client/client.py", line 319, in _send_request
        response = session.post(url, headers=headers, json=params, timeout=30)
      File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 578, in post
        return self.request('POST', url, data=data, json=json, **kwargs)
      File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 530, in request
        resp = self.send(prep, **send_kwargs)
      File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 643, in send
        r = adapter.send(request, **kwargs)
      File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 516, in send
        raise ConnectionError(e, request=request)
    requests.exceptions.ConnectionError: HTTPConnectionPool(host='host.docker.internal', port=4200): Max retries exceeded with url: /graphql (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f24234bf710>: Failed to establish a new connection: [Errno 113] No route to host'))
    Is this more of a docker error than a prefect error? I'm not 100% sure how to continue with it, but at least I've got a new lead for now.
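One way to narrow this down is to test plain TCP reachability from inside the flow container, independent of Prefect. The sketch below is only an illustration: `can_reach` is a made-up helper name, and the host and port are simply the ones that appear in the traceback above.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Try a plain TCP connection and report whether it succeeded."""
    try:
        # create_connection resolves the name and opens a TCP socket;
        # DNS failures, refusals, timeouts and "no route to host" all
        # surface as OSError subclasses.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run inside the container, `can_reach("host.docker.internal", 4200)` returning False would point at the network path (routing, DNS, or a firewall) rather than at Prefect itself.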
    ale

    1 year ago
    I think it’s a Docker error. But since the container is not running, Lazarus is not able to check whether the flow run is alive or not.
    deltikron

    1 year ago
    🤔 All right, that's strange: I just checked all of my versions, and both prefect and docker are perfectly up to date (prefect v0.13.12 and docker v19.03.13). I'm also pretty sure that I haven't done any exotic configuration of the network on my host, since it's just a dedicated development VM that has nothing else on it. In any case this couldn't be the cause of the problem, as the error is occurring within the container. I have to admit I'm a bit lost... I'm only trying to run the DockerStorage sample flow from the documentation, which should just work, right? Edit: Link formatting
    ale

    1 year ago
    From the error log it seems that the Docker container is not able to connect to GraphQL…
    deltikron

    1 year ago
    Yes, you were right. I just had a short session with our admin, and there was in fact a firewall between the container and GraphQL. Opening that up fixed the issue.
    Thanks for your time!
    In case anyone stumbles across this thread in the future, there is even an open issue for this problem: https://github.com/PrefectHQ/server/issues/25 If I'd found it earlier that would have saved a lot of time 🙈
    Riley Hun

    1 year ago
    Hello @deltikron @ale - I'm encountering the same issue too! I'm not much of a networking guru, but I understand that the fix would be to open up the firewall between the container and GraphQL. How do I go about doing that?
    deltikron

    1 year ago
    Hey @Riley Hun, I'm afraid I can't really help you there, sorry. I'm way out of my depth with the intricacies of network config. What I did was to completely turn off the firewalld service because I was only working on a local development machine. In a production-type setting this is probably way too dangerous, and once we get past the proof-of-concept stage we'll be fine-tuning this with the help of our experts.
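A narrower alternative to switching firewalld off might be to open only the API port. The sketch below just builds the `firewall-cmd` invocations and prints them without running anything. The helper name is hypothetical, and port 4200 with the `public` zone are assumptions: the port comes from the traceback earlier in this thread, and the right zone depends on how Docker's interfaces are set up on the host, so please check with whoever administers the machine.

```python
def firewalld_open_port_commands(port, zone="public", protocol="tcp"):
    """Build (but do not run) firewall-cmd invocations that open one port.

    Hypothetical helper; the zone is an assumption and may differ per host.
    """
    return [
        # Persist the rule so it survives a firewalld restart.
        ["firewall-cmd", "--permanent", f"--zone={zone}",
         f"--add-port={port}/{protocol}"],
        # Reload so the permanent rule becomes active.
        ["firewall-cmd", "--reload"],
    ]


for cmd in firewalld_open_port_commands(4200):
    print(" ".join(cmd))
```

The printed commands would need to be run as root on the host, and only after confirming the zone is the one the container traffic actually passes through.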