    Riley Hun

    1 year ago
    How do we add host.docker.internal to /etc/hosts via --add-host? Is this something we add to the running agent or the config.toml?
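For context, `--add-host` is a `docker run` option that takes a `HOST:IP` pair and writes it into the new container's /etc/hosts. A minimal sketch of what such an invocation looks like, built as an argument list. The helper `docker_run_command`, the image name `my-flow-image`, and the address `172.17.0.1` (the usual default for the `docker0` bridge on Linux) are assumptions for illustration, not values from this thread.

```python
from typing import List


def docker_run_command(image: str, hostname: str, ip: str) -> List[str]:
    """Build a `docker run` argument list that maps `hostname` to `ip`.

    `--add-host HOST:IP` adds a line to /etc/hosts inside the new container.
    """
    return ["docker", "run", "--rm", "--add-host", f"{hostname}:{ip}", image]


# Placeholder image name and an assumed docker0 bridge address.
command = docker_run_command("my-flow-image", "host.docker.internal", "172.17.0.1")
print(" ".join(command))
# docker run --rm --add-host host.docker.internal:172.17.0.1 my-flow-image
```

The list form can be passed to `subprocess.run(command)` on a machine with Docker installed.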
    Jim Crist-Harif

    1 year ago
    Sorry, why are you trying to do this? What needs to access host.docker.internal?
    Riley Hun

    1 year ago
    @Jim Crist-Harif I am getting this error referenced in this thread [1]. I have been trying all day to resolve it but couldn't find a resolution. [1] https://github.com/PrefectHQ/prefect/issues/2324
    Jim Crist-Harif

    1 year ago
    The architecture of Prefect Server has changed significantly since then, so I suspect you're running into a different (but similar-looking) issue. Can you post the tracebacks you're seeing and the things you've tried?
    Riley Hun

    1 year ago
    Here are my diagnostics:
    {
      "config_overrides": {
        "server": {
          "ui": {
            "apollo_url": true
          }
        }
      },
      "env_vars": [],
      "system_information": {
        "platform": "Linux-5.4.0-1029-gcp-x86_64-with-glibc2.29",
        "prefect_backend": "server",
        "prefect_version": "0.13.16",
        "python_version": "3.8.5"
      }
    }
    Full Error Log:
    [2020-11-18 22:28:12+0000] ERROR - prefect.CloudTaskRunner | Failed to set task state with error: ConnectionError(MaxRetryError("HTTPConnectionPool(host='host.docker.internal', port=4200): Max retries exceeded with url: /graphql (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f4a34b2eb50>: Failed to establish a new connection: [Errno -2] Name or service not known'))"))
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.7/site-packages/urllib3/connection.py", line 160, in _new_conn
        (self._dns_host, self.port), self.timeout, **extra_kw
      File "/opt/conda/lib/python3.7/site-packages/urllib3/util/connection.py", line 61, in create_connection
        for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
      File "/opt/conda/lib/python3.7/socket.py", line 752, in getaddrinfo
        for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
    socket.gaierror: [Errno -2] Name or service not known
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.7/site-packages/urllib3/connectionpool.py", line 677, in urlopen
        chunked=chunked,
      File "/opt/conda/lib/python3.7/site-packages/urllib3/connectionpool.py", line 392, in _make_request
        conn.request(method, url, **httplib_request_kw)
      File "/opt/conda/lib/python3.7/http/client.py", line 1252, in request
        self._send_request(method, url, body, headers, encode_chunked)
      File "/opt/conda/lib/python3.7/http/client.py", line 1298, in _send_request
        self.endheaders(body, encode_chunked=encode_chunked)
      File "/opt/conda/lib/python3.7/http/client.py", line 1247, in endheaders
        self._send_output(message_body, encode_chunked=encode_chunked)
      File "/opt/conda/lib/python3.7/http/client.py", line 1026, in _send_output
        self.send(msg)
      File "/opt/conda/lib/python3.7/http/client.py", line 966, in send
        self.connect()
      File "/opt/conda/lib/python3.7/site-packages/urllib3/connection.py", line 187, in connect
        conn = self._new_conn()
      File "/opt/conda/lib/python3.7/site-packages/urllib3/connection.py", line 172, in _new_conn
        self, "Failed to establish a new connection: %s" % e
    urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f4a34b2eb50>: Failed to establish a new connection: [Errno -2] Name or service not known
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
        timeout=timeout
      File "/opt/conda/lib/python3.7/site-packages/urllib3/connectionpool.py", line 727, in urlopen
        method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
      File "/opt/conda/lib/python3.7/site-packages/urllib3/util/retry.py", line 439, in increment
        raise MaxRetryError(_pool, url, error or ResponseError(cause))
    urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='host.docker.internal', port=4200): Max retries exceeded with url: /graphql (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f4a34b2eb50>: Failed to establish a new connection: [Errno -2] Name or service not known'))
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.7/site-packages/prefect/engine/cloud/task_runner.py", line 128, in call_runner_target_handlers
        cache_for=self.task.cache_for,
      File "/opt/conda/lib/python3.7/site-packages/prefect/client/client.py", line 1461, in set_task_run_state
        version=version,
      File "/opt/conda/lib/python3.7/site-packages/prefect/client/client.py", line 302, in graphql
        retry_on_api_error=retry_on_api_error,
      File "/opt/conda/lib/python3.7/site-packages/prefect/client/client.py", line 218, in post
        retry_on_api_error=retry_on_api_error,
      File "/opt/conda/lib/python3.7/site-packages/prefect/client/client.py", line 434, in _request
        session=session, method=method, url=url, params=params, headers=headers
      File "/opt/conda/lib/python3.7/site-packages/prefect/client/client.py", line 340, in _send_request
        response = session.post(url, headers=headers, json=params, timeout=30)
      File "/opt/conda/lib/python3.7/site-packages/requests/sessions.py", line 578, in post
        return self.request('POST', url, data=data, json=json, **kwargs)
      File "/opt/conda/lib/python3.7/site-packages/requests/sessions.py", line 530, in request
        resp = self.send(prep, **send_kwargs)
      File "/opt/conda/lib/python3.7/site-packages/requests/sessions.py", line 643, in send
        r = adapter.send(request, **kwargs)
      File "/opt/conda/lib/python3.7/site-packages/requests/adapters.py", line 516, in send
        raise ConnectionError(e, request=request)
    I've tried what @Laura Lorenz suggested in this thread [1]
    # new 20.04.1 LTS
    sudo apt update
    sudo apt install python3-pip
    pip3 install prefect
    sudo apt install docker docker-compose
    sudo systemctl start  docker
    sudo usermod -aG docker $USER
    # logged out to refresh my user groups
    docker container run hello-world
    # add firewall rule in GCP to allow ingress on port 8080
    # changed config.toml to reference apollo url
    prefect backend server
    prefect server start
    [1] https://github.com/PrefectHQ/server/issues/25
    Jim Crist-Harif

    1 year ago
    How did you start your agent? It's not clear to me where the flow is getting that address from, but it has to be getting it from somewhere.
    Riley Hun

    1 year ago
    prefect agent start docker --show-flow-logs
    As a side note, I tried this with Prefect Core server on my local machine as well and still ran into the same issue.
    Jim Crist-Harif

    1 year ago
    Hmmm, ok. I'm taking off for today, but will flag this thread for whoever is on support tomorrow to pick up.
    Riley Hun

    1 year ago
    Thanks! Appreciate it!
    Is there anyone available today to take a look at this issue? It would be very much appreciated!
    Dylan

    1 year ago
    Hey @Riley Hun! Just to start with the basics, your docker agent is successfully connecting to your Prefect Server instance, right?
    In that it polls for work and starts containers successfully?
    Riley Hun

    1 year ago
    Hi @Dylan, yup, it sure does. It is able to retrieve the runs successfully and even pulls the Docker image from GCR.
    Dylan

    1 year ago
    Can you try explicitly providing a URI to the agent start command:
    prefect agent docker start --api http://localhost:4200
    Riley Hun

    1 year ago
    Is this the uri of the prefect server? Mine is on port 8080.
    Dylan

    1 year ago
    That’s the one
    I think that setting may tell the agent how to configure the api path in the flow
    But if that doesn’t work, please let me know
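One possible reason a flow container calls `host.docker.internal` even when the agent was pointed at `localhost`: inside a container, `localhost` is the container itself, so a loopback API address has to be swapped for a name that reaches the host. The sketch below shows that kind of rewrite. `rewrite_for_container` is a hypothetical helper for illustration, not Prefect's code, and whether the agent does anything like this is an assumption that the thread does not confirm.

```python
from urllib.parse import urlsplit, urlunsplit

LOOPBACK_HOSTS = {"localhost", "127.0.0.1"}


def rewrite_for_container(api_url: str, host_alias: str = "host.docker.internal") -> str:
    """Replace a loopback hostname with an alias that points at the Docker host.

    URLs that already name a non-loopback host are returned unchanged.
    """
    parts = urlsplit(api_url)
    if parts.hostname not in LOOPBACK_HOSTS:
        return api_url
    # Keep the original port (if any) and swap only the hostname.
    netloc = host_alias if parts.port is None else f"{host_alias}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(rewrite_for_container("http://localhost:4200"))
# http://host.docker.internal:4200
```

If something along these lines is happening, the container still needs a way to resolve the alias, which is where an /etc/hosts entry comes in.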
    Riley Hun

    1 year ago
    Okay got it, thanks. I'm just generating the docker image and pushing to GCR right now - might take a bit...
    Dylan

    1 year ago
    👍 no rush on my end 😛
    Btw the default port is actually localhost:4200
    Riley Hun

    1 year ago
    Oh and that's the graphql server?
    Dylan

    1 year ago
    Technically it’s apollo which then fetches the schema from graphql (these are named services in prefect server)
    It’s the externally-facing API which is a graphql endpoint
    Riley Hun

    1 year ago
    Hmm... no I'm afraid that didn't do the trick. I'm getting the same error.
    Dylan

    1 year ago
    hmmmm
    Riley Hun

    1 year ago
    prefect agent docker start --api http://localhost:4200
    
    [2020-11-19 22:40:55,824] INFO - agent | Starting DockerAgent with labels []
    [2020-11-19 22:40:55,824] INFO - agent | Agent documentation can be found at https://docs.prefect.io/orchestration/
    [2020-11-19 22:40:55,824] INFO - agent | Agent connecting to the Prefect API at http://localhost:4200
    [2020-11-19 22:40:55,852] INFO - agent | Waiting for flow runs...
    [2020-11-19 22:43:13,018] INFO - agent | Found 1 flow run(s) to submit for execution.
    [2020-11-19 22:43:13,119] INFO - agent | Deploying flow run ce49e7a2-e8f6-4103-88c4-cfe371225d03
    [2020-11-19 22:43:13,120] INFO - agent | Pulling image gcr.io/aa-mlops-dev-inm5/prefect-etl-storage:0.1.0...
    [2020-11-19 22:43:17,278] INFO - agent | Successfully pulled image gcr.io/aa-mlops-dev-inm5/prefect-etl-storage:0.1.0...
    Kubernetes Pods Logs:
    [2020-11-19 22:43:21+0000] ERROR - prefect.CloudTaskRunner | Failed to set task state with error: ConnectionError(MaxRetryError("HTTPConnectionPool(host='host.docker.internal', port=4200): Max retries exceeded with url: /graphql (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f223bdd7b70>: Failed to establish a new connection: [Errno -2] Name or service not known'))"))
    Dylan

    1 year ago
    Looking at the traceback, it seems like something on Linux is preventing it from communicating back to host.docker.internal
    We’ll pick the thread back up on this issue https://github.com/PrefectHQ/server/issues/25
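The `[Errno -2] Name or service not known` in the traceback comes from `socket.getaddrinfo`, which means the name lookup itself failed before any connection was attempted. A minimal sketch for checking that from inside the flow container, assuming Python is available there. `can_resolve` is an illustrative helper name, and port 4200 is taken from the error message.

```python
import socket


def can_resolve(hostname: str, port: int = 4200) -> bool:
    """Return True if `hostname` resolves to at least one address."""
    try:
        socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # The same exception class that appears at the bottom of the first traceback.
        return False
    return True


for name in ("localhost", "host.docker.internal"):
    print(name, "resolves" if can_resolve(name) else "does not resolve")
```

If `host.docker.internal` does not resolve inside the container, the problem is a missing hosts entry or DNS record rather than a firewall or port issue.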
    Riley Hun

    1 year ago
    Could this be the issue? "get_docker_ip()" isn't returning anything?
    from prefect.utilities.docker_util import get_docker_ip
    print(get_docker_ip())
    Dylan

    1 year ago
    🧐
    Riley Hun

    1 year ago
    Ohhhh. Okay I'm on my Mac on my local machine. When I use my remote machine using Ubuntu, then it returns something.
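A likely explanation for the difference: on Linux, Docker creates a `docker0` bridge interface on the host, while Docker Desktop for Mac runs containers inside a VM and the host has no such interface. The sketch below assumes the bridge address is read from `ip route`-style output. The sample text and the helper `docker_bridge_ip` are illustrative and not taken from Prefect's source.

```python
from typing import Optional

# Sample `ip route` output as it might look on a Linux host running Docker.
SAMPLE_IP_ROUTE = """\
default via 10.128.0.1 dev ens4 proto dhcp src 10.128.0.2 metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
"""


def docker_bridge_ip(ip_route_output: str, interface: str = "docker0") -> Optional[str]:
    """Return the `src` address of the route that goes through `interface`."""
    for line in ip_route_output.splitlines():
        fields = line.split()
        if interface in fields and "src" in fields:
            src_index = fields.index("src")
            if src_index + 1 < len(fields):
                return fields[src_index + 1]
    return None  # No route through the bridge, e.g. on a host without docker0.


print(docker_bridge_ip(SAMPLE_IP_ROUTE))  # 172.17.0.1
print(docker_bridge_ip(""))               # None
```

On a real Linux host the input could come from `subprocess.run(["ip", "route"], capture_output=True, text=True).stdout`.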
    Dylan

    1 year ago
    right
    we’re going to pick up the thread with you on this github issue
    Once we have some bandwidth to spin up an instance and get into the weeds
    Keep an eye on this issue and let us know if you make any progress
    Riley Hun

    1 year ago
    Okay sounds good, thanks. I guess development will be stalled a bit. It was working fine in August though. Then I returned to this project in November and tried to run using the same build script and it failed. Not sure if that's an important detail, but thought I'd point it out.
    Dylan

    1 year ago
    Interesting. Would you happen to know your version(s) in August? Add any and all details you can think of to the issue, please 😄 Any information helps!
    Riley Hun

    1 year ago
    Let me check my github repo history.
    Before, I was using 0.12.6. Also should note that before, I was using the same docker image for the dask workers and flow storage, which inherited from prefecthq/prefect:0.12.6-python3.7. Now, I'm using the daskdev/dask docker image for the dask workers and prefecthq/prefect:0.13.15-python3.7 for flow storage.
    @Dylan - Just thought I would let you know that I switched to using the newly released helm chart to deploy Prefect Server on Kubernetes and then used a Kubernetes Agent and now my flow is working just fine. Still not able to get it working using a Compute Engine instance w/ Docker Agent though, but perfectly ok with Prefect Server on GKE instead.
    Dylan

    1 year ago
    Glad you were able to figure something out!