# prefect-server
m
Hi all, since I upgraded the GUI to 0.14.19 I can no longer use the Docker agent. I always get this error when it starts a flow:
May 19 17:30:35 XXX.ch run_dev_docker_agent.sh[3306015]: requests.exceptions.ConnectionError: HTTPConnectionPool(host='host.docker.internal', port=4200): Max retries exceeded with url: /graphql (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8572e5ecd0>: Failed to establish a new connection: [Errno -2] Name or service not known'))
(Full log in the thread.) Before (<= 0.14.17) everything was working. On Windows it also works with 0.14.19, but not on Ubuntu 20.04 with Docker version 20.10.3, build 48d30b5 and docker-compose version 1.28.2, build unknown. Is anyone seeing similar behaviour?
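The `[Errno -2] Name or service not known` part of this error means the hostname lookup failed before any HTTP request was sent, which is a different failure from the server refusing the connection. A minimal standard-library sketch for telling the two apart from inside a container; the function name `diagnose` is hypothetical and not part of Prefect:

```python
import socket


def diagnose(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify why a TCP connection to host:port might fail (hypothetical helper).

    Returns "unresolvable" if name resolution fails (the "[Errno -2] Name or
    service not known" case), "unreachable" if the name resolves but no
    resolved address accepts a TCP connection, and "ok" otherwise.
    """
    try:
        # Name resolution only; nothing is sent to the target yet.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "unresolvable"
    # Try each resolved address until one accepts a TCP connection.
    for family, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return "ok"
        except OSError:
            continue
    return "unreachable"
```

Run inside the flow image, `diagnose("host.docker.internal", 4200)` should return `"unresolvable"` in the situation above, which points at DNS inside the container rather than at the Server itself.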
k
Hey @Michael Hadorn, could you verify that your Docker daemon is working, e.g. by building or running a simple container?
m
It definitely seems to be a problem with the Docker agent (not the GUI). Fortunately I could go back to 0.14.17, and then it works again. @Kevin Kho That should make it clear that the Docker daemon is working, right?
k
I see, yeah. We’ll look into it.
z
Hey @Michael Hadorn -- how are you running your docker agent? Is it in a container within your server setup or just running on a machine or something else?
m
Hi @Zanie, the agent runs directly on my Ubuntu machine:
prefect agent docker start -l prefect-development --show-flow-logs
@Zanie Hm, could you reproduce it?
@Kevin Kho?
k
Hey sorry for the delay, I'll try to replicate tonight
I could not replicate on Ubuntu 20.04, Python 3.8.10, Prefect 0.14.19 for server and agent. docker-compose version was 1.28.2 as well. Docker version was different: I had 20.10.6. I’ll downgrade it tomorrow and give it another stab. What is your Python version?
I used “prefecthq/prefect” for my DockerRun image and it was able to pull it down and run the flow
m
Hi @Kevin Kho, I use:
prefecthq/prefect:0.14.19-python3.8
So it's Python 3.8.9
k
OK, I'll test again sometime today with those versions.
m
thanks a lot!
I can also test it myself. I'll answer here if I find the time. ^^
@Kevin Kho OK, sadly I get the same error with Docker 20.10.6.
k
Is that the same machine, or did you spin up a new one?
m
It's on the same one.
Yes, I have to test it on a new one.
k
That would be good if you have time
z
I wonder if this is a permissions issue; what happens if this is run as an admin?
m
The agent? Or Docker in some way?
z
The agent
m
(with version 0.14.17 it was working)
z
Can you give the output of `prefect diagnostics`?
m
{
  "config_overrides": {
    "env": {
      "db": {
        "host": true,
        "password": true,
        "type": true,
        "user": true
      },
      "name": true
    },
    "general": {
      "dependency_valid_window_h": true,
      "email": {
        "email_from": true,
        "email_to": true,
        "enable": true,
        "smtp_port": true,
        "smtp_server": true,
        "smtp_type": true
      },
      "flow_name": true,
      "ignore_last_run_state": true,
      "ignore_tables": true,
      "n_workers": true,
      "skip_already_done_tasks": true
    },
    "object": {
      "fct_labor_result_xlab_": {
        "delta_expr_ge": true
      }
    }
  },
  "env_vars": [],
  "system_information": {
    "platform": "Linux-5.4.0-73-generic-x86_64-with-glibc2.10",
    "prefect_backend": "server",
    "prefect_version": "0.14.19",
    "python_version": "3.8.8"
  }
}
z
And `pip show click`?
m
Name: click
Version: 8.0.0
Summary: Composable command line interface toolkit
Home-page: https://palletsprojects.com/p/click/
Author: Armin Ronacher
Author-email: armin.ronacher@active-4.com
License: BSD-3-Clause
Location: /home/poc/miniconda3/envs/prefect/lib/python3.8/site-packages
Requires:
Required-by: prefect, distributed
z
Can you try downgrading click? There are some compatibility issues with 8.x
pip install click\<8.0
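To confirm which click version is actually installed in the environment that runs `prefect agent docker start` (a conda env in this thread), a small standard-library sketch; the helper names `major_version` and `click_needs_downgrade` are hypothetical, and the "8.x or newer" threshold simply mirrors the suggestion above:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def major_version(dist_name: str) -> Optional[int]:
    """Return the installed major version of a distribution, or None if absent."""
    try:
        ver = version(dist_name)
    except PackageNotFoundError:
        return None
    # Version strings look like "8.0.0" or "7.1.2"; the major part comes first.
    return int(ver.split(".")[0])


def click_needs_downgrade() -> bool:
    """True if click 8.x or newer is installed (the range suggested for pinning below 8.0)."""
    major = major_version("click")
    return major is not None and major >= 8
```

Running this with the same interpreter as the agent avoids checking a different environment by accident.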
m
OK, did it: installed click==7.1.2 and restarted the Docker agent. Same error.
z
Agh 😢
m
Maybe I messed up my Docker installation. I'll test it on a new machine and update you here. Thanks for your effort.
z
Okay, thanks Michael
I think I have a couple more debug steps we can take... let me write up my thoughts.
At `src/prefect/agent/docker/agent.py#L416` we set the containers to auto-remove:
host_config = {"auto_remove": True}  # type: dict
If you set this to `False` instead, the flow container should stick around after it fails. (You can find the Prefect installation location with `pip show prefect` and modify this file there.)
Once the container sticks around, you should be able to exec into it so we can explore what's going on from inside the container.
The traceback line `result = client.graphql(query)` shows that the error happens when the container tries to reach the Server API. I'm presuming the API URL in the container is incorrect or something.
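A quick way to see which host and port the flow container will actually call is to parse its API URL. A minimal sketch, assuming the URL reaches the container through the `PREFECT__CLOUD__API` environment variable (the name that appears in this container's `docker inspect` output later in the thread); the function name `api_target` is hypothetical:

```python
from typing import Mapping, Optional, Tuple
from urllib.parse import urlsplit

# Assumption: the flow container receives its Server API URL via this variable.
API_ENV_VAR = "PREFECT__CLOUD__API"


def api_target(env: Mapping[str, str]) -> Optional[Tuple[str, int]]:
    """Return (host, port) of the API URL found in `env`, or None if missing/invalid."""
    url = env.get(API_ENV_VAR)
    if not url:
        return None
    parts = urlsplit(url)
    if parts.hostname is None:
        return None
    # Use the scheme's default port when the URL does not spell one out.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port
```

Inside the container this would be called as `api_target(os.environ)`; a host that does not resolve there would explain the traceback.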
m
Oh nice. Will try it
Hm. If I use it via pip, then this source isn't available in conda, right?
Sorry, found it.
> docker ps --all
CONTAINER ID   IMAGE                                COMMAND                  CREATED              STATUS                          PORTS                                               NAMES
126f58926104   cdwh/cdwh-flow:prefect-development   "tini -g -- entrypoi…"   About a minute ago   Exited (1) About a minute ago                                                       mini-toucanet
So it still exited... same error.
> docker exec -it 126f58926104 bash
Error response from daemon: Container 126f58926104416a4a253a640df7dab1a0750fef43e7195eeabb091112ab1cdb is not running
z
Yeah... alas you cannot exec an exited container.
Can you run `docker logs 669bc43012c3` and see if there's anything useful in there?
m
It shows the same thing I posted in my first message (or is that what we're looking for?):
...
requests.exceptions.ConnectionError: HTTPConnectionPool(host='host.docker.internal', port=4200): Max retries exceeded with url: /graphql (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f53e35ff940>: Failed to establish a new connection: [Errno -2] Name or service not known'))
z
That makes sense
Can you also run `docker inspect 669bc43012c3` and share the `Env` portion?
What's the URL your Server GraphQL service is located at, as well?
m
"Env": [
    "PREFECT__LOGGING__LEVEL=INFO",
    "PREFECT__GENERAL__SKIP_ALREADY_DONE_TASKS=false",
    "PREFECT__BACKEND=server",
    "PREFECT__CLOUD__API=http://host.docker.internal:4200",
    "PREFECT__CLOUD__AUTH_TOKEN=",
    "PREFECT__CLOUD__AGENT__LABELS=['prefect-development']",
    "PREFECT__CONTEXT__FLOW_RUN_ID=275a48b7-9d79-439f-b0be-3fdac0f617a7",
    "PREFECT__CONTEXT__FLOW_ID=0177a31d-a680-4734-aef4-ce683a97f35a",
    "PREFECT__CONTEXT__IMAGE=cdwh/cdwh-flow:prefect-development",
    "PREFECT__CLOUD__USE_LOCAL_SECRETS=false",
    "PREFECT__LOGGING__LOG_TO_CLOUD=true",
    "PREFECT__ENGINE__FLOW_RUNNER__DEFAULT_CLASS=prefect.engine.cloud.CloudFlowRunner",
    "PREFECT__ENGINE__TASK_RUNNER__DEFAULT_CLASS=prefect.engine.cloud.CloudTaskRunner",
    "PATH=/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
    "LANG=C.UTF-8",
    "GPG_KEY=E3FF2839C048B25C084DEBE9B26995E310250568",
    "PYTHON_VERSION=3.8.10",
    "PYTHON_PIP_VERSION=21.1.1",
    "PYTHON_GET_PIP_URL=https://github.com/pypa/get-pip/raw/1954f15b3f102ace496a34a013ea76b061535bd2/public/get-pip.py",
    "PYTHON_GET_PIP_SHA256=f499d76e0149a673fb8246d88e116db589afbd291739bd84f2cd9a7bca7b6993",
    "LC_ALL=C.UTF-8",
    "PREFECT__USER_CONFIG_PATH=./src/cdwhprefect/config.toml",
    "REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt"
],
GraphQL is on the same server on port 4200. (With version 0.14.17 everything works, so I don't think it's a configuration error, unless I messed up my Docker setup somehow.)
@Zanie sorry, forgot to mention you.
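Picking one value out of a long `docker inspect` dump by eye is error-prone. A standard-library sketch (the helper name `container_env` is hypothetical) that takes the JSON `docker inspect` prints, for example saved with `docker inspect 126f58926104 > inspect.json`, and returns the container's environment as a dict:

```python
import json
from typing import Dict


def container_env(inspect_json: str) -> Dict[str, str]:
    """Turn `docker inspect <id>` output into a dict of environment variables.

    `docker inspect` prints a JSON array with one object per container;
    the environment lives under Config.Env as "KEY=VALUE" strings.
    Only the first container in the array is read.
    """
    containers = json.loads(inspect_json)
    env: Dict[str, str] = {}
    for entry in containers[0]["Config"]["Env"]:
        # Split on the first "=" only; values may themselves contain "=".
        key, _, value = entry.partition("=")
        env[key] = value
    return env
```

For the container above, `container_env(open("inspect.json").read())["PREFECT__CLOUD__API"]` gives `http://host.docker.internal:4200`, i.e. the flow depends on `host.docker.internal` resolving inside the container.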
z
Is the environment different in the container created in 0.14.17?
Ah I think I see what the issue is here
The patch was for Docker for Linux < v20.10.6.
You said you were on v20.10.6, though, so I'm surprised it's failing.
What happens if you run `docker run -t alpine:latest /bin/sh -c "ping host.docker.internal"` (or `docker run --rm alpine nslookup host.docker.internal`)?
m
I'll test it on Wednesday 😊 Have a nice few days!
Docker itself seems to work, because the only change I made was upgrading the Prefect library inside conda (and if I go back to the older Prefect version, it works again). Since you asked, here is what I get from the docker run commands:
> docker run -t alpine:latest /bin/sh -c "ping host.docker.internal"
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
540db60ca938: Pull complete
Digest: sha256:69e70a79f2d41ab5d637de98c1e0b055206ba40a8145e7bddb55ccc04e13cf8f
Status: Downloaded newer image for alpine:latest
ping: bad address 'host.docker.internal'

❯ docker run --rm alpine nslookup host.docker.internal
Server:         10.X.XX.X
Address:        10.X.XX.X:XX

** server can't find host.docker.internal: NXDOMAIN

** server can't find host.docker.internal: NXDOMAIN
This is the same on the production system, where we didn't install the Prefect upgrade and which still works on 0.14.17. I think we have to dig into the Python code. But first I'll also try it on a new Linux system.
z
This is definitely a regression caused by the PR I linked
Previously, we would infer and provide `host.docker.internal` as a workaround for a bug in Docker where it was not consistently available.
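The idea of "inferring and providing" the host can be sketched as choosing an `extra_hosts` entry for the flow container. This is a hypothetical illustration, not Prefect's actual code. It assumes that Docker Engine 20.10.0+ accepts the special `host-gateway` value, and that older engines fall back to the default `docker0` bridge gateway `172.17.0.1`, which is configurable and may differ on your host:

```python
from typing import Dict, Tuple

# Assumption: default gateway of the docker0 bridge on a stock Linux install.
DEFAULT_BRIDGE_IP = "172.17.0.1"


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse "20.10.6" (or "20.10.6-ce") into (20, 10, 6)."""
    core = version.split("-")[0].split("+")[0]
    return tuple(int(part) for part in core.split("."))


def extra_hosts_for(
    docker_version: str, bridge_ip: str = DEFAULT_BRIDGE_IP
) -> Dict[str, str]:
    """Build a hypothetical extra_hosts mapping that makes host.docker.internal resolvable.

    Engines from 20.10.0 on understand "host-gateway"; for older engines the
    host's bridge IP has to be supplied explicitly.
    """
    if parse_version(docker_version) >= (20, 10, 0):
        return {"host.docker.internal": "host-gateway"}
    return {"host.docker.internal": bridge_ip}
```

The returned dict has the hostname-to-address shape that the Docker SDK for Python accepts for `extra_hosts`. Both engine versions mentioned in this thread (20.10.3 and 20.10.6) would take the `host-gateway` branch.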
m
Hm, that means? 😅
z
So in the moby link I provided, there's an issue where people complain that `host.docker.internal` is not consistently resolvable in their containers. We patched this by forcing it to be resolvable, then removed the patch because it appeared that the fix had been released in Docker (and the patch was causing errors for some users).
m
OK. Then I'm confused why it's not working on our Ubuntu... It seems it won't work without this fix. So maybe it's the same situation as before, and we need the workaround from https://github.com/PrefectHQ/prefect/issues/2324
z
Yeah, we may need to add something back in, although the old code was breaking things for users as well, so I'm hesitant to do so.
You can use the workaround in that issue to add the host manually
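One way to check whether adding the host manually fixes resolution is to rerun the thread's alpine test with Docker's `--add-host` flag. This is a sketch of that check, not necessarily the exact workaround described in the linked issue. It assumes Docker Engine 20.10+ (for the `host-gateway` value) and uses `ping` rather than `nslookup`, because `nslookup` may query the DNS server directly and bypass the `/etc/hosts` entry that `--add-host` writes. The function names are hypothetical:

```python
import shlex
import shutil
import subprocess
from typing import List


def ping_command(add_host: bool) -> List[str]:
    """Build a one-shot alpine ping of host.docker.internal, optionally with --add-host."""
    cmd = ["docker", "run", "--rm"]
    if add_host:
        # "host-gateway" is resolved by Docker Engine 20.10+ to the host's gateway IP.
        cmd += ["--add-host", "host.docker.internal:host-gateway"]
    cmd += ["alpine", "ping", "-c", "1", "host.docker.internal"]
    return cmd


def run_check(add_host: bool) -> int:
    """Run the check if the docker CLI is on PATH; return its exit code."""
    if shutil.which("docker") is None:
        raise RuntimeError("docker CLI not found on PATH")
    cmd = ping_command(add_host)
    print("$", shlex.join(cmd))
    return subprocess.run(cmd).returncode
```

If `run_check(False)` fails with `bad address` and `run_check(True)` returns 0, adding the host manually is enough on that machine.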
Is it feasible to upgrade your Docker version as well? I'm very confused that the fix they released isn't working for you.
m
FTR, you mean this: https://github.com/PrefectHQ/prefect/issues/2324#issuecomment-613351789. I'll test it. 🙂 Thanks for your support!