
Michael Hadorn

05/19/2021, 3:35 PM
Hi all. Since I upgraded the GUI to 0.14.19 I can no longer use the Docker agent. I always get this error when it starts a flow:
May 19 17:30:35 XXX.ch run_dev_docker_agent.sh[3306015]: requests.exceptions.ConnectionError: HTTPConnectionPool(host='host.docker.internal', port=4200): Max retries exceeded with url: /graphql (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8572e5ecd0>: Failed to establish a new connection: [Errno -2] Name or service not known'))
(Full log in the thread.) Before (<=0.14.17) everything was working. On Windows it also works with 0.14.19, but not on:
Ubuntu 20.04
Docker version 20.10.3, build 48d30b5
docker-compose version 1.28.2, build unknown
Has anyone seen similar behaviour?

Kevin Kho

05/19/2021, 3:59 PM
Hey @Michael Hadorn, could you verify your docker daemon is working by maybe building/running a simple container?

Michael Hadorn

05/19/2021, 4:01 PM
It definitely seems to be a problem with the Docker agent (not the GUI). Fortunately I could go back to 0.14.17, and then it works again. @Kevin Kho That should make it clear that the Docker daemon is working, right?

Kevin Kho

05/19/2021, 4:01 PM
I see yeah. We’ll look into it

Zanie

05/19/2021, 4:25 PM
Hey @Michael Hadorn -- how are you running your docker agent? Is it in a container within your server setup or just running on a machine or something else?

Michael Hadorn

05/19/2021, 4:27 PM
Hi @Zanie The agent is running directly on my ubuntu machine:
prefect agent docker start -l prefect-development --show-flow-logs
@Zanie Hm, could you reproduce it?
@Kevin Kho?

Kevin Kho

05/20/2021, 11:06 PM
Hey sorry for the delay, I'll try to replicate tonight
I could not replicate on Ubuntu 20.04, Python 3.8.10, Prefect 0.14.19 for server and agent. docker-compose version was 1.28.2 as well. Docker version was different: I had 20.10.6. I’ll downgrade this tomorrow and give it another stab. What is your Python version?
I used “prefecthq/prefect” for my DockerRun image and it was able to pull it down and run the flow

Michael Hadorn

05/21/2021, 10:32 AM
Hi @Kevin Kho I use:
prefecthq/prefect:0.14.19-python3.8
So it's Python 3.8.9
👍 1

Kevin Kho

05/21/2021, 12:20 PM
Ok will test again sometime today with the versions

Michael Hadorn

05/21/2021, 12:32 PM
thanks a lot!
I mean, I can also test it myself. Will answer here if I find the time. ^^
@Kevin Kho OK, sadly I get the same error with Docker 20.10.6.

Kevin Kho

05/21/2021, 2:25 PM
Is that the same machine? or you spun up a new one?

Michael Hadorn

05/21/2021, 2:25 PM
it's on the same..
yes, I have to test it on a new one.

Kevin Kho

05/21/2021, 2:25 PM
That would be good if you have time

Zanie

05/21/2021, 2:26 PM
I wonder if this is a permissioning issue; what happens if this is run as an admin?

Michael Hadorn

05/21/2021, 2:26 PM
the agent? or docker in a way?

Zanie

05/21/2021, 2:26 PM
The agent

Michael Hadorn

05/21/2021, 2:26 PM
(with version 0.14.17 it was working)

Zanie

05/21/2021, 2:27 PM
Can you give the output of
prefect diagnostics
?

Michael Hadorn

05/21/2021, 2:28 PM
{
  "config_overrides": {
    "env": {
      "db": {
        "host": true,
        "password": true,
        "type": true,
        "user": true
      },
      "name": true
    },
    "general": {
      "dependency_valid_window_h": true,
      "email": {
        "email_from": true,
        "email_to": true,
        "enable": true,
        "smtp_port": true,
        "smtp_server": true,
        "smtp_type": true
      },
      "flow_name": true,
      "ignore_last_run_state": true,
      "ignore_tables": true,
      "n_workers": true,
      "skip_already_done_tasks": true
    },
    "object": {
      "fct_labor_result_xlab_": {
        "delta_expr_ge": true
      }
    }
  },
  "env_vars": [],
  "system_information": {
    "platform": "Linux-5.4.0-73-generic-x86_64-with-glibc2.10",
    "prefect_backend": "server",
    "prefect_version": "0.14.19",
    "python_version": "3.8.8"
  }
}

Zanie

05/21/2021, 2:30 PM
And
pip show click

Michael Hadorn

05/21/2021, 2:30 PM
Name: click
Version: 8.0.0
Summary: Composable command line interface toolkit
Home-page: https://palletsprojects.com/p/click/
Author: Armin Ronacher
Author-email: armin.ronacher@active-4.com
License: BSD-3-Clause
Location: /home/poc/miniconda3/envs/prefect/lib/python3.8/site-packages
Requires:
Required-by: prefect, distributed

Zanie

05/21/2021, 2:31 PM
Can you try downgrading click? There are some compatibility issues with 8.x
pip install click\<8.0

Michael Hadorn

05/21/2021, 2:34 PM
OK, did it: click==7.1.2. Restarted the Docker agent. Same error.

Zanie

05/21/2021, 2:34 PM
Agh 😢

Michael Hadorn

05/21/2021, 2:35 PM
Maybe I messed up my Docker installation. I will test it on a new machine and update you here. Thanks for your effort.

Zanie

05/21/2021, 2:35 PM
Okay, thanks Michael
I think I have a couple more debug steps we can take... let me write up my thoughts
At
src/prefect/agent/docker/agent.py#L416
we set the containers to auto-remove
host_config = {"auto_remove": True}  # type: dict
-- if you set this to
False
instead we should get the flow container to stick around after it fails (you can find the prefect installation location with
pip show prefect
and modify this file)
Once we've got the container around, you should be able to exec into it so we can explore what's going on from the container;
This traceback line
result = client.graphql(query)
shows that the error is happening when the container tries to reach the Server API. I'm presuming the API URL in the container is incorrect or something.
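A quick way to confirm that this is a name-resolution failure rather than a refused connection is to test resolution directly, ideally from inside the flow container. The sketch below is an illustration only: the helper name `can_resolve` is hypothetical and not part of Prefect. It relies on the fact that `[Errno -2] Name or service not known` surfaces in Python as `socket.gaierror`.

```python
import socket


def can_resolve(host: str, port: int = 4200) -> bool:
    """Return True if `host` resolves to at least one address.

    `can_resolve` is a hypothetical helper for this thread, not a Prefect API.
    Port 4200 is the Server API port from the traceback.
    """
    try:
        socket.getaddrinfo(host, port)
    except socket.gaierror:
        # Resolution failures such as "Name or service not known" land here.
        return False
    return True


if __name__ == "__main__":
    for name in ("127.0.0.1", "host.docker.internal"):
        status = "resolves" if can_resolve(name) else "does NOT resolve"
        print(f"{name}: {status}")
```

If `host.docker.internal` does not resolve here, the API URL itself may be fine and the container simply has no entry for that hostname.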

Michael Hadorn

05/21/2021, 2:58 PM
Oh nice. Will try it
Hm. If I use it via pip, then this source is not available in conda, right?
sry, found it
> docker ps --all
CONTAINER ID   IMAGE                                COMMAND                  CREATED              STATUS                          PORTS                                               NAMES
126f58926104   cdwh/cdwh-flow:prefect-development   "tini -g -- entrypoi…"   About a minute ago   Exited (1) About a minute ago                                                       mini-toucanet
so it's still exited... same error.
> docker exec -it 126f58926104 bash
Error response from daemon: Container 126f58926104416a4a253a640df7dab1a0750fef43e7195eeabb091112ab1cdb is not running

Zanie

05/21/2021, 3:12 PM
Yeah... alas you cannot exec an exited container.
Can you
docker logs 669bc43012c3
and see if there's anything good in there?

Michael Hadorn

05/21/2021, 3:14 PM
It is the same as what I posted in my first post (or what are we looking for)?
...
requests.exceptions.ConnectionError: HTTPConnectionPool(host='host.docker.internal', port=4200): Max retries exceeded with url: /graphql (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f53e35ff940>: Failed to establish a new connection: [Errno -2] Name or service not known'))

Zanie

05/21/2021, 3:14 PM
That makes sense
Can you also
docker inspect 669bc43012c3
and share the
Env
portion?
What's the URL your Server GraphQL service is located at as well?

Michael Hadorn

05/21/2021, 3:23 PM
"Env": [
    "PREFECT__LOGGING__LEVEL=INFO",
    "PREFECT__GENERAL__SKIP_ALREADY_DONE_TASKS=false",
    "PREFECT__BACKEND=server",
    "PREFECT__CLOUD__API=http://host.docker.internal:4200",
    "PREFECT__CLOUD__AUTH_TOKEN=",
    "PREFECT__CLOUD__AGENT__LABELS=['prefect-development']",
    "PREFECT__CONTEXT__FLOW_RUN_ID=275a48b7-9d79-439f-b0be-3fdac0f617a7",
    "PREFECT__CONTEXT__FLOW_ID=0177a31d-a680-4734-aef4-ce683a97f35a",
    "PREFECT__CONTEXT__IMAGE=cdwh/cdwh-flow:prefect-development",
    "PREFECT__CLOUD__USE_LOCAL_SECRETS=false",
    "PREFECT__LOGGING__LOG_TO_CLOUD=true",
    "PREFECT__ENGINE__FLOW_RUNNER__DEFAULT_CLASS=prefect.engine.cloud.CloudFlowRunner",
    "PREFECT__ENGINE__TASK_RUNNER__DEFAULT_CLASS=prefect.engine.cloud.CloudTaskRunner",
    "PATH=/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
    "LANG=C.UTF-8",
    "GPG_KEY=E3FF2839C048B25C084DEBE9B26995E310250568",
    "PYTHON_VERSION=3.8.10",
    "PYTHON_PIP_VERSION=21.1.1",
    "PYTHON_GET_PIP_URL=https://github.com/pypa/get-pip/raw/1954f15b3f102ace496a34a013ea76b061535bd2/public/get-pip.py",
    "PYTHON_GET_PIP_SHA256=f499d76e0149a673fb8246d88e116db589afbd291739bd84f2cd9a7bca7b6993",
    "LC_ALL=C.UTF-8",
    "PREFECT__USER_CONFIG_PATH=./src/cdwhprefect/config.toml",
    "REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt"
],
GraphQL is on the same server on port 4200 (with version 0.14.17 everything works, so I don't think it's a configuration error, unless I messed up my Docker setup somehow)
@Zanie sorry, forgot to mention you
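To see at a glance which endpoint the flow container will try to reach, the `Env` list from `docker inspect` can be parsed for `PREFECT__CLOUD__API`. This is a minimal sketch under stated assumptions: the function name `api_endpoint_from_env` is hypothetical, and the sample list is a shortened copy of the output above.

```python
from urllib.parse import urlsplit


def api_endpoint_from_env(env):
    """Return (host, port) of PREFECT__CLOUD__API from a docker-inspect Env list.

    Hypothetical helper for illustration; returns None if the variable is absent.
    """
    for entry in env:
        # Each entry has the form KEY=VALUE; split on the first "=" only.
        key, _, value = entry.partition("=")
        if key == "PREFECT__CLOUD__API":
            parts = urlsplit(value)
            return parts.hostname, parts.port
    return None


sample_env = [
    "PREFECT__BACKEND=server",
    "PREFECT__CLOUD__API=http://host.docker.internal:4200",
]
print(api_endpoint_from_env(sample_env))  # ('host.docker.internal', 4200)
```

The host printed here is the name that has to be resolvable from inside the container, which is what the failing traceback is about.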

Zanie

05/21/2021, 4:50 PM
Is the environment different in the container created in 0.14.17?
Ah I think I see what the issue is here
The patch was for Docker for Linux < v20.10.6
You said you were on
v20.10.6
though so I am surprised it's failing
What happens if you run
docker run -t alpine:latest /bin/sh -c "ping host.docker.internal"
?
(or
docker run --rm alpine nslookup host.docker.internal
)

Michael Hadorn

05/21/2021, 5:28 PM
Will test it on Wednesday 😊 Nice days!
Docker itself seems to work, because the only change I made was upgrading the prefect lib inside conda. (And if I go back to the older Prefect version, it still works.) Since you asked, here is the output of the docker run commands:
> docker run -t alpine:latest /bin/sh -c "ping host.docker.internal"
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
540db60ca938: Pull complete
Digest: sha256:69e70a79f2d41ab5d637de98c1e0b055206ba40a8145e7bddb55ccc04e13cf8f
Status: Downloaded newer image for alpine:latest
ping: bad address 'host.docker.internal'

❯ docker run --rm alpine nslookup host.docker.internal
Server:         10.X.XX.X
Address:        10.X.XX.X:XX

** server can't find host.docker.internal: NXDOMAIN

** server can't find host.docker.internal: NXDOMAIN
This is the same on the prd system, where we haven't installed the Prefect upgrade, and there it works on 0.14.17. I think we have to dig into the Python code. But first I will also try it on a new Linux system.

Zanie

05/26/2021, 2:42 PM
This is definitely a regression caused by the PR I linked
Previously, we would infer and provide
host.docker.internal
as a workaround for a bug in Docker where it would not be consistently available

Michael Hadorn

05/26/2021, 3:05 PM
Hm, that means? 😅

Zanie

05/26/2021, 3:15 PM
So in the
moby
link I provided there's an issue where people complain that
host.docker.internal
is not consistently resolvable in their containers; this was something we patched by forcing it to be resolvable, then removed because it appeared that the fix had been released in Docker
(and it was causing errors for some users)

Michael Hadorn

05/26/2021, 3:22 PM
OK. Then I'm confused why it's not working on our Ubuntu... It seems it will not work without this fix. So maybe it's the same as before and we need the workaround from https://github.com/PrefectHQ/prefect/issues/2324

Zanie

05/26/2021, 3:29 PM
Yeah we may need to add something back in although the old code was breaking things for users as well so I'm hesitant to do so
You can use the workaround in that issue to add the host manually
Is it feasible to upgrade your Docker version as well? Very confused that the fix they released isn't working for you
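As an illustration of what "adding the host manually" can look like with a plain container, the sketch below builds a `docker run` command that maps `host.docker.internal` through the `--add-host` flag. It is an assumption that this matches the linked workaround: the special value `host-gateway` is only understood by Docker Engine 20.10 and later, and the issue comment may use a fixed bridge IP instead. The function name `docker_run_args` is hypothetical, and the sketch does not show how to pass the mapping to the flow containers the agent starts.

```python
import shlex


def docker_run_args(image, command, host_alias="host.docker.internal"):
    """Build a `docker run` argv that makes `host_alias` resolvable in the container.

    Hypothetical helper; `host-gateway` assumes Docker Engine 20.10 or later.
    """
    return [
        "docker", "run", "--rm",
        # Writes an /etc/hosts entry pointing the alias at the host's gateway IP.
        "--add-host", f"{host_alias}:host-gateway",
        image,
        *command,
    ]


if __name__ == "__main__":
    # ping is used instead of nslookup because the mapping lives in /etc/hosts,
    # which a direct DNS query does not consult.
    args = docker_run_args("alpine", ["ping", "-c", "1", "host.docker.internal"])
    print(shlex.join(args))
```

Running the printed command and comparing it with the earlier failing `ping` would show whether the mapping alone is enough on this machine.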

Michael Hadorn

05/26/2021, 3:31 PM
FTR, that means this: https://github.com/PrefectHQ/prefect/issues/2324#issuecomment-613351789 Will test it. 🙂 Thanks for your support!