# prefect-docker
n
I am running into an error running a deployment on docker infrastructure. When triggering a deployment to run, it immediately turns up with a submission error (shared in comments). I have checked that docker is running on the machine and works. Not sure where else to look to debug? Anyone here in #C048ZHNT9QS seen this before?
Submission failed. Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
                       ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 398, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/local/lib/python3.11/http/client.py", line 1282, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/local/lib/python3.11/http/client.py", line 1328, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.11/http/client.py", line 1277, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.11/http/client.py", line 1037, in _send_output
    self.send(msg)
  File "/usr/local/lib/python3.11/http/client.py", line 975, in send
    self.connect()
  File "/usr/local/lib/python3.11/site-packages/docker/transport/unixconn.py", line 30, in connect
    sock.connect(self.unix_socket)
FileNotFoundError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/requests/adapters.py", line 489, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 787, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/util/retry.py", line 550, in increment
    raise six.reraise(type(error), error, _stacktrace)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/packages/six.py", line 769, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
                       ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 398, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/local/lib/python3.11/http/client.py", line 1282, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/local/lib/python3.11/http/client.py", line 1328, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.11/http/client.py", line 1277, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.11/http/client.py", line 1037, in _send_output
    self.send(msg)
  File "/usr/local/lib/python3.11/http/client.py", line 975, in send
    self.connect()
  File "/usr/local/lib/python3.11/site-packages/docker/transport/unixconn.py", line 30, in connect
    sock.connect(self.unix_socket)
urllib3.exceptions.ProtocolError: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/docker/api/client.py", line 214, in _retrieve_server_version
    return self.version(api_version=False)["ApiVersion"]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/docker/api/daemon.py", line 181, in version
    return self._result(self._get(url), json=True)
                        ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/docker/utils/decorators.py", line 46, in inner
    return f(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/docker/api/client.py", line 237, in _get
    return self.get(url, **self._set_request_timeout(kwargs))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/requests/sessions.py", line 600, in get
    return self.request("GET", url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/requests/sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/requests/sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/requests/adapters.py", line 547, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/prefect/infrastructure/docker.py", line 541, in _get_client
    docker_client = docker.from_env()
                    ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/docker/client.py", line 96, in from_env
    return cls(
           ^^^^
  File "/usr/local/lib/python3.11/site-packages/docker/client.py", line 45, in __init__
    self.api = APIClient(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/docker/api/client.py", line 197, in __init__
    self._version = self._retrieve_server_version()
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/docker/api/client.py", line 221, in _retrieve_server_version
    raise DockerException(
docker.errors.DockerException: Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

The above exception was the direct cause of the following exception:

RuntimeError: Could not connect to Docker.
It seems to not actually make it to the agent (at least it doesn't generate any logs).
c
That looks like an odd traceback, but are you sure that the docker daemon is running, and in the same namespace (e.g. are you in a conda / poetry environment or something)?
    self.connect()
  File "/usr/local/lib/python3.11/site-packages/docker/transport/unixconn.py", line 30, in connect
    sock.connect(self.unix_socket)
FileNotFoundError: [Errno 2] No such file or directory
The error is saying it can’t find the unix socket
Usually, either the service isn’t started (systemctl start docker, or open Docker Desktop), or your user doesn’t have permission to access the socket (check /var/run/docker.sock)
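The two causes named above (daemon not started, or no permission on the socket) can be told apart with a quick check. This is a minimal sketch using only the standard library; `diagnose_docker_socket` is a hypothetical helper name, and the default path is the conventional Linux location mentioned above. It has to run on the same machine and as the same user as the process that reported the error.

```python
import os
import stat


def diagnose_docker_socket(path: str = "/var/run/docker.sock") -> str:
    """Classify the state of a Docker Unix socket path."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        # Same condition as the FileNotFoundError in the traceback:
        # the socket file does not exist, so the daemon is likely not running here.
        return "missing"
    except PermissionError:
        # The parent directory cannot be traversed by this user.
        return "unreadable"
    if not stat.S_ISSOCK(mode):
        return "not-a-socket"
    if not os.access(path, os.R_OK | os.W_OK):
        # The socket exists but this user cannot talk to it
        # (commonly fixed by adding the user to the docker group).
        return "no-permission"
    return "ok"


if __name__ == "__main__":
    print(diagnose_docker_socket())
```

A result of "missing" points at the daemon or the host, while "no-permission" points at the user account.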
n
I am on a VM running the base python (3.10.6). So no conda or poetry happening. From the exact same place I start the prefect agent:
So it seems like I should have permissions to run docker just fine?
And I just start the prefect agent with
prefect agent start --work-queue dev
c
You can try running prefect config set PREFECT_LOGGING_LEVEL=DEBUG and rerunning, but the error looks like it comes entirely from the Docker client stack trying to connect to the socket
n
No logs are generated at all even with DEBUG on
The error only shows up on the UI
c
I’m curious - do you have python3.11 installed anywhere? It’s curious you are using 3.10.6 but all the tracebacks are /usr/local/lib/python3.11
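One way to chase a version mismatch like this is to have the process report which interpreter and host it actually runs on, then compare that with the paths in the traceback. A minimal sketch with the standard library only; `interpreter_report` is a hypothetical helper name.

```python
import platform
import socket
import sys


def interpreter_report() -> dict:
    """Describe the interpreter and machine running this process."""
    return {
        "host": socket.gethostname(),          # which machine is this?
        "executable": sys.executable,          # path of the running Python binary
        "version": platform.python_version(),  # e.g. "3.10.6"
        "prefix": sys.prefix,                  # install or virtualenv root
    }


if __name__ == "__main__":
    for key, value in interpreter_report().items():
        print(f"{key}: {value}")
```

If the reported version and prefix do not match the traceback paths, the traceback was most likely produced by a different process or a different machine.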
n
Hmmm... I hadn't noticed that. It does not seem that I do.
That is weird. I may blow up my VM and start from scratch to see if I can reproduce the issue.
c
Sorry to hear that, but keep us informed - I’ll keep an eye out
👍 1
n
@Christopher Boyd After resetting my VM I ran into the same error. Interestingly enough, I tried twice. The first time, I had forgotten to configure Artifact Registry access and I got a permission denied error from Python 3.10, which is what I am running. The second time, I ran into the same error as above from Python 3.11, and I am sure that Python 3.11 is not on my VM.
c
very odd - I’m checking with the backend team to see if we can determine anything further here, or ask further questions
n
Thank you!
c
artifact registry?
n
Yes
Debian 11 VM. Installed docker through apt.
Any luck finding out where the python3.11 is coming from here?
c
Nothing on the prefect side, it’s not used at all
do those paths actually exist on your system?
also whats $DOCKER_HOST set to
and which version of prefect are you using
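The reason DOCKER_HOST matters: docker.from_env() (the call at the bottom of the traceback) reads it to decide where to connect, and falls back to the local Unix socket when it is unset. A rough sketch of that lookup, not the SDK's actual code; `resolve_docker_endpoint` and `DEFAULT_SOCKET` are hypothetical names, and the default value is the conventional Linux socket path.

```python
import os
from typing import Mapping, Optional

# Assumption: the conventional default endpoint on Linux.
DEFAULT_SOCKET = "unix:///var/run/docker.sock"


def resolve_docker_endpoint(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the endpoint a Docker client would likely use for this environment."""
    env = os.environ if env is None else env
    # An unset or empty DOCKER_HOST means "use the local socket".
    return env.get("DOCKER_HOST") or DEFAULT_SOCKET


if __name__ == "__main__":
    print(resolve_docker_endpoint())
```

With DOCKER_HOST unset, a FileNotFoundError on connect means the local socket file is absent on whichever machine ran the code.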
n
DOCKER_HOST is not set
Starting v2.7.10 agent
And those paths don't seem to exist... Here is what does exist in `/usr/local/lib`:
nelsongriffiths_doubleriver_com@prefect-agent:~$ ls /usr/local/lib
python2.7  python3.9
c
do you have steps I can take to reproduce this
you said you were able to reproduce by deleting your VM and starting over, I’d like to try
n
Sure. Do you want just how I set up the vm?
c
Ideally whatever you did in whole to reach this stage - if you have a flow, how you installed prefect (version), setup the agent, anything you can share steps for minus personal information
n
Okay, let me put something together and I'll send it your way later today
c
I’m still baffled by the python version - is there a specific image you are trying to use, or a docker image you built / pulled? We aren’t using python 3.11, so the traceback doesn’t seem to be coming from cloud side
so I’m kind of at a loss
n
Same. The docker image is built with python:3.10.7-slim-buster
I'll put together a Dockerfile that represents what we are doing as well
Just double checked and the Docker Image for sure does not have 3.11 on it
c
Let's try one more thing
if you can grab an strace of running the prefect deployment apply command
strace -o output.txt <whatever command you used to apply the deployment>
and attach
or you can message me with it
n
Does it matter that I create the deployment from a different place than I run the agent from?
c
the deployment creation and agent are irrelevant to each other
you don’t need an agent running to create and apply a deployment
the agent running is to poll the API to receive flow runs
Create deployment -> apply deployment -> create flow run -> agent polls and submits flow_run for execution
n
Okay so I am locally on a macbook. Apparently strace is for Linux only. Any suggestions for alternatives?
c
what version python do you have on your mac
n
I have multiple. The environment making the deployments uses 3.10.5
But I do have 3.11.1 on my macbook
c
are you in any sort of conda / venv environment when you create and apply the deployment
n
Yes I use poetry locally.
c
can you share the deployment object that is being applied
and how you are building it
Prefect automatically sets a Docker image matching the Python and Prefect version you're using at deployment time. You can see all available images at Docker Hub.
n
"""Deploy our flow."""
from prefect.deployments import Deployment
from prefect.filesystems import GCS
from prefect.orion.schemas.schedules import CronSchedule
from prefect_gcp.cloud_run import CloudRunJob
from prefect.infrastructure.docker import DockerContainer

from data_pipelines.falcon_cap_iq.falcon_cap_iq_flow import falcon_cap_iq_dbt_flow

if __name__ == "__main__":
    docker = DockerContainer.load("data-pipelines-image")
    storage: GCS = GCS.load("prefect-flow-storage")

    deployment = Deployment.build_from_flow(
        name="falcon_cap_iq_dbt_deployment",
        description=(
            "Deployment for triggering DBT for transforming CapIQ data "
            "for use with falcon."
        ),
        version="1",
        tags=["dev", "falcon", "dbt", "snowflake"],
        schedule=CronSchedule(
            cron="0 6 * * 1,2,3,4,5", timezone="America/Denver"
        ),  # Cron schedule to run weekdays at 6:00 AM MST
        flow=falcon_cap_iq_dbt_flow,
        work_queue_name="dev",  
        infrastructure=docker, 
        storage=storage,
        skip_upload=False,
    )
    deployment.apply()
c
Do you also have an API key and API URL set on the VM to communicate with the Cloud API, and the docker infrastructure block?
n
By that do you mean I logged in via the cli with the correct key?
c
correct
also, I was looking for the object itself that was being applied, something along the lines of
###
### A complete description of a Prefect Deployment for flow 'my-flow'
###
name: my-flow-deployment
description: null
tags:
- test
schedule: null
parameters: {}
infrastructure:
  type: docker-container
  env: {}
  labels: {}
  name: null
  command:
  - python
  - -m
  - prefect.engine
  image: prefecthq/prefect:dev-python3.9
  image_pull_policy: null
  networks: []
  network_mode: null
  auto_remove: false
  volumes: []
  stream_output: true
###
### DO NOT EDIT BELOW THIS LINE
###
flow_name: my-flow
manifest_path: my_flow-manifest.json
storage:
  bucket_path: bucket-full-of-sunshine
  aws_access_key_id: '**********'
  aws_secret_access_key: '**********'
  _is_anonymous: true
  _block_document_name: anonymous-xxxxxxxx-f1ff-4265-b55c-6353a6d65333
  _block_document_id: xxxxxxxx-06c2-4c3c-a505-4a8db0147011
  _block_type_slug: s3
parameter_openapi_schema:
  title: Parameters
  type: object
  properties: {}
  required: null
  definitions: null
n
Where can I get that from if I am not using the cli to build deployments?
c
You should at least be able to pull the infrastructure block from the deployment page in the UI
n
c
I haven’t seen this syntax before, is this accurate?
storage: GCS = GCS.load("prefect-flow-storage")
otherwise, I don’t really see anything stand out
n
That is just the prefect-gcp block for cloud storage. And it does work. I have used it successfully with other infrastructures
I am going to start fresh and work through it again. Maybe I did something silly somewhere. I'll let you know what I find
c
if you can doc the steps you take
I can try to reproduce it
👍 1
n
@Christopher Boyd I am pretty baffled by this still. There is something about the way the agent is running that is breaking the docker infrastructure. I can deploy the flow to a local process and it runs. I can also manually load the DockerContainer infrastructure and run _create_and_start_container and it pulls and runs the image without throwing any errors. I am running all of this in poetry. So it is the exact same environment that I use to run the agent. And it is still throwing 3.11 errors which I do not have on my machine. Is there a better way to step through what the agent is doing and debug it?
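A narrower version of that manual test is to repeat only the call that fails in the traceback, docker.from_env(), from the agent's environment. A minimal sketch, assuming the Docker SDK for Python is installed in that environment (it is what the traceback shows Prefect calling); `check_docker_connection` is a hypothetical helper name.

```python
def check_docker_connection() -> str:
    """Try the same connection step that fails in the traceback."""
    try:
        import docker  # Docker SDK for Python; assumed installed with the agent
    except ImportError:
        return "docker SDK not installed"
    try:
        client = docker.from_env()  # reads DOCKER_HOST, else the local socket
        client.ping()               # round trip to the daemon
    except Exception as exc:  # docker.errors.DockerException in the traceback
        return f"cannot connect: {exc}"
    return "connected"


if __name__ == "__main__":
    print(check_docker_connection())
```

If this prints "connected" where the agent runs while flow runs still fail with "Could not connect to Docker", the failing submission is probably happening in some other process or on some other host.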
c
The agent pulls and deploys infrastructure based on the flow run here: https://github.com/PrefectHQ/prefect/blob/main/src/prefect/agent.py#L411 and https://github.com/PrefectHQ/prefect/blob/main/src/prefect/agent.py#L425. You can run prefect config set PREFECT_LOGGING_LEVEL=DEBUG and see if we can get any additional details, but I don’t recall if we already did that
n
Oh my. It appears someone else at my company was messing around with dev and started an agent that didn't have access to Docker and left it running somewhere that was picking up my runs. Moving to a new work queue fixed it all. Well thank you for taking the time to help me with a silly error!
🙌 2
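The root cause above, a second agent polling the same work queue, can be illustrated with a toy model. This is a plain-Python sketch and not Prefect's API: `ToyAgent`, `dispatch`, and the queue name `dev-isolated` are hypothetical, and round-robin hand-off stands in for whichever agent happens to poll first.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToyAgent:
    name: str
    queue: str        # the work queue this agent polls
    has_docker: bool  # whether a Docker daemon is reachable from this agent

    def submit(self, run: str) -> str:
        if not self.has_docker:
            return f"{run}: submission failed on {self.name} (Could not connect to Docker)"
        return f"{run}: started by {self.name}"


def dispatch(runs: Dict[str, List[str]], agents: List[ToyAgent]) -> List[str]:
    """Hand each queued run to the next agent polling that queue (round-robin)."""
    results = []
    for queue, names in runs.items():
        pollers = deque(a for a in agents if a.queue == queue)
        for run in names:
            if not pollers:
                results.append(f"{run}: no agent on queue {queue}")
                continue
            agent = pollers[0]
            pollers.rotate(-1)  # the next run goes to the next poller
            results.append(agent.submit(run))
    return results


if __name__ == "__main__":
    stray = ToyAgent("stray-agent", "dev", has_docker=False)
    mine = ToyAgent("my-agent", "dev", has_docker=True)
    print(dispatch({"dev": ["run-1", "run-2"]}, [stray, mine]))

    # Moving the deployment and the working agent to their own queue
    # takes the stray agent out of the picture.
    mine_isolated = ToyAgent("my-agent", "dev-isolated", has_docker=True)
    print(dispatch({"dev-isolated": ["run-3"]}, [stray, mine_isolated]))
```

In the first call the stray agent takes run-1 and fails it, even though a healthy agent polls the same queue. In the second call only the healthy agent sees run-3. It also explains why the failure produced no logs locally: the error was raised on the other agent's machine and only surfaced in the UI.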