# ask-community
w
Hey all! Our team is going to try and set up Prefect to be more "proper". We are going to use
DockerStorage
, if we can, so that registering flows brings along dependencies. We also want to use
KubernetesRun
so that prefect can schedule jobs on k8s. For the latter, however, is
KubernetesRun
still the right thing to do? Someone outside this community suggested dask to us for scheduling instead of the "kubernetes scheduler" (his words; I'm still new to this and figuring it out). Basically, which way is the "right way"? Adding dask isn't something we're opposed to doing, but we have a lot of stuff going on and are aiming for the lowest-effort path, haha. I promise we're not lazy, just overworked.
a
You can certainly use Docker storage to package dependencies and simultaneously use
KubernetesRun
- here is an example. And you can use any executor with that as well as long as it's defined in your flow
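(For anyone reading this later, a minimal sketch of that combination in Prefect 1.x style — the registry URL, image name, and executor choice are placeholders, not the example Anna linked:)

```python
from prefect import Flow, task
from prefect.storage import Docker
from prefect.run_configs import KubernetesRun
from prefect.executors import LocalDaskExecutor

@task
def say_hi():
    print("hi")

with Flow("example-flow") as flow:
    say_hi()

# Docker storage packages the flow plus its dependencies into an image
flow.storage = Docker(
    registry_url="registry.example.com",  # placeholder registry
    image_name="example-flow",
    image_tag="latest",
)
# KubernetesRun tells the agent to launch each flow run as a k8s job
flow.run_config = KubernetesRun()
# any executor works, as long as it's defined on the flow itself
flow.executor = LocalDaskExecutor()
```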
👀 1
w
Thanks @Anna Geller; this looks great. I'll go step-by-step and start with
DockerStorage
and eventually move over to
KubernetesRun
then. This is very helpful.
👍 1
k
Kubernetes doesn't have a workflow scheduler. It's a container orchestration thing. So Prefect has a scheduler and Dask has a scheduler, and the Prefect scheduler is responsible for traversing the DAG and then submitting work to Dask. Think of the Prefect scheduler as macro level and Dask as micro level.
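(In Prefect 1.x terms, that "micro level" is the executor. A hedged fragment — the scheduler address is a placeholder and `flow` is an existing flow object:)

```python
from prefect.executors import DaskExecutor

# Prefect walks the DAG and submits each task to this Dask cluster;
# Dask then does the fine-grained scheduling of individual tasks.
flow.executor = DaskExecutor(address="tcp://dask-scheduler:8786")  # placeholder
```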
w
@Kevin Kho thanks, that's what we thought too. It was someone kind of disconnected from using prefect, and we were questioning that knowledge.
@Anna Geller I finally got to where I'm working on using
DockerStorage
(still using
LocalRun
for the time being), and I have a
Dockerfile
that's currently pretty empty, just to get one in place. It looks exactly like this:
FROM tiangolo/uvicorn-gunicorn-fastapi:python3.8-slim
Then I hacked together a
DockerStorage
instantiation exactly like this (with the account id removed):
return storage.Docker(
            registry_url="accountid.dkr.ecr.us-west-1.amazonaws.com",
            image_name=flow.name,
            image_tag="latest",
            dockerfile="/workflows/Dockerfile",  # this file exists in our docker container
        )
But when I run this command from within the docker container where we have always registered flows, I see output like this:
Tenant id: 604a7da3-df0c-4639-aef8-58c228f30829
API cloud hook id: 01df50e6-02e3-41c9-9c6b-6e71eb1226d0
Connector project id: c684f02b-88d1-4cf5-af38-c563fcab9bb2
Collecting flows...
Processing 'flows/example_flow.py':
  Building `Docker` storage...
    Error building storage:
      Traceback (most recent call last):
        File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
        File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 394, in _make_request
    conn.request(method, url, **httplib_request_kw)
        File "/usr/local/lib/python3.8/http/client.py", line 1256, in request
    self._send_request(method, url, body, headers, encode_chunked)
        File "/usr/local/lib/python3.8/http/client.py", line 1302, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
        File "/usr/local/lib/python3.8/http/client.py", line 1251, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
        File "/usr/local/lib/python3.8/http/client.py", line 1011, in _send_output
    self.send(msg)
        File "/usr/local/lib/python3.8/http/client.py", line 951, in send
    self.connect()
        File "/usr/local/lib/python3.8/site-packages/docker/transport/unixconn.py", line 43, in connect
    sock.connect(self.unix_socket)
      FileNotFoundError: [Errno 2] No such file or directory
And then several more lines, each with basically this exception:
During handling of the above exception, another exception occurred:

      Traceback (most recent call last):
        File "/usr/local/lib/python3.8/site-packages/prefect/cli/build_register.py", line 463, in build_and_register
    storage.build()
        File "/usr/local/lib/python3.8/site-packages/prefect/storage/docker.py", line 308, in build
    self._build_image(push=push)
        File "/usr/local/lib/python3.8/site-packages/prefect/storage/docker.py", line 340, in _build_image
    client = self._get_client()
        File "/usr/local/lib/python3.8/site-packages/prefect/storage/docker.py", line 554, in _get_client
    return docker.APIClient(
        File "/usr/local/lib/python3.8/site-packages/docker/api/client.py", line 197, in __init__
    self._version = self._retrieve_server_version()
        File "/usr/local/lib/python3.8/site-packages/docker/api/client.py", line 221, in _retrieve_server_version
    raise DockerException(
      docker.errors.DockerException: Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))
Do y'all have any clues what I'm doing wrong? Please don't tell me we have to execute this "on the host".
k
I think it just can’t find Docker. What happens when you do
docker run hello-world
on the CLI?
That’s basically your benchmark for whether the Prefect stuff that uses the Docker API will work
w
Well we register our flows from within the docker container. That's what we've always done: we deploy, and as the system comes up, it registers all the flows against Prefect. Prefect then serializes the code and tells the agent where it is (e.g. S3, local, etc.)
k
Then you need to mount the Docker socket into the container so you have access to Docker from within it (Docker-in-Docker). So you do something like
-v /var/run/docker.sock:/var/run/docker.sock
attached to your Docker run command
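(A hedged sketch of that docker run invocation — the image name, project name, and flows path are placeholders:)

```shell
# mount the host's Docker socket so the docker client inside the
# container can talk to the host's Docker daemon (Docker-in-Docker)
docker run \
  -v /var/run/docker.sock:/var/run/docker.sock \
  your-registration-image \
  prefect register --project your-project -p flows/
```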
w
oh hmm... i'm gonna check that out and give it a shot. thanks for the tip @Kevin Kho
@Kevin Kho so that was major help. docker is now trying to build docker in docker. but...
so we have the same
DockerStorage
above, except now we set
dockerfile=/register/Dockerfile
, and that directory just has these in it:
Dockerfile, app/, workflows/
. the
Dockerfile
just has a few lines:
FROM prefecthq/prefect

ENV PIP_NO_CACHE_DIR=1

RUN apt-get update
RUN apt-get upgrade -y

COPY requirements.txt ./
RUN pip install -r requirements.txt

COPY app /app
RUN pip install -e /app

COPY workflows /workflows
RUN pip install -e /workflows

ENV PYTHONPATH=/
but when i'm running our
prefect register
command to register a flow, it starts up the docker storage build and then errors out with:
Processing '/workflows/flows/dbt_flow.py':
  Building `Docker` storage...
2022-02-24 01:09:09+0000 - INFO - prefect.Docker - PID=178[MainProcess] - docker._build_image:358 - flow=[id=None; run_id=None; name=None] - Building the flow's Docker storage...
    Error building storage:
      Traceback (most recent call last):
        File "/usr/local/lib/python3.8/site-packages/docker/utils/build.py", line 97, in create_archive
    t.addfile(i, f)
        File "/usr/local/lib/python3.8/tarfile.py", line 1999, in addfile
    copyfileobj(fileobj, self.fileobj, tarinfo.size, bufsize=bufsize)
        File "/usr/local/lib/python3.8/tarfile.py", line 255, in copyfileobj
    raise exception("unexpected end of data")
      OSError: unexpected end of data

During handling of the above exception, another exception occurred:

      Traceback (most recent call last):
        File "/usr/local/lib/python3.8/site-packages/prefect/cli/build_register.py", line 463, in build_and_register
    storage.build()
        File "/usr/local/lib/python3.8/site-packages/prefect/storage/docker.py", line 308, in build
    self._build_image(push=push)
        File "/usr/local/lib/python3.8/site-packages/prefect/storage/docker.py", line 364, in _build_image
    output = client.build(
        File "/usr/local/lib/python3.8/site-packages/docker/api/build.py", line 159, in build
    context = utils.tar(
        File "/usr/local/lib/python3.8/site-packages/docker/utils/build.py", line 29, in tar
    return create_archive(
        File "/usr/local/lib/python3.8/site-packages/docker/utils/build.py", line 99, in create_archive
    raise IOError(
      OSError: Can not read file in context: /proc/bus/pci/30ee:00/00.0

  Registering 'dbt'... Error
================== 0 registered, 1 errored ==================
any clues on that? i'm not sure how /proc/bus/... is in the context
you know what? nevermind. i'm really dumb. i think i changed the working directory to
/
and caused this
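(For what it's worth, that matches how the docker client works: it tars up the entire build context directory and streams the archive to the daemon, so a context of / sweeps in things like /proc. A toy illustration using the standard library, with a temp directory standing in for the context:)

```python
import io
import tarfile
import tempfile
from pathlib import Path

# fake build context: a Dockerfile plus a workflows/ package
ctx = Path(tempfile.mkdtemp())
(ctx / "workflows").mkdir()
(ctx / "Dockerfile").write_text("FROM python:3.8-slim\n")
(ctx / "workflows" / "flow.py").write_text("# flow code\n")

# the docker client archives everything under the context directory
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    tar.add(ctx, arcname=".")

buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    names = sorted(tar.getnames())
print(names)  # every file under the context ends up in the archive
```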
k
I just got back to my desk. You good now?
w
yeah. i'm still working out the docker build, but i think i'm getting there. thanks 🙂
👍 1
@Kevin Kho one quick question if you know though. if a build fails and i want to clean it up, do you know how to do that? docker cmds don't actually exist inside my container, and i don't know where it was keeping the intermediate build, haha
k
I actually don’t know where that is stored off the top of my head
w
np; i'll do some googling later 🙂 for now i'll nuke the whole thing and start over
appreciate it!
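(For the cleanup question above, a hedged sketch — these are standard docker CLI commands, run wherever the daemon actually lives, i.e. the host, since the client inside the container only talks through the mounted socket:)

```shell
# remove dangling layers left behind by failed/intermediate builds
docker image prune -f

# on newer Docker versions, the build cache has its own prune command
docker builder prune -f
```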
@Kevin Kho I'm back at this again. Is there a way to inform
DockerStorage
which working directory it should use? I created a directory in
/registrar
that contains
Dockerfile, workflows/, app/
, and then I told
DockerStorage
to use
dockerfile=/registrar/Dockerfile
, but I don't think it knows to use that as the build context, because as soon as it reaches
COPY workflows /workflows
, it errors out saying it can't find that file in the build context.
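(For comparison, the plain docker CLI keeps the two separate: `-f` picks the Dockerfile, while the positional argument is the build context. A sketch using the thread's paths, with a placeholder tag:)

```shell
# Dockerfile path and build context are independent arguments;
# COPY paths in the Dockerfile resolve relative to the context dir
docker build -f /registrar/Dockerfile -t myimage:latest /registrar
```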
Wait, I might have found a sort of solution.
k
Use WORKDIR in the image?
w
Oh, no, it wasn't that. It's something else. I will try to explain if I finally reach a working solution.