# ask-community
a
Hi all, I am really struggling to get this to work. I am trying to use Docker Storage with an ECSRun config. What I am looking to make happen is to have the GitHub repo cloned into the Docker container so that my flow has access to various files (primarily Jupyter notebooks). I have been trying solutions for a number of days and I am currently stuck on:
    raise DockerException(
        'Error while fetching server API version: {0}'.format(e)
    )
docker.errors.DockerException: Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))
This would indicate that my Prefect image does not have access to the Docker daemon, but I can’t figure out what I am doing wrong. I have prefect backend cloud set. Below are my files.
Dockerfile
FROM prefecthq/prefect
FROM docker.pkg.github.com/<company>/<repo>/data-image:latest
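# NOTE: each FROM starts a new build stage, so the prefecthq/prefect stage
# above is discarded; only this final data-image stage ends up in the built
# image (Prefect itself is re-installed via pip below).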

# Prefect Config
ARG GH_BRANCH=master
ARG GH_TOKEN
ARG PREFECT_DEPS="jupyter,aws"
ARG PREFECT_HOME=/usr/local/
ARG PREFECT_VERSION=0.14.10

ENV GH_TOKEN=$GH_TOKEN

# Copy in required files
COPY requirements.txt ./

# Install Python requirements
RUN pip install -U pip
RUN pip install "prefect[${PREFECT_DEPS}]==${PREFECT_VERSION}"

# Install Vim and Bash completion
RUN apt-get update && apt-get install -y vim bash-completion

# Clone the ${GH_BRANCH} branch (master by default)
WORKDIR ${PREFECT_HOME}
RUN git clone --branch ${GH_BRANCH} https://${GH_TOKEN}@github.com/<company>/<repo>

# Copy the repo to a shorter directory name for convenience
RUN cp -r "${PREFECT_HOME}data-team-pipeline" "${PREFECT_HOME}pipeline"

WORKDIR "${PREFECT_HOME}pipeline"
Docker Compose (I’m using this to run and test the flows locally)
version: "3.3"
services:
prefect:
    image: prefect-image:latest
    restart: always
    command: bash -c "prefect auth login -t <AUTH_TOKEN> && /bin/bash"
    working_dir: /usr/local/pipeline/flows/
    environment:
      PREFECT__CONTEXT__SECRETS__GH_TOKEN: ${GH_TOKEN}
      PREFECT__CONTEXT__SECRETS__GH_USER: ${GH_USER} 
      WORKENV: dev
    volumes:
      - type: bind
        source: .
        target: /usr/local/pipeline
      
      - type: bind
        source: ${HOME}/.aws
        target: /root/.aws

      - type: bind
        source: ${HOME}/.prefect
        target: /root/.prefect
Flow
import sys
import os
sys.path.insert(0, os.path.abspath('..'))

import prefect
from prefect import task, Flow
from prefect.tasks.jupyter.jupyter import ExecuteNotebook

from flows.prefect_utils import (
  RUN_CONFIG,
  STORAGE
)

@task
def query_snowflake():
  logger = prefect.context.get("logger")
  logger.info("Running Notebook 01_Raw_Data_Snowflake_Query")
  # Instantiating ExecuteNotebook only creates the task; .run() actually executes it
  ExecuteNotebook(path='/usr/local/pipeline/flows/01_Raw_Data_Snowflake_Query.ipynb').run()

with Flow("attribution") as flow:
  query_snowflake()

flow.storage = STORAGE
flow.run_config = RUN_CONFIG
flow.register(project_name="tutorial")
Prefect Utils File
import os
import logging
import sys
from typing import Tuple
from prefect.run_configs import ECSRun
from prefect.storage import S3
from prefect.storage import GitHub
from prefect.storage import Docker

from prefect.client import Secret
from prefect.schedules import CronSchedule

logging.basicConfig(level=logging.INFO, stream=sys.stdout)
logger = logging.getLogger(__name__)

work_env = os.getenv("WORKENV")
GH_TOKEN = Secret("GH_TOKEN").get()

PREFECT_ENV_VARS = {
  "GH_TOKEN": GH_TOKEN
}

DOCKER_REGISTRY = "docker.pkg.github.com/<company>/<repo>/"
PREFECT_DATA_IMAGE = "docker.pkg.github.com/<company>/<repo>/prefect-image:1.0.0"

# if work_env == 'dev':
TASK_ARN = "<ECS_TASK_ARN>"
RUN_CONFIG = ECSRun(
    labels=['s3-flow-storage'],
    task_role_arn=TASK_ARN,
    image='prefecthq/prefect:latest',
    memory=512,
    cpu=256
)
STORAGE = Docker(
    registry_url=DOCKER_REGISTRY,
    base_image=PREFECT_DATA_IMAGE,
    env_vars=PREFECT_ENV_VARS
)
I can get other flows working that don’t involve Docker Storage, but that doesn’t help me if I need to reference these other files.
c
This is almost always caused by a poorly configured Docker daemon, independently of Prefect - I recommend trying to build that Dockerfile using the Docker CLI directly and debugging from there
a
I am building it from the CLI with
docker build -f Dockerfile --no-cache --build-arg GH_TOKEN=${GH_TOKEN} -t ${IMAGE} .
The Docker Compose file is just for running it.
@Chris White I went and ran the below as well and am receiving the same error
docker run \
  --workdir /usr/local/pipeline/flows/ \
  -e PREFECT__CONTEXT__SECRETS__GH_TOKEN=$GH_TOKEN \
  -e PREFECT__CONTEXT__SECRETS__GH_USER=$GH_USER \
  -e WORKENV=dev \
  --mount type=bind,source=$(pwd),target=/usr/local/pipeline \
  --mount type=bind,source=${HOME}/.prefect,target=/root/.prefect \
  -it prefect-image
I also tried it with the official Prefect image prefecthq/prefect and received the same error. Do I need to expose a certain port?
It seems like the issue has something to do with not being able to find the local Docker daemon, but it only happens when I use Docker Storage. Does Docker Storage not work when using the Prefect Docker container to run flows?
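For reference, a minimal sketch of reproducing this outside of Prefect: Docker storage talks to the daemon through the docker-py client, and the call below fails with the same DockerException whenever no daemon socket (e.g. /var/run/docker.sock) is reachable, which is exactly the situation inside a container that doesn’t mount one.

import docker

# from_env() looks for a reachable Docker daemon (usually the
# /var/run/docker.sock socket); with none available it raises
# DockerException: 'Error while fetching server API version: ...'
client = docker.from_env()
print(client.version())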
c
Oh, you’re running this from within a Docker container? Yea, that’s the issue - you can’t run Docker within Docker, or at least Docker recommends against it
a
to run my flows
I didn’t see anything in the docs that said you can’t use Docker Storage when running it this way, but am I right in my understanding that that’s the issue?
c
Yea, this is unrelated to Prefect - you can’t access a Docker daemon from within a Docker image; those Prefect images are intended to be used as base images for your flows
a
What do you mean by that? Maybe I am not understanding how this workflow works.
My thought process was that I build a Docker image using Prefect as the base and then include an additional image I have already created, then run the flows from there.
Would I want to be running something like S3 Storage in dev and then Docker Storage in prod? Or is what I’m trying to do not possible?
I’m trying to do this because we have an internal image, data-image, which comes prepackaged with all of the packages we use, and I’d like to use that in dev > CI > prod.
c
Yea, your end goal is valid — you can use Docker storage built on top of your data-image with Prefect included; the thing you’re getting tripped up on, though, is that registering a flow with Docker storage requires building an image that your flow is placed into, and you can’t build a Docker image from within a Docker container. This means you’ll need to call flow.register from a non-Docker process. We are starting to recommend that folks use other storage types (e.g. S3) along with a fixed image that you build independently to avoid these sorts of complications — check out this newly published doc that covers some of these patterns: https://docs.prefect.io/orchestration/flow_config/docker.html
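A minimal sketch of that recommended pattern, assuming a hypothetical bucket name: the flow itself lives in S3, so registration only uploads the flow and never builds an image, which means flow.register can run from any process, with or without a Docker daemon.

from prefect import Flow, task
from prefect.storage import S3

@task
def hello():
    print("hello from the pre-built image")

with Flow("attribution") as flow:
    hello()

# "my-flow-bucket" is a placeholder; at registration time the flow is
# serialized and uploaded here instead of being baked into a Docker image,
# so no local Docker daemon is needed
flow.storage = S3(bucket="my-flow-bucket")
flow.register(project_name="tutorial")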
a
Ok yeah, I think what I’ll end up doing is setting up a conda environment that the team opens into that has Prefect and the like installed so that they can test the flows. The idea was to keep everything standardized. But let me ask: in the docs it says that you recommend relying on a different storage mechanism. Is there any way to achieve what I am trying to do other than Docker storage? i.e. we have a GitHub repo with all of our code and a team Docker image with all the dependencies built in, and within that same GitHub repo are files we want to run (i.e. Jupyter notebooks).
c
Yea, you can still build Docker images for your flows to run in that contain all the relevant files and dependencies, and store your flow separately from that
a
Is that the Docker Runner then, instead of ECS?
c
All agents other than the local agent support both Docker storage and running flows within Docker images, so you aren’t really constrained there either way
a
So let’s say I have ECS up and running as the agent. How do I run the flow in the image?
c
Sorry, just seeing this - you can provide an image="address-to-registry-and-image" to your run config (all types except Local and Universal should accept this kwarg)
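For example, a sketch of this on the ECSRun config from earlier in the thread (the registry address and task ARN are the same placeholders, not verified values):

from prefect.run_configs import ECSRun

# The agent pulls this pre-built image and runs the flow inside it
RUN_CONFIG = ECSRun(
    labels=["s3-flow-storage"],
    task_role_arn="<ECS_TASK_ARN>",
    image="docker.pkg.github.com/<company>/<repo>/data-image:latest",
    memory=512,
    cpu=256,
)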
a
Interesting. And that will know to pull that image down and run the flows inside it?
c
yup yup
a
One last question (this has been so helpful): does it build the image on every run?
c
Glad I could help - specifying an image on your run config means you’re referencing an already-built image, so the step of building the image happens externally to Prefect
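That external build step might look like the following sketch in CI, using the docker-py client directly (the tag is the placeholder registry address from the thread; any environment with a real Docker daemon would work):

import docker

# Build and push the team image once, outside of Prefect; flows then
# reference the pushed tag via their run config's image kwarg
client = docker.from_env()  # needs a real daemon, e.g. on a CI runner
tag = "docker.pkg.github.com/<company>/<repo>/data-image:latest"
image, build_logs = client.images.build(path=".", tag=tag)
for line in client.images.push(tag, stream=True, decode=True):
    print(line)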