# prefect-community
m
I'm trying to test `DaskExecutor`, would `DockerRun` be a good method to see if the Dockerfile works or not?
a
Actually, the easiest way to test `DaskExecutor` would be to run the workflow locally without Docker first. But it depends on what you're trying to do. We've recently updated this topic that shows how you can use `LocalDaskExecutor` and how/when to move to `DaskExecutor`, perhaps this is what you're looking for? https://discourse.prefect.io/t/what-is-the-difference-between-a-daskexecutor-and-a-localdaskexecutor/374
But if you're asking how you can pass a Docker image to your Dask cluster class, you could provide that through `cluster_kwargs` like so:
```python
flow.executor = DaskExecutor(
    # FargateCluster is just one example
    cluster_class="dask_cloudprovider.aws.FargateCluster",
    cluster_kwargs={"n_workers": 4, "image": "your-prefect-image"},
)
```
m
@Anna Geller I had the flow working on the `LocalDaskExecutor`, but I'm trying to make sure the Docker image I have works. It's kind of cumbersome to test it directly on a Fargate cluster, so that's why I'm trying to test it locally first.
a
gotcha, have you tried Coiled? they make it so much easier to test various cluster configurations and build the proper containerized environment for Dask
@Max Lei alternatively, if you have a local Kubernetes cluster on your machine, you could test it that way perhaps?
```python
with Flow(
    "Dask Kubernetes Flow",
    storage=storage,
    executor=DaskExecutor(
        cluster_class=lambda: KubeCluster(make_pod_spec(image=prefect.context.image)),
        adapt_kwargs={"minimum": 2, "maximum": 3},
    ),
    run_config=KubernetesRun(),
) as flow:
    ...  # tasks go here
```
this blog post gives more info https://medium.com/slateco-blog/prefect-x-kubernetes-x-ephemeral-dask-power-without-responsibility-6e10b4f2fe40
k
I think testing locally is still not enough to guarantee it works on Fargate because of IAM roles and image compatibility and stuff like that
m
@Anna Geller Unfortunately I have no idea how to use Kubernetes. Would `DockerRun` be the closest option?
upvote 1
@Kevin Kho I have a Fargate cluster with the correctly configured roles, and I've tested it with Docker image deploys using the local CLI etc., as close as possible to what Prefect would do.
k
I would say `DockerRun` is the closest option, yep, unless Anna says otherwise
a
what do you mean by Fargate instance? Fargate is serverless so there are no instances 🙃 or do you mean you have some ECS Fargate service running 24/7 for testing?
I just can't grasp what you would test with just `DockerRun`, because it wouldn't run on a cluster but just in a container, right? but of course it depends on what you're trying to do
maybe also just building your image, then doing:
```shell
docker run --rm -it image_name
```
and then within the container run some code to test if everything has been properly installed and configured?
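To make that in-container check concrete, here's a minimal sketch of a script you could drop into the container and run: it reports which modules import cleanly rather than crashing on the first failure. The module names to check are your own flow's dependencies; the stdlib names below are just for illustration.

```python
import importlib


def check_imports(modules):
    """Return {module_name: True/False} depending on whether each imports cleanly."""
    results = {}
    for name in modules:
        try:
            importlib.import_module(name)
            results[name] = True
        except ImportError:
            results[name] = False
    return results


# Inside the container you would pass your flow's real dependencies,
# e.g. ["prefect", "pystan", "fbprophet"]. Stdlib modules shown here.
print(check_imports(["json", "csv"]))  # → {'json': True, 'csv': True}
```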
k
I think Max is thinking that if the Flow runs with DockerRun successfully, he knows all the dependencies are there and then if he uses the Docker image for the Dask cluster, it should work as well. I think it makes sense
👍 1
a
my poor misuse of the matrix 😅 https://imgflip.com/i/6721ma
😅 1
m
@Anna Geller Yeah I mean an ECS Fargate service. I'm pretty sure the service works, or at least I know how to fix it if the issues are permissions/images.
I'm looking at your repo: https://github.com/anna-geller/packaging-prefect-flows/blob/master/Dockerfile#L3. I placed my flows in `/opt/prefect/flows`, but I'm getting `Failed to load and execute Flow's environment: ModuleNotFoundError("No module named '/home/ubuntu/'")`. Is this a Docker container issue or something else?
a
I believe it's a user issue within your container. Can you share your Dockerfile?
k
I think this is a Storage issue where Local storage was used?
m
```dockerfile
FROM prefecthq/prefect:0.11.4-python3.7

COPY ./dist/<PACKAGE_NAME_REMOVED>-0.0.0.tar.gz .
COPY requirements.txt .
COPY gh ./.config/

RUN apt update
RUN apt install -y curl
RUN curl -fsSL https://cli.github.com/packages/githubcli-archive-keyring.gpg | dd of=/usr/share/keyrings/githubcli-archive-keyring.gpg && \
    echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main" | tee /etc/apt/sources.list.d/github-cli.list > /dev/null
RUN apt update
RUN apt install -y build-essential gcc gh
RUN pip install setuptools==50.0.0
RUN pip install pip==20.2
RUN pip install pystan==2.17.1.0
RUN pip install fbprophet==0.7.1
RUN pip install -r requirements.txt
RUN pip install <PACKAGE_NAME_REMOVED>-0.0.0.tar.gz

WORKDIR /opt/prefect
COPY . /opt/prefect/flows/
```
I did not set up storage, I'm assuming it's the default local storage?
Sounds like you need S3 storage for this and also `DaskExecutor`?
a
yup Kevin, that could be when using Docker storage. @Max Lei would you consider upgrading to a more recent version than 0.11.4? a loooot has changed since then. We have 1.0.0 now 😎
k
You need any remote storage that the agent can pull Github/S3/etc. It is just looking for the path locally
upvote 1
a
thx for sharing the Dockerfile, can you also share your Flow definition, especially the storage and run config? I could try to reproduce or investigate
m
That part I can't share, but I only have `DockerRun(image="<IMAGE_NAME_REMOVED>:latest")`, nothing special used in the flows.
k
Yeah you’ll probably just need to add any remote Storage and make sure your agent is authenticated to pull from it
`fbprophet` is so hard to deploy, from experience. A plain pip install isn't enough because it does some compiling afterwards
m
@Kevin Kho I'm sort of confused why the agent would need a remote storage though? Isn't everything required for the agent local?
Yeah it is. If `fbprophet` is in requirements it would break, so you need to install it before your requirements.
I've heard GreyKite provides similar functionality and graphing but without the install headache.
k
When the agent polls Prefect for flows to run, it will find a Flow and then load in the configuration from the database (Storage, RunConfig). It then looks for the storage, fetches the flow definition from there, and runs it on top of the RunConfig. It looks like your registration machine is saving the Flow locally (default Local storage) and then the agent is on a different machine that looks for that file path but it doesn't exist. So you need storage to be something like S3/Github where the agent can find it, pull it down, and then execute it. Does that make sense?

Greykite… my current impression is it's just a toy for now?
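As a concrete sketch of that setup (Prefect 1.x API; the bucket and image names here are placeholders, not anything from this thread):

```python
from prefect import Flow
from prefect.run_configs import DockerRun
from prefect.storage import S3

with Flow("my-flow") as flow:
    ...  # tasks go here

# Remote storage the agent can pull the flow definition from,
# instead of the default Local storage tied to the registration machine
flow.storage = S3(bucket="my-flow-bucket")
flow.run_config = DockerRun(image="your-prefect-image:latest")
```

The key point is that storage and run config are independent: storage is where the flow definition lives, the run config is the environment it executes in.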
a
Prefect deploys a Docker container for your flow run and then within that Docker container it pulls your flow from Storage. That's why you need remote storage, especially later when you use Fargate
k
Or yes what Anna said use Docker Storage to find the Flow file inside the container that you already have
👍 1
a
and if you really want to use local storage with Docker, you can try this https://github.com/anna-geller/packaging-prefect-flows/blob/master/flows/local_script_docker_run_local_image.py
m
Ah ok, I see, I'll try setting up a storage and retry.
👍 1
@Kevin Kho Greykite is actually not that bad, performs pretty similar to fbprophet when you need rough forecasts.
👍 1
m
Also, @Max Lei, looking at your Dockerfile I noticed there are a lot of RUN commands in there. Note that every command creates a layer in your image, and more layers means a larger image. You can slim it down by combining related commands into one layer…
upvote 3
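For example (an untested sketch based on the Dockerfile shared above), the apt steps and the pip steps could each be collapsed into a single layer, with an apt cache cleanup at the end of the apt layer:

```dockerfile
# One apt layer: add the GitHub CLI repo, install everything, clean up the cache
RUN apt update && apt install -y curl && \
    curl -fsSL https://cli.github.com/packages/githubcli-archive-keyring.gpg | dd of=/usr/share/keyrings/githubcli-archive-keyring.gpg && \
    echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main" | tee /etc/apt/sources.list.d/github-cli.list > /dev/null && \
    apt update && apt install -y build-essential gcc gh && \
    rm -rf /var/lib/apt/lists/*

# One pip layer: pin tooling first, install fbprophet's build deps before it,
# then the remaining requirements
RUN pip install setuptools==50.0.0 pip==20.2 && \
    pip install pystan==2.17.1.0 fbprophet==0.7.1 && \
    pip install -r requirements.txt && \
    pip install <PACKAGE_NAME_REMOVED>-0.0.0.tar.gz
```

Note the ordering constraint: curl must be installed before the GitHub CLI repo is added, and the repo must be added before `gh` is installed, so those steps can be chained but not reordered.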