
Daniel Davee

06/03/2021, 9:39 PM
I'm using a Dask executor and a Kubernetes run config to run a flow on a Kubernetes cluster, but it doesn't seem to want to import my submodules when it runs on the cluster. Do I need to include them on the workers, or is there a way they can be uploaded with the flow?

Kevin Kho

06/03/2021, 9:40 PM
Hey @Daniel Davee! By submodules you mean other code related to your flow that you import, right? The best thing here is probably to package them into a Docker image so everything gets loaded together.

Daniel Davee

06/03/2021, 9:41 PM
Yes I am.
Do you mean I need to create a new image for each ETL I have?

nicholas

06/03/2021, 9:45 PM
@Daniel Davee - it could be a bit of that, but a better approach is a base image that has your shared submodules installed, which you either use directly for your flows or extend for those that require anything extra
But regardless, the submodules will need to be installed on whatever image you use so that they're accessible at runtime

Daniel Davee

06/03/2021, 9:53 PM
So what I'm trying to do is build the DAGs dynamically at runtime. It seems like I would need to keep all the ETL code in the same environment.

nicholas

06/03/2021, 9:54 PM
That's correct - any code that could be included in the dynamic DAG would need to be available to the process though, right?

Daniel Davee

06/03/2021, 9:56 PM
Yeah, I guess I'd hoped to be able to keep the code off of the image. Would a sidecar help? Sorry, I literally just learned Docker and Kubernetes trying to get this to work.
I'm not totally sure what a sidecar is or does, but could I keep all the code on one node and move it into the environment?

nicholas

06/03/2021, 10:02 PM
No worries! This is really up to preference, but the most straightforward implementation would be to include the modules you need at container build time. This has the added benefit of pinning your code at build time instead of doing something like pulling from source, which makes it difficult to understand if/when bugs are introduced or code changes. Since your flow will be deployed on a pod with the storage you built at registration time, you end up with really good encapsulation for your flow.
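As a concrete illustration of "storage you built at registration time", here is a minimal sketch assuming Prefect's Docker storage (the 0.14/0.15-era API); the registry URL, image name, module name, and paths are placeholders, not values from this thread:
```python
from prefect import Flow
from prefect.storage import Docker

# Bake the shared submodules and extra dependencies into the image that is
# built when the flow is registered. All names/paths below are hypothetical.
storage = Docker(
    registry_url="gcr.io/my-project",             # hypothetical registry
    image_name="my-etl-flows",
    python_dependencies=["pandas", "numpy"],      # extra PyPI packages to install
    files={"/local/path/to/my_etl": "/opt/src/my_etl"},  # copy submodules into the image
    env_vars={"PYTHONPATH": "/opt/src"},          # so `import my_etl` works at runtime
)

with Flow("dynamic-etl", storage=storage) as flow:
    ...  # tasks that import from my_etl
```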

Kevin Kho

06/03/2021, 11:00 PM
You can install the ETL code as a Python module on that base image for your flows.

Daniel Davee

06/03/2021, 11:10 PM
How does that work?

Kevin Kho

06/03/2021, 11:14 PM
I have this repo as an example. There is a `Dockerfile` there. The last line of the Dockerfile is `RUN pip install -e .`, which makes the module available in the image. Note that you need a `setup.py` for this to work. This guide will have more info.
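For reference, a minimal setup.py sketch that makes `RUN pip install -e .` work; the package name and dependencies are placeholders rather than the actual contents of Kevin's repo:
```python
# setup.py at the repository root
from setuptools import find_packages, setup

setup(
    name="my_etl",                 # hypothetical module name
    version="0.1.0",
    packages=find_packages(),      # picks up my_etl/ and its submodules
    install_requires=["prefect"],  # plus whatever the ETL tasks need
)
```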

Daniel Davee

06/04/2021, 12:10 AM
Could I have a private pip index on a container in the pod that can be installed from dynamically?

Kevin Kho

06/04/2021, 12:29 AM
Like you have an internal copy of PyPI for security purposes and you need to pip from there?

Daniel Davee

06/04/2021, 12:33 AM
Yeah, basically I'm trying to glue different functions together into a DAG dynamically. But really I need a place to store the code in some directory-like structure, so that when I send the job to Dask it's able to import the code. Honestly, if there is a way to set it up so that a directory on another container is somehow on the Dask scheduler's path, that would probably work best. But like I said, I barely learned Docker the other day, so I don't even know if that's possible.
I suppose if I had a giant package that I pip installed from a private pip index, that could work too.

Kevin Kho

06/04/2021, 12:49 AM
The Docker container would serve the same purpose: you install all your Python packages there, like pandas, numpy, scipy, dask, etc. (The default Prefect image contains dask, and dask pulls in a lot of those, so it might just be a matter of adding your module on top.)
How do you spin up a Dask cluster?
About the private pip: you shouldn't need it in this scenario if the Docker image has everything. It's normally something your security team sets up for you; they give you a certificate, you save it somewhere, and pip uses it for the private PyPI.

Daniel Davee

06/04/2021, 4:18 PM
Sorry for the late reply. So I have set up a Dask executor on GCP which I connect to from the Prefect server to run the code. It seems like the private pip isn't really what I am looking for. Is there a way to add to the Dask executor a Python path that would be housed on a different container? I could have all the code on the Dask executor, but it seems like that would cause scaling issues. Am I correct?

Kevin Kho

06/04/2021, 4:22 PM
By DaskExecutor on GCP, do you mean GKE? I think it might be easier if you point to the address of the Dask cluster. You can do `DaskExecutor(address)` and point it there. Prefect will then connect and execute the code there, assuming it can make the connection.
We have a partner that specializes in infrastructure that I can introduce you to btw. A quick call with them might clear up the path you want to take for your deployment.
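For illustration, a sketch of what pointing at an existing Dask cluster can look like with the Prefect 0.14/0.15 API; the scheduler address and image are placeholders, and the job image would need the same submodules installed as the Dask workers:
```python
from prefect import Flow
from prefect.executors import DaskExecutor
from prefect.run_configs import KubernetesRun

with Flow("dynamic-etl") as flow:
    ...  # tasks

# Connect to the already-running Dask scheduler instead of spinning one up.
flow.executor = DaskExecutor(address="tcp://dask-scheduler.default:8786")
# The flow-run image should match what the Dask workers have installed.
flow.run_config = KubernetesRun(image="gcr.io/my-project/my-etl-flows:latest")
```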

Daniel Davee

06/04/2021, 4:24 PM
Yes, sorry, too many GXX acronyms. This is what I already do. The problem is matching environments and code access.
That would be so great. This is my weakest part; I think I've almost got it, but I can't really be sure.

Kevin Kho

06/04/2021, 4:32 PM
@George Coyne
No, it's fine. I probably can't help you as well as our partner can.