
Daniel Davee

05/20/2021, 4:08 PM
I'm being told by my Kubernetes engineer that he cannot pull
image: prefecthq/prefect:latest
onto GKE. Can Prefect be run on GKE?

Kevin Kho

05/20/2021, 4:11 PM
Hi @Daniel Davee! Is this through helm? Or is this like during a Flow?

Daniel Davee

05/20/2021, 4:13 PM
We used Helm to install Dask, but the Prefect manifest doesn't seem to work on GKE. What we need is to have Prefect installed on the Dask workers.

Kevin Kho

05/20/2021, 4:14 PM
You can use the Prefect image with helm and it contains Dask, but let me ask the team if there’s anything else we can add

Daniel Davee

05/20/2021, 4:15 PM
Ok, but when we pull the Prefect image onto GKE the pods don't show up.

Tyler Wanner

05/20/2021, 4:22 PM
Hi Daniel, there's nothing about GKE in particular that would prevent pulling the prefect docker image. Is there an error? Is there perhaps any kind of AdmissionController configured in your cluster?
We run Prefect on GKE smoothly, as do many others

Daniel Davee

05/20/2021, 4:52 PM
Yeah that is what our other engineer said, we are working on it now. I'm not a Kubernetes expert, so I was just trying to see if there was something I missed.

Tyler Wanner

05/20/2021, 4:58 PM
np, if you need anything Kubernetes-related let me know, happy to help there

Daniel Davee

05/20/2021, 4:59 PM
Thank you so much
🚀 1
@Tyler Wanner this is the error we are getting:
Traceback (most recent call last):
  File "/usr/local/bin/dask-scheduler", line 5, in <module>
    from distributed.cli.dask_scheduler import go
  File "/usr/local/lib/python3.7/site-packages/distributed/cli/dask_scheduler.py", line 119, in <module>
    @click.version_option()
  File "/usr/local/lib/python3.7/site-packages/click/decorators.py", line 247, in decorator
    _param_memo(f, OptionClass(param_decls, **option_attrs))
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 2467, in __init__
    super().__init__(param_decls, type=type, multiple=multiple, **attrs)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 2108, in __init__
    ) from None
ValueError: 'default' must be a list when 'multiple' is true.
This happens when we try to run the static Dask cluster; the image is prefecthq/prefect:latest
This is the error on the worker node:
Traceback (most recent call last):
  File "/usr/local/bin/dask-worker", line 8, in <module>
    sys.exit(go())
  File "/usr/local/lib/python3.7/site-packages/distributed/cli/dask_worker.py", line 461, in go
    check_python_3()
  File "/usr/local/lib/python3.7/site-packages/distributed/cli/utils.py", line 32, in check_python_3
    _unicodefun._verify_python3_env()
AttributeError: module 'click._unicodefun' has no attribute '_verify_python3_env'

Tyler Wanner

05/20/2021, 5:15 PM
hmm seems like the dask CLI command is erroring out. I'm a bit unfamiliar with that piece of execution so I'll need some help there

Kevin Kho

05/20/2021, 5:16 PM
I see. That error is related to Prefect 0.14.18 and 0.14.19 not having a pinned version of click; it looks like you need 0.14.17. Can you try
prefecthq/prefect:0.14.17
? This will be updated in the next version.
🙌 1

Daniel Davee

05/20/2021, 5:25 PM
It seems to be working on microk8s, thank you. I'll let you know when we get it up on GKE.
🚀 1
So I have Dask working, but when I try to use pandas it doesn't load the module.
How do I load the environment onto Dask?

Kevin Kho

05/20/2021, 5:58 PM
I think you want to use something like this after helm: https://docs.dask.org/en/latest/setup/kubernetes-helm.html#configure-environment . This will get those libraries you need installed on the workers.
Look for
EXTRA_PIP_PACKAGES
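For reference, a minimal sketch of what that could look like in the Dask helm chart's values, per the linked docs (the package list here is hypothetical):

```yaml
# values.yaml fragment for the Dask helm chart (hypothetical package list)
worker:
  env:
    - name: EXTRA_PIP_PACKAGES
      value: "pandas prefect"  # pip-installed into each worker at startup
scheduler:
  env:
    - name: EXTRA_PIP_PACKAGES
      value: "pandas prefect"  # keep scheduler and workers in sync
```

Applied with something like helm upgrade dask dask/dask -f values.yaml.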

Daniel Davee

05/20/2021, 5:59 PM
Ok but does that mean every external package used in my ETL has to be loaded on the worker?

Kevin Kho

05/20/2021, 6:00 PM
Yes, that is a general Dask requirement, and you can even run into issues if the package versions don't match between client and workers. This kills the workers.
Because the tasks are shipped to the workers and executed there, they need those libraries.

Daniel Davee

05/20/2021, 6:01 PM
Is there a way to spin up the environment dynamically ?
Like a docker sidecar?

Kevin Kho

05/20/2021, 6:03 PM
You can use the
KubernetesRun
configuration attached to your flow and pass an image this way. Docs.
Or you can use an executor dynamically like
from prefect.executors import DaskExecutor
from dask_kubernetes import make_pod_spec

flow_name = "my-flow"  # used as a label on the worker pods

executor = DaskExecutor(
    cluster_class="dask_kubernetes.KubeCluster",
    cluster_kwargs={
        "pod_template": make_pod_spec(
            image="prefecthq/prefect:latest",
            labels={"flow": flow_name},
            memory_limit=None,
            memory_request=None,
        )
    },
    adapt_kwargs={"maximum": 10},
)

Daniel Davee

05/20/2021, 6:20 PM
flow.executor = DaskExecutor(
    cluster_class="dask_cloudprovider.aws.FargateCluster",
    cluster_kwargs={"n_workers": 4, "image": "my-prefect-image"},
)
the image is a docker image?

Kevin Kho

05/20/2021, 6:23 PM
Yes, that's it. This is for a temporarily spun-up Dask cluster, which the flow will then execute on.

Daniel Davee

05/20/2021, 6:25 PM
Well, my use case is that I want to be able to define ETL DAGs dynamically and run them on Prefect. So to do this, they would need to provide the code and also a Docker image?

Kevin Kho

05/20/2021, 6:26 PM
So the code is written as a Flow, and then this executor (with the image) is attached to the Flow. When the Flow is loaded for running, the executor information is also loaded.

Daniel Davee

05/20/2021, 6:28 PM
Well, the code is written and the DAG structure is stored in a graph DB, and at runtime the flow is generated. At that point I would attach the runtime environment, which I assume is the Docker image?

Kevin Kho

05/20/2021, 6:33 PM
Yes to written and stored, but not in a graph DB. The Flow is stored where you specify, which would be something like Docker, S3, or GitHub. The runtime environment is created during "build" time as opposed to "run" time; you write it along with your Flow. And yes, the Docker image info also gets attached to your flow. All of this information gets loaded during runtime: the agent will find where the Flow is stored and what Executor/RunConfig you had, and then run it.

Daniel Davee

05/20/2021, 6:36 PM
I'm building an interface on top of Prefect. I'm storing the DAG information in a graph DB to generate flows later, but I use the DAG for a lot of other things too.

Kevin Kho

05/20/2021, 6:40 PM
Oh I see what you mean. Only if you’re willing to share, what is your use case? Like data lineage?
But just to be clear, the graph DB storage is separate from the Storage used for execution, right?