
Javier Domingo Cansino

03/23/2021, 5:25 PM
o/ does anyone know of any document that describes the overall implementation? I'm trying to understand how to deploy Prefect in my k8s cluster, and from what I understand, Dask is a requirement for running on Kubernetes, but I don't understand the relationship there. Does a Prefect agent create a Dask cluster consisting only of itself and then run tasks on it? Or am I supposed to create a Dask cluster with all dependencies installed, and then connect the agent to it?

Mariia Kerimova

03/23/2021, 6:49 PM
Hello Javier! It's not required to create a Dask cluster. If you want to use Prefect Cloud and run flows in your Kubernetes cluster, I would recommend checking out KubernetesRun, the Kubernetes Agent, and the different flavors of Storage.
Note that you can run your agent outside of the cluster. So you can register your flow and start the Kubernetes agent with the command
prefect agent kubernetes start --name AgentName --token <runner token> --label <label-matching-flow-label>
Here is a simple flow you can use to test your setup:
from prefect import task, Flow
from prefect.storage import Docker
from prefect.run_configs import KubernetesRun
import random
from time import sleep

@task
def inc(x):
    sleep(random.random() / 10)
    return x + 1

@task
def dec(x):
    sleep(random.random() / 10)
    return x - 1

@task
def add(x, y):
    sleep(random.random() / 10)
    return x + y

@task(name="sum")
def list_sum(arr):
    return sum(arr)


with Flow("example-flow-kube") as flow:
    incs = inc.map(x=range(4))
    decs = dec.map(x=range(4))
    adds = add.map(x=incs, y=decs)
    total = list_sum(adds)

if __name__ == "__main__":
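    # Replace the <> placeholders below with your registry URL, image name, agent label, and project name.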
    flow.storage = Docker(registry_url="<>", image_name="<>")
    flow.run_config = KubernetesRun(labels=["<>"])
    flow.register(project_name="<>")

Javier Domingo Cansino

03/23/2021, 9:03 PM
Sure, but I'm trying to understand how the agent is run in k8s.
So far, from what I understand, if comparing to Airflow, the agent would be the director part of the worker, and the Dask cluster would be the executor part?
I'm just trying to understand the architecture as a whole.
thanks for replying btw
Is there any helm chart to deploy the agent in k8s?
I found this issue, but it didn't address the helm chart part: https://github.com/PrefectHQ/prefect/issues/1692

Mariia Kerimova

03/24/2021, 12:05 PM
Yes, there is a helm chart in the Prefect Server repo. You can see the agent deployment template here.
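If you'd rather not use helm, the Prefect CLI can also generate an agent deployment manifest for you - just a sketch, assuming a recent Prefect version (check prefect agent kubernetes install --help for the exact flags):
prefect agent kubernetes install --token <runner token> --label <label-matching-flow-label> --rbac | kubectl apply -f -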

Javier Domingo Cansino

03/24/2021, 12:05 PM
ah it's in the server repo, ok, thanks!
just in case, I'm using Prefect Cloud, so I didn't think about checking the server part 😉

Jermaine Carn

03/24/2021, 4:43 PM
This video from Prefect's YouTube page may help with understanding the architecture as well: https://youtu.be/50S4RqeEVVo. Kubernetes agents and Dask agents are discussed.

Javier Domingo Cansino

03/25/2021, 11:27 AM
So from what I understand, it's considered good practice to couple the work that needs to be done with Prefect itself?
It seems like the official recommendation is to implement things directly in Python and run them with Prefect, with no separation?

Mariia Kerimova

03/25/2021, 12:57 PM
I hope I understood your question correctly. From the Python perspective, yes, you need to add task decorators to your functions and organize the tasks into a flow. From the infrastructure perspective, you need to run an agent, which polls for flows you want to run and deploys them. I hope that answers the question, but let me know if something is not clear.

Javier Domingo Cansino

03/28/2021, 1:25 PM
@Mariia Kerimova it's not exactly what I asked, but that does shed some light on the matter. I have started to write https://github.com/txomon/txomon.github.io/blob/master/content/post/2021/03/25/prefect-overview.md trying to explain what I have understood about how the different entities interact with each other. Would you mind having a look and telling me what is wrong in there?

Mariia Kerimova

03/29/2021, 1:36 PM
Sure! It looks like a good summary! I would rephrase the agent definition (agents pick up flow runs, not tasks), and note that Fargate is deprecated in favor of ECS.

Javier Domingo Cansino

03/30/2021, 1:16 PM
@Mariia Kerimova awesome, thanks! I was just worried that I hadn't put the abstractions in the right place.
Regarding that, given that tasks using DaskExecutor may use a different image, does it mean that if I were to code my ETLs in pure Python, I would only be bound to Dask as a dependency, and not Prefect? I'm trying to make sure I don't fall into the same problem as with Airflow, where coding a PythonOperator couples your code to ALL of Airflow's own Python dependencies, making it horrible to code with.

Jeremiah

03/30/2021, 1:52 PM
Hi Javier - Prefect needs to be installed to define a task, so the simplest implementations will package Prefect + execution code together. However, your task can operate however you prefer - for example, inside your Prefect-defined task you could submit work to a remote Dask cluster and wait for its result. Note that the Prefect DaskExecutor ships the entire Prefect task to the Dask cluster, and therefore does require Prefect to be available; in my example, you would be creating your own Dask client inside the task and using it manually.
We'll have more enhancements later this year to assist with running tasks in diverse environments that may not have Prefect installed; until then, users typically use a pattern like this to submit the work themselves.
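For illustration, a minimal sketch of that pattern (the scheduler address and the transform function are hypothetical placeholders):

from dask.distributed import Client
from prefect import task, Flow

def transform(record):
    # Plain Python with no Prefect imports; it is shipped to the
    # workers via cloudpickle, so only Dask must be installed there.
    return record * 2

@task
def run_on_dask(records):
    # Connect to a Dask cluster managed outside of Prefect and wait
    # for the result inside the Prefect-defined task.
    with Client("tcp://dask-scheduler:8786") as client:
        futures = client.map(transform, records)
        return client.gather(futures)

with Flow("remote-dask-example") as flow:
    run_on_dask([1, 2, 3])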

Javier Domingo Cansino

03/30/2021, 2:30 PM
Thank you for your answer @Jeremiah. Does this mean that all the Prefect dependencies that get shipped need to be available in Dask too?
I'm trying to understand if there is a way to have my ETL code written in Python with no dependency collisions with the scheduler code.
The pattern I'm currently using with Airflow is to have all the ETL tooling packaged in Docker and use the Kubernetes pod operator to run it.
This is far from optimal, because rolling out changes to the tools and the ETLs requires synchronization, and the separation doesn't specifically benefit us.

Jim Crist-Harif

03/30/2021, 4:45 PM
Hi Javier, you might find this doc (https://docs.prefect.io/orchestration/flow_config/docker.html#dependency-requirements) useful. If all your tasks are defined in the same file as the flow definition, no extra dependencies are needed in the images running your dask cluster (as these task definitions will be serialized via cloudpickle to all dask worker nodes). Any functionality you define in a separate file from the original flow definition will need to be available on the dask worker nodes though.
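To make that concrete, a minimal sketch of the single-file pattern (the scheduler address is a hypothetical placeholder; per Jeremiah's note above, the Dask workers still need Prefect and cloudpickle installed, but not this module):

from prefect import task, Flow
from prefect.executors import DaskExecutor

@task
def extract():
    return [1, 2, 3]

@task
def load(rows):
    print(f"loaded {len(rows)} rows")

with Flow("single-file-etl") as flow:
    load(extract())

# The task definitions above travel to the workers via cloudpickle,
# so this module itself does not need to be installed on them.
flow.executor = DaskExecutor(address="tcp://dask-scheduler:8786")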