
    Javier Domingo Cansino

    1 year ago
    o/ Does anyone know of a document that describes the overall implementation? I'm trying to understand how to deploy Prefect in my k8s cluster, and from what I understand, Dask is a requirement to run on Kubernetes, but I don't understand the relationship there. Does a Prefect agent create a Dask cluster consisting only of itself and then run tasks on it? Or am I supposed to create a Dask cluster, with all dependencies installed, and then connect the agent to it?

    Mariia Kerimova

    1 year ago
    Hello Javier! You don't need to create a Dask cluster. If you want to use Prefect Cloud and run flows in your Kubernetes cluster, I would recommend checking out KubernetesRun, Kubernetes Agent, and the different flavors of Storage.
    Note that you can run your agent outside the cluster. So you can register your flow and start a Kubernetes agent with the command
    prefect agent kubernetes start --name AgentName --token <runner token> --label <label-matching-flow-label>
    Here is a simple flow you can use to test your setup:
    from prefect import task, Flow
    from prefect.storage import Docker
    from prefect.run_configs import KubernetesRun
    import random
    from time import sleep
    
    @task
    def inc(x):
        sleep(random.random() / 10)
        return x + 1
    
    @task
    def dec(x):
        sleep(random.random() / 10)
        return x - 1
    
    @task
    def add(x, y):
        sleep(random.random() / 10)
        return x + y
    
    @task(name="sum")
    def list_sum(arr):
        return sum(arr)
    
    
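    with Flow("example-flow-kube") as flow:
        # Map each task over its inputs; Prefect creates one task run per element.
        incs = inc.map(x=range(4))
        decs = dec.map(x=range(4))
        # Mapping over two upstream results pairs them element by element.
        adds = add.map(x=incs, y=decs)
        # Reduce the mapped results to a single total.
        total = list_sum(adds)
    
    if __name__ == "__main__":
        # Fill in the "<>" placeholders for your registry, label, and project.
        flow.storage = Docker(registry_url="<>", image_name="<>")
        flow.run_config = KubernetesRun(labels=["<>"])
        flow.register(project_name="<>")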
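
To check what the flow above should produce once it runs, here is a plain-Python equivalent of the mapped tasks. It is only a sketch of the data flow: it leaves out Prefect, the sleeps, and the Kubernetes setup, and the variable names simply mirror the flow.

```python
# Plain-Python equivalent of the mapped flow, without Prefect or sleeps.
def inc(x):
    return x + 1

def dec(x):
    return x - 1

def add(x, y):
    return x + y

incs = [inc(x) for x in range(4)]              # inc.map(x=range(4))
decs = [dec(x) for x in range(4)]              # dec.map(x=range(4))
adds = [add(x, y) for x, y in zip(incs, decs)]  # add.map(x=incs, y=decs)
total = sum(adds)                              # list_sum(adds)

print(incs, decs, adds, total)
```

If the registered flow's final task returns the same total, the agent, storage, and run config are wired up correctly.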

    Javier Domingo Cansino

    1 year ago
    Sure, but I'm trying to understand how the Agent is run in k8s
    So far, from what I understand, comparing to Airflow, the Agent would be the director part of the worker and the Dask cluster the executor part?
    I'm just trying to understand the architecture as a whole
    thanks for replying btw
    is there any helm chart to deploy the agent in k8s?
    I found this issue but it didn't address helm chart part https://github.com/PrefectHQ/prefect/issues/1692

    Mariia Kerimova

    1 year ago
    Yes, there is a helm chart in Prefect Server repo. You can see the agent deployment template here.

    Javier Domingo Cansino

    1 year ago
    ah it's in the server repo, ok, thanks!
    just in case I'm using prefect cloud, so didn't think about checking the server part 😉

    Jermaine Carn

    1 year ago
    This video from Prefect’s YouTube page may help you understand the architecture as well. Kubernetes agents and Dask agents are discussed:

    https://youtu.be/50S4RqeEVVo

    Javier Domingo Cansino

    1 year ago
    So from what I understand, it's a good practice to couple the work that needs to be done with prefect itself?
    it seems like the official recommendation is to implement things in python directly and run them with prefect with no separation?

    Mariia Kerimova

    1 year ago
    I hope I understood your question correctly. From the Python perspective, yes, you need to add task decorators to your functions and organize the tasks into a flow. From the infrastructure perspective, you need to run an agent, which polls for the flow runs you want to execute and deploys them. I hope that answers the question, but let me know if something is not clear.
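
The polling behavior described above can be pictured with a toy model. This is only an illustrative sketch, not Prefect's implementation: `agent_poll_once`, the run dictionaries, and the rule that an agent takes a run when the run's labels are a subset of the agent's labels are all assumptions made for the example.

```python
from collections import deque

def agent_poll_once(pending_runs, agent_labels, deploy):
    """One polling cycle of a hypothetical agent.

    Deploys every pending run whose labels are all covered by the agent's
    labels (an assumed matching rule) and leaves the rest in the queue.
    """
    remaining = deque()
    deployed = []
    while pending_runs:
        run = pending_runs.popleft()
        if set(run["labels"]) <= set(agent_labels):
            deploy(run)  # a real agent would launch infrastructure here
            deployed.append(run["name"])
        else:
            remaining.append(run)
    pending_runs.extend(remaining)
    return deployed

# Two scheduled runs; only one matches the labels this agent was started with.
queue = deque([
    {"name": "example-flow-kube", "labels": ["k8s"]},
    {"name": "other-flow", "labels": ["ecs"]},
])
launched = []
deployed = agent_poll_once(queue, ["k8s"], launched.append)
print(deployed)
```

The point of the model is that the agent only starts flow runs; the tasks themselves execute inside whatever the agent launched.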

    Javier Domingo Cansino

    1 year ago
    @Mariia Kerimova it's not exactly what I asked, but that does put some light on the matter. I have started to write https://github.com/txomon/txomon.github.io/blob/master/content/post/2021/03/25/prefect-overview.md trying to explain what I have understood on how the different entities interact with each other. Would you mind having a look and telling me stuff that is wrong in there?

    Mariia Kerimova

    1 year ago
    Sure! It looks like a good summary! I would rephrase the agent definition (agents get flow runs, not tasks), and note that Fargate is deprecated in favor of ECS.

    Javier Domingo Cansino

    1 year ago
    @Mariia Kerimova awesome, thanks! I was just worried that I didn't get the abstractions in the right place
    regarding that, given that tasks using DaskExecutor may use a different image, does it mean that if I were to code my ETLs in pure python, I would only be bound to Dask as a dependency and not prefect? I'm trying to make sure that I don't fall into the same problem as with airflow, where coding a PythonOperator will couple your code to ALL the python dependencies from airflow itself, making it horrible to code with

    Jeremiah

    1 year ago
    Hi Javier - Prefect needs to be installed to define a task, so the simplest implementations will package Prefect + execution code together. However, your task can operate however you prefer - for example, inside your Prefect-defined task you could submit work to a remote Dask cluster and wait for its result. Note that the Prefect DaskExecutor ships the entire Prefect task to the Dask cluster, and therefore does require Prefect to be available; in my example, you would be creating your own Dask client inside the task and using it manually.
    We’ll have more enhancements later this year to assist with running tasks in diverse environments that may not have Prefect installed; until then, users typically use a pattern like this to submit the work themselves.
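
The submit-and-wait pattern described above might look roughly like the sketch below. To keep it runnable without a cluster, it uses the standard library's `ThreadPoolExecutor` as a stand-in for a Dask client; `dask.distributed.Client` offers a `submit` method that returns a future with a `result` method of the same shape. The names `transform` and `run_remotely` are hypothetical, and in a real flow `run_remotely` would be the function wrapped with the task decorator.

```python
from concurrent.futures import ThreadPoolExecutor

def transform(rows):
    """Pure ETL logic: nothing here imports the orchestrator."""
    return [r * 2 for r in rows]

def run_remotely(executor, func, *args):
    """Body of what would be the Prefect task: submit the work, then wait.

    `executor` is anything with a submit() that returns a future, such as
    this thread pool or, by assumption, a Dask client.
    """
    future = executor.submit(func, *args)
    return future.result()

# Stand-in for connecting to a remote cluster.
with ThreadPoolExecutor(max_workers=1) as pool:
    result = run_remotely(pool, transform, [1, 2, 3])

print(result)
```

With this split, only the thin wrapper depends on the orchestrator, while `transform` and its dependencies live wherever the work is executed.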

    Javier Domingo Cansino

    1 year ago
    Thank you for your answer @Jeremiah, does this mean that all Prefect dependencies will need to be available in Dask too?
    I'm trying to understand if there is a way to have my ETL code written in python with no dependency collision to the scheduler code
    the pattern I'm currently using with airflow is to have all the tools from the ETL packaged in Docker and use kubernetes pod operator to run it
    this is far from optimal because rolling changes on the tools and the ETLs require synchronization, and the separation doesn't specifically benefit us

    Jim Crist-Harif

    1 year ago
    Hi Javier, you might find this doc (https://docs.prefect.io/orchestration/flow_config/docker.html#dependency-requirements) useful. If all your tasks are defined in the same file as the flow definition, no extra dependencies are needed in the images running your dask cluster (as these task definitions will be serialized via cloudpickle to all dask worker nodes). Any functionality you define in a separate file from the original flow definition will need to be available on the dask worker nodes though.
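
The reason separately defined code must exist on the workers comes down to how functions are pickled. The sketch below uses the standard library's `pickle` rather than cloudpickle, so it only shows the "separate file" half of the statement: a function that lives in an importable module is stored as a reference to that module, not as code. cloudpickle behaves the same way for importable modules, but additionally serializes functions defined in the main script by value, which is why tasks written in the flow file travel along with it.

```python
import json
import pickle

# An importable function is pickled as a reference (module name plus
# function name), not as its code.
payload = pickle.dumps(json.dumps)

# Both names appear in the payload; the function body does not.
has_module_name = b"json" in payload
has_function_name = b"dumps" in payload

# Unpickling resolves the reference by importing the module again, so the
# module must be installed wherever the payload is loaded.
restored = pickle.loads(payload)
same_object = restored is json.dumps

print(has_module_name, has_function_name, same_object)
```

On a worker without the referenced module installed, the load step would fail with an import error, which matches the requirement described in the linked doc.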