# ask-community
a
Hello! I'm not sure if it's the best channel to ask this question, but: do you have any insight / best practices on how to split deployments & flows? ⬇️
I kind of like having a `load_source(name: str)` flow with multiple deployments (like `load-source/crm`, `load-source/erp`, which leverage the `name` parameter, etc.), but I have the feeling that it's not very clean:
• either the `load_source` function becomes bloated, because it has to handle all the source `name`s,
• or the `load_source` function delegates everything to my internal Python library that does the heavy lifting, and becomes an empty shell.
I don't know if you have any recommendation on patterns to follow 🙂
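In code, the pattern being described looks roughly like this (the flow body and the `serve()` call are placeholders to make the sketch self-contained, not the actual project code):
```python
from prefect import flow, serve

@flow
def load_source(name: str):
    # Placeholder body: either a big if/elif over source names,
    # or a thin call into an internal library that does the real work.
    ...

if __name__ == "__main__":
    # One deployment per source, all reusing the same flow through the
    # `name` parameter: load-source/crm and load-source/erp.
    serve(
        load_source.to_deployment(name="crm", parameters={"name": "crm"}),
        load_source.to_deployment(name="erp", parameters={"name": "erp"}),
    )
```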
r
Hello! I don't know if that's best practice or not, but I've designed a small config library that loads .toml configs, to make it easy to have multiple deployments using the same flows. For example, here is what the config of one deployment looks like:
```toml
[[flows]]
name="facebook-comments-last-3-days-daily"
cron="30 4 * * *"
parameters={ "time_interval"="last_3_days" }
module="flows.facebook.fb_comments"
flow_function="facebook_get_comments_flow"
tags=["facebook", "comments"]
description="Get comments from Facebook active publications (less than 3 days old) every day"
```
This has been working fine for us so far 🙂
a
Interesting, thanks! The library loads the TOML file and creates the deployments using Prefect's Python API, right?
r
Exactly, we run a Python script in our CI (or locally in our docker compose stack) that parses the configs and creates the deployments.
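A minimal sketch of what such a deploy script could look like, assuming a `configs/` directory of .toml files shaped like the example above (the function names and layout here are assumptions, not the actual library):
```python
import importlib
import pathlib
import tomllib  # Python 3.11+; on older versions, the tomli package works the same way


def iter_flow_configs(config_dir: str = "configs"):
    """Yield every [[flows]] entry found in the .toml files under config_dir."""
    for path in pathlib.Path(config_dir).rglob("*.toml"):
        with path.open("rb") as f:
            yield from tomllib.load(f).get("flows", [])


def build_deployment(entry: dict):
    """Import the flow function named in the config and turn it into a deployment."""
    module = importlib.import_module(entry["module"])
    flow_function = getattr(module, entry["flow_function"])
    return flow_function.to_deployment(
        name=entry["name"],
        cron=entry["cron"],
        parameters=entry.get("parameters", {}),
        tags=entry.get("tags", []),
        description=entry.get("description"),
    )


if __name__ == "__main__":
    deployments = [build_deployment(entry) for entry in iter_flow_configs()]
    # Registering the deployments (work pool, image, etc.) is shown further down.
```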
a
I'm actually doing the same kind of thing, but using the `prefect.yaml` directly and `prefect --no-prompt deploy --all`. The `prefect.yaml` looks like this:
```yaml
name: metabase-reports
prefect-version: 2.13.4

pull:
- prefect.deployments.steps.set_working_directory:
    directory: "/usr/lib/data-platform/prefect/"

deployments:
- name: integrate-crm
  version: "{{ $VERSION }}"
  description: |
    Integrate CRM to the Staging layer of the Lakehouse
  flow_name: null
  entrypoint: ./flows/integrate_crm.py:integrate_crm
  parameters: {}
  work_pool:
    name: default
    job_variables:
      image: "{{ $DOCKER_REGISTRY }}/{{ $DOCKER_REPOSITORY }}:{{ $DOCKER_TAG }}"
```
r
I see! That's pretty great, I think I haven't read enough about `prefect.yaml`. We went the custom way because we wanted a structure with separate files, like:
```
configs/
├── facebook/
│   ├── comments.toml
│   └── publication_metrics.toml
└── twitter/
    └── tweet_metrics.toml
...
```
Do you think the YAML can be split into this kind of hierarchy?
a
I don't think the YAML itself can be split, but you can have multiple `prefect.yaml` files. Then you lose the possibility to run a single command to deploy everything, though... So I don't think we can do what you want with YAML files only 😓
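For what it's worth, a small wrapper script could run the deploy command once per file, assuming a Prefect CLI recent enough to support the `--prefect-file` flag (this is a sketch under that assumption, with a hypothetical `configs/` layout, not something tested in this thread):
```python
import pathlib
import subprocess

# Run `prefect deploy` once per prefect.yaml found under configs/.
for prefect_file in sorted(pathlib.Path("configs").rglob("prefect.yaml")):
    subprocess.run(
        [
            "prefect", "--no-prompt", "deploy", "--all",
            "--prefect-file", str(prefect_file),
        ],
        check=True,
    )
```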
@Robin Niel: I saw in your TOML file that you provide a Python module. I didn't know that was possible (I thought you had to specify a file path). How are you using it within your internal tool?
r
I am using importlib:
```python
import importlib  # at module level


# Method on our config class; each instance holds one [[flows]] entry
# (module, flow function, schedule, etc.) parsed from the TOML configs.
def get_deployment(self, work_pool_name, is_schedule_active):
    # Import the module declared in the config and grab the flow function by name.
    source_module = importlib.import_module(self.__module)
    flow_function = getattr(source_module, self.__flow_function)
    # Build a deployment for that flow from the rest of the config entry.
    return flow_function.to_deployment(
        name=self.__name,
        tags=self.__tags,
        parameters=self.__parameters,
        cron=self.__cron,
        description=self.__description,
        work_pool_name=work_pool_name,
        is_schedule_active=is_schedule_active,
    )
```
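Presumably the deployments returned this way still need to be registered somewhere; with Prefect ≥ 2.13 that could look roughly like the following (the `deploy()` arguments, the image name, and the `flow_configs` variable are assumptions, not Robin's actual script):
```python
from prefect import deploy

# flow_configs: instances of the config class above, one per [[flows]] entry.
deployments = [
    cfg.get_deployment(work_pool_name="default", is_schedule_active=True)
    for cfg in flow_configs
]

# The flows already live in the CI-built image, so point the deployments at it
# without rebuilding or pushing anything.
deploy(
    *deployments,
    work_pool_name="default",
    image="registry.example.com/data-platform:latest",  # hypothetical image
    build=False,
    push=False,
)
```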
a
And how do you package your module, then? Is everything embedded into a Docker image, for instance?
r
Yes, we run everything in containers; the image contains the flows and the script that deploys them.
a
Thanks! I'm doing exactly the same thing, that's why I was curious 🙂 And one last question (which is broader, actually): how do you manage tags? I mean how do you decide which tag to use on a particular deployment?
I put the name of the "module" (like source, export, etc.) I'm using as a tag, but... It's actually pretty useless 😅
r
Maybe it's easier for us because we get data mainly from social network APIs, so we have tags for facebook, twitter, etc., and then we have "data type" tags like comments, publications, metrics... It makes it pretty simple to see how comment collection went for all social networks for the day, or how all collection went for a specific network.