# ask-community
a
Hello! I'm not sure if it's the best channel to ask this question, but: do you have any insight / best practices on how to split deployments & flows? ⬇️
I kind of like having a `load_source(name: str)` flow with multiple deployments (like `load-source/crm`, `load-source/erp`, which leverage the `name` parameter, etc.), but I have the feeling that it's not very clean:
• either the `load_source` function becomes bloated, because it has to handle all the source `name`s,
• or the `load_source` function delegates everything to my internal Python library that does the heavy lifting, and becomes an empty shell.
I don't know if you have any recommendation on patterns to follow 🙂
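In code, the pattern being described looks roughly like this (the flow body and the `serve()` call are placeholders to make the sketch self-contained, not the actual project code):
```python
from prefect import flow, serve

@flow
def load_source(name: str):
    # Placeholder body: either a big if/elif over source names,
    # or a thin call into an internal library that does the real work.
    ...

if __name__ == "__main__":
    # One deployment per source, all reusing the same flow through the
    # `name` parameter: load-source/crm and load-source/erp.
    serve(
        load_source.to_deployment(name="crm", parameters={"name": "crm"}),
        load_source.to_deployment(name="erp", parameters={"name": "erp"}),
    )
```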
r
Hello! I don't know if that's best practice or not, but I've designed a small config library that loads .toml configs, to make it easy to have multiple deployments using the same flows. For example, here is what the config of one deployment looks like:
```toml
[[flows]]
name="facebook-comments-last-3-days-daily"
cron="30 4 * * *"
parameters={ "time_interval"="last_3_days" }
module="flows.facebook.fb_comments"
flow_function="facebook_get_comments_flow"
tags=["facebook", "comments"]
description="Get comments from Facebook active publications (less than 3 days old) every day"
```
This has been working fine for us so far 🙂
a
Interesting, thanks! The library loads the TOML file and creates the deployments using Prefect's Python API, right?
r
Exactly, we run a Python script in our CI (or locally in our docker compose stack) that parses the configs and creates the deployments.
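A minimal sketch of what such a deploy script could look like, assuming a `configs/` directory of .toml files shaped like the example above (the function names and layout here are assumptions, not the actual library):
```python
import importlib
import pathlib
import tomllib  # Python 3.11+; on older versions, the tomli package works the same way


def iter_flow_configs(config_dir: str = "configs"):
    """Yield every [[flows]] entry found in the .toml files under config_dir."""
    for path in pathlib.Path(config_dir).rglob("*.toml"):
        with path.open("rb") as f:
            yield from tomllib.load(f).get("flows", [])


def build_deployment(entry: dict):
    """Import the flow function named in the config and turn it into a deployment."""
    module = importlib.import_module(entry["module"])
    flow_function = getattr(module, entry["flow_function"])
    return flow_function.to_deployment(
        name=entry["name"],
        cron=entry["cron"],
        parameters=entry.get("parameters", {}),
        tags=entry.get("tags", []),
        description=entry.get("description"),
    )


if __name__ == "__main__":
    deployments = [build_deployment(entry) for entry in iter_flow_configs()]
    # Registering the deployments (work pool, image, etc.) is shown further down.
```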
a
I'm actually doing the same kind of thing, but using the `prefect.yaml` directly and `prefect --no-prompt deploy --all`. The `prefect.yaml` looks like this:
```yaml
name: metabase-reports
prefect-version: 2.13.4

pull:
- prefect.deployments.steps.set_working_directory:
    directory: "/usr/lib/data-platform/prefect/"

deployments:
- name: integrate-crm
  version: "{{ $VERSION }}"
  description: |
    Integrate CRM to the Staging layer of the Lakehouse
  flow_name: null
  entrypoint: ./flows/integrate_crm.py:integrate_crm
  parameters: {}
  work_pool:
    name: default
    job_variables:
      image: "{{ $DOCKER_REGISTRY }}/{{ $DOCKER_REPOSITORY }}:{{ $DOCKER_TAG }}"
```
r
I see! That's pretty great, I think I haven't read enough about `prefect.yaml`. We went the custom way because we wanted a structure with separate files, like:
```
configs/
├── facebook/
│   ├── comments.toml
│   └── publication_metrics.toml
└── twitter/
    └── tweet_metrics.toml
...
```
Do you think the YAML can be split into this kind of hierarchy?
a
I don't think the YAML itself can be split, but you can have multiple `prefect.yaml` files. Then you lose the possibility to run a single command to deploy everything, though... So I don't think we can do what you want with YAML files only 😓
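For what it's worth, a small wrapper script could run the deploy command once per file, assuming a Prefect CLI recent enough to support the `--prefect-file` flag (this is a sketch under that assumption, with a hypothetical `configs/` layout, not something tested in this thread):
```python
import pathlib
import subprocess

# Run `prefect deploy` once per prefect.yaml found under configs/.
for prefect_file in sorted(pathlib.Path("configs").rglob("prefect.yaml")):
    subprocess.run(
        [
            "prefect", "--no-prompt", "deploy", "--all",
            "--prefect-file", str(prefect_file),
        ],
        check=True,
    )
```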
@Robin Niel: I saw in your TOML file that you provide a Python module. I didn't know that was possible (I thought you had to specify a file path). How are you using it within your internal tool?
r
I am using importlib:
```python
import importlib  # at module level


# Method on our config class; each instance holds one [[flows]] entry
# (module, flow function, schedule, etc.) parsed from the TOML configs.
def get_deployment(self, work_pool_name, is_schedule_active):
    # Import the module declared in the config and grab the flow function by name.
    source_module = importlib.import_module(self.__module)
    flow_function = getattr(source_module, self.__flow_function)
    # Build a deployment for that flow from the rest of the config entry.
    return flow_function.to_deployment(
        name=self.__name,
        tags=self.__tags,
        parameters=self.__parameters,
        cron=self.__cron,
        description=self.__description,
        work_pool_name=work_pool_name,
        is_schedule_active=is_schedule_active,
    )
```
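Presumably the deployments returned this way still need to be registered somewhere; with Prefect ≥ 2.13 that could look roughly like the following (the `deploy()` arguments, the image name, and the `flow_configs` variable are assumptions, not Robin's actual script):
```python
from prefect import deploy

# flow_configs: instances of the config class above, one per [[flows]] entry.
deployments = [
    cfg.get_deployment(work_pool_name="default", is_schedule_active=True)
    for cfg in flow_configs
]

# The flows already live in the CI-built image, so point the deployments at it
# without rebuilding or pushing anything.
deploy(
    *deployments,
    work_pool_name="default",
    image="registry.example.com/data-platform:latest",  # hypothetical image
    build=False,
    push=False,
)
```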
a
And how do you package your module, then? Is everything embedded into a Docker image, for instance?
r
Yes, we run everything in containers; the image contains the flows and the script that deploys them.
a
Thanks! I'm doing exactly the same thing, that's why I was curious 🙂 And one last question (which is broader, actually): how do you manage tags? I mean how do you decide which tag to use on a particular deployment?
I put the name of the "module" (like source, export, etc.) I'm using as a tag, but... It's actually pretty useless 😅
r
Maybe it's easier for us because we get data mainly from social network APIs, so we have tags for facebook, twitter, etc., and then we have "data type" tags like comments, publications, metrics... It makes it pretty simple to see how comment collection went for all social networks for the day, or how all collection went for a specific network.