I am working on standing up Prefect 2.0 in a production environment, for internal data pipeline and reverse ETL uses, so no fire hazards on my end in adopting 2.0 early here.
Is there a general preference on YAML vs. code for the deployment specification? I noticed you can configure a flow deployment with YAML, but I can't find any information on the schema of that document. For example:
Assuming `interval` is in seconds? Can I specify another grain? Can `schedule` take a dict? If it takes cron, does that take a dict?
Honestly schedule is the primary question point. Everything else is straightforward enough.
Kevin Kho
04/14/2022, 2:33 AM
Hi @Alexander Butler, I'd need to check with the team tomorrow about this and get back to you.
Anna Geller
04/14/2022, 9:43 AM
Good choice starting with 2.0 directly!
I'm more biased towards defining it in Python, but YAML is also supported. Here is one example of using YAML:
The Python definition is much cleaner and easier to understand/change, but YAML is also fine.
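As a rough illustration of the kind of spec being discussed, a deployment YAML might look something like the sketch below. The field names here are assumptions for illustration, not the verified Prefect 2.0 schema:

```yaml
# Illustrative sketch only -- field names are assumptions, not the verified schema.
name: my-deployment
flow_location: ./my_flow.py
schedule:
  interval: 3600   # presumably seconds, per the discussion later in this thread
```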
Alexander Butler
04/14/2022, 2:51 PM
I like Python too. I think the ambiguous bit is whether `schedule` supports cron or different kwargs for interval?
Alexander Butler
04/14/2022, 2:51 PM
or a different time grain
Alexander Butler
04/14/2022, 2:52 PM
in yaml
Zanie
04/14/2022, 3:12 PM
The YAML is loaded using Pydantic models, which infer the type based on the keys
Zanie
04/14/2022, 3:13 PM
So if you did `cron: string-here` instead of `interval: integer`, it'd be loaded as a cron schedule
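The key-based inference Zanie describes can be sketched in plain Python as follows. The class names below are illustrative stand-ins, not Prefect's actual schedule models:

```python
from datetime import timedelta

# Stand-in classes for illustration only; Prefect's real schedule models
# are more featureful (timezones, anchor dates, etc.).
class IntervalSchedule:
    def __init__(self, interval: timedelta):
        self.interval = interval

class CronSchedule:
    def __init__(self, cron: str):
        self.cron = cron

def infer_schedule(spec: dict):
    """Pick a schedule type from which keys are present, mimicking how a
    union of Pydantic models would resolve the YAML mapping."""
    if "cron" in spec:
        return CronSchedule(cron=spec["cron"])
    if "interval" in spec:
        # A bare number is interpreted as seconds.
        return IntervalSchedule(interval=timedelta(seconds=spec["interval"]))
    raise ValueError(f"unrecognized schedule keys: {sorted(spec)}")

print(type(infer_schedule({"cron": "0 0 * * *"})).__name__)  # CronSchedule
print(type(infer_schedule({"interval": 3600})).__name__)     # IntervalSchedule
```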
Zanie
04/14/2022, 3:16 PM
From the Pydantic documentation, you can provide richer strings for intervals other than seconds
Zanie
04/14/2022, 3:16 PM
```
timedelta fields can be:
  timedelta, an existing timedelta object
  int or float, assumed as seconds
  str, following formats work:
    [-][DD ][HH:MM]SS[.ffffff]
    [±]P[DD]DT[HH]H[MM]M[SS]S (ISO 8601 format for timedelta)
```
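To make the first of those string forms concrete, here is a toy parser for just the `[-][DD ][HH:MM]SS[.ffffff]` shape. This is an illustration of the format, not Pydantic's actual implementation (which also handles the ISO 8601 `P...T...` form):

```python
import re
from datetime import timedelta

# Illustrative-only parser for the "[-][DD ][HH:MM]SS[.ffffff]" shape.
_TD_RE = re.compile(
    r"^(?P<sign>-)?"                         # optional leading minus
    r"(?:(?P<days>\d+) )?"                   # optional "DD " prefix
    r"(?:(?P<hours>\d+):(?P<minutes>\d+):)?" # optional "HH:MM:" prefix
    r"(?P<seconds>\d+(?:\.\d{1,6})?)$"       # seconds, up to 6 fraction digits
)

def parse_timedelta(value: str) -> timedelta:
    m = _TD_RE.match(value)
    if m is None:
        raise ValueError(f"not a timedelta string: {value!r}")
    td = timedelta(
        days=int(m["days"] or 0),
        hours=int(m["hours"] or 0),
        minutes=int(m["minutes"] or 0),
        seconds=float(m["seconds"]),
    )
    return -td if m["sign"] else td

print(parse_timedelta("30"))          # a bare number is seconds: 0:00:30
print(parse_timedelta("1 02:30:00"))  # 1 day, 2:30:00
```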