Powered by Linen
show-us-what-you-got
  • m

    matta

    02/18/2021, 7:31 PM
    and then added these two lines to the end of the flow:
    filepaths = get_filepaths(dbt_path, upstream_tasks=[dbt_run])
    publish_artifact(filepaths)
  • m

    matta

    02/18/2021, 7:34 PM
    Would it make sense to maybe make a PR based on this for the dbt Tasks?
  • c

    Craig Wright

    02/18/2021, 8:07 PM
    Hi all! Yesterday we released a Task, FivetranSyncTask, that lets you kick off a sync with Fivetran and will tell you when that sync has completed (or failed). The goal is to let users be more deterministic about when to start processing or transforming data that Fivetran has landed in a data warehouse. For example, even though Fivetran's UI lets you set up a sync and also set up dbt transformations, we currently don't have the functionality to run the dbt transformation after a sync has completed. You could do this now with Prefect and the FivetranSyncTask. This is our very first effort to start bridging the gap between our automated pipelines and the more robust scheduling needs of a Prefect user. It is very early days, and we would love feedback! Please reach out to me and/or @Nick Acosta. We would love to talk to folks who would be interested in this, especially to learn where this does or does not meet your needs. Thanks!
    👍 4
    🚀 5
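The pattern described above (start a sync, then block until it reports success or failure before any dbt work begins) can be sketched as a simple polling loop. This is not the FivetranSyncTask source: `start_sync` and `get_status` are hypothetical callables standing in for the real Fivetran API calls, and the "running"/"succeeded"/"failed" status strings are assumptions.

```python
import time
from typing import Callable


class SyncFailed(RuntimeError):
    """Raised when the (hypothetical) status callable reports a failed sync."""


def run_sync_and_wait(
    start_sync: Callable[[], None],
    get_status: Callable[[], str],
    poll_interval: float = 10.0,
    timeout: float = 3600.0,
) -> str:
    """Kick off a sync, then poll until it succeeds, fails, or times out.

    `get_status` is assumed to return "running", "succeeded" or "failed";
    these are placeholder names, not Fivetran's actual status values.
    """
    start_sync()
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == "succeeded":
            return status
        if status == "failed":
            raise SyncFailed("sync reported failure")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"sync still {status!r} after {timeout} seconds")
        time.sleep(poll_interval)  # wait before asking again
```

A downstream step such as a dbt run would only start once `run_sync_and_wait` returns, and a failed or stalled sync surfaces as an exception instead of as silently missing data.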
  • s

    S K

    02/18/2021, 11:05 PM
    Please help me fix this issue. This is all set up on AWS EC2. Prefect runs my Python flow every 15 minutes, but after 4 hours or so the flow stops and the UI is no longer accessible. When I then go to the EC2 CLI and run "prefect server stop", I get the error below:
    OpenBLAS blas_thread_init: pthread_create failed for thread 1 of 4: Resource temporarily unavailable
    OpenBLAS blas_thread_init: RLIMIT_NPROC 63450 current, 63450 max
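The error suggests the user account reached its process/thread limit (`RLIMIT_NPROC`, 63450 here), which could also explain why the server stops responding. A minimal Linux-only diagnostic sketch using the stdlib `resource` module and `/proc` compares that limit with the number of threads currently owned by the user. The helper names are hypothetical, and the idea that a steadily growing count between flow runs points to processes that never exit is an assumption about this setup, not a diagnosis.

```python
import os
import resource


def nproc_limits() -> tuple:
    """Return the (soft, hard) RLIMIT_NPROC values for the current process."""
    # On Linux this limit counts every thread of every process owned by the user.
    return resource.getrlimit(resource.RLIMIT_NPROC)


def count_user_threads(uid: int) -> int:
    """Approximate the number of threads in processes owned by `uid` (Linux /proc only)."""
    total = 0
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-PID entries such as /proc/meminfo
        try:
            if os.stat(f"/proc/{entry}").st_uid == uid:
                # Each entry in /proc/<pid>/task is one thread of that process
                total += len(os.listdir(f"/proc/{entry}/task"))
        except (FileNotFoundError, PermissionError, ProcessLookupError):
            continue  # the process exited or is not readable; ignore it
    return total


if __name__ == "__main__":
    soft, hard = nproc_limits()
    print(f"RLIMIT_NPROC soft={soft} hard={hard}")
    print(f"threads owned by this user: {count_user_threads(os.getuid())}")
    # Assumed mitigation, not verified for this setup: export OPENBLAS_NUM_THREADS=1
    # in the environment that starts the flow, before numpy is imported.
```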
  • c

    CA Lee

    02/21/2021, 5:31 AM
    Hello @Anna Geller (old account) @Jimmy Le, I’ve been following your tutorials on running serverless flows using AWS EKS (Fargate). Great content, and thanks for sharing it with us!
    Anna: https://towardsdatascience.com/distributed-data-pipelines-made-easy-with-aws-eks-and-prefect-106984923b30
    Jimmy: https://lejimmy.com/distributed-data-pipelines-with-aws-ecs-fargate-and-prefect-cloud/
    I am just a hobbyist Prefect user, and have been using a USD5 / month Ubuntu 20.04 instance to run my flows on a schedule: scraping data, reshaping it with Pandas, communicating with databases, pretty standard stuff. Dependencies, config and my own code have been installed onto the instance as a package. I tried out your tutorials out of curiosity and got my flows to work, but couldn’t wrap my head around 2 issues:
    • Cost: even just keeping the Fargate agent running as a serverless EKS cluster, at USD0.10 / hour, would cost about USD72 / month. Is keeping the agent on all the time the right approach?
    • Image size and transfer between runs: I’m using a base image that consists of some pip deps and my own custom package. On flow registration, Prefect builds a new image that copies the flow code in, so if my base image layer is, say, 500MB and my flow code layer is, say, 50kb, each serverless run of a flow would still cost 500.05 MB to run, as the image is not persistent between flow runs. Each additional flow registered costs the entire size of the base image plus a few more kb of flow code, which seems a bit extravagant.
    While I can appreciate the distributed, fault-tolerant, highly available, self-healing & automatically scalable properties of serverless flow runs, does anyone have any insight on how we can reduce / manage the costs of doing so?
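The two figures in the post can be checked with a little arithmetic. This sketch assumes a 30-day month and decimal units (1 MB = 1000 kB); the USD0.10/hour rate, 500 MB base layer and 50 kB flow layer are the numbers quoted above, and the function names are illustrative only.

```python
HOURS_PER_MONTH = 24 * 30  # assumes a 30-day month


def monthly_cost(hourly_rate_usd: float) -> float:
    """Cost of keeping a resource running around the clock for one month."""
    return hourly_rate_usd * HOURS_PER_MONTH


def pulled_mb_per_run(base_layer_mb: float, flow_layer_kb: float) -> float:
    """Data pulled per run if no image layers are cached between runs."""
    return base_layer_mb + flow_layer_kb / 1000  # decimal kB -> MB


if __name__ == "__main__":
    print(f"${monthly_cost(0.10):.2f} per month")  # $72.00 per month
    print(f"{pulled_mb_per_run(500, 50):.2f} MB per run")  # 500.05 MB per run
```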
  • c

    CA Lee

    02/27/2021, 12:14 PM
    Thanks for your responses! Following the guidance above for distributed workflows, I made a viz / summary / cheatsheet. I hope it is of some help to other people who are interested in Prefect + ECS / EKS + S3 / ECR.
    👍 9
    🚀 9
  • m

    matta

    03/26/2021, 2:44 AM
    Ooh, just found that the toolz library works pretty seamlessly with Prefect maps. So you can use pipe, compose, thread_first or thread_last to chain operations:
    https://toolz.readthedocs.io/en/latest/api.html#toolz.functoolz.pipe
    So like, from the Horizontal Mapping blog post, this:
    with Flow("mapping-test") as flow:
      sleep.map(sleep.map(times))
    can become this:
    import toolz as tz

    with Flow("mapping-test") as flow:
        tz.pipe(times,
                sleep.map,
                sleep.map)
    which I find a bit more readable! Or lets you turn:
    with Flow('iterated map') as flow:
        mapped_result = add_ten.map([1, 2, 3])
        mapped_result_2 = add_ten.map(mapped_result)
    into
    with Flow('iterated map') as flow:
        mapped_result = tz.pipe([1, 2, 3],
                               add_ten.map,
                               add_ten.map)
    👍 4
    🙌 5
    🙌🏿 1
    👍🏿 1
    👀 10
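`pipe(data, f, g)` is just `g(f(data))`, which is why it chains cleanly with `.map`. A stdlib-only sketch of `pipe` and `compose` (re-implementations for illustration, not the toolz source) shows the equivalence, with a plain function as a hypothetical stand-in for the mapped `add_ten` task:

```python
from functools import reduce
from typing import Any, Callable


def pipe(data: Any, *funcs: Callable[[Any], Any]) -> Any:
    """Apply funcs left to right: pipe(x, f, g) == g(f(x))."""
    return reduce(lambda acc, f: f(acc), funcs, data)


def compose(*funcs: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Apply funcs right to left: compose(g, f)(x) == g(f(x))."""
    return lambda data: pipe(data, *reversed(funcs))


def add_ten_map(xs):
    # Hypothetical plain-Python stand-in for add_ten.map: add 10 to every element
    return [x + 10 for x in xs]


if __name__ == "__main__":
    print(pipe([1, 2, 3], add_ten_map, add_ten_map))  # [21, 22, 23]
    print(compose(add_ten_map, add_ten_map)([1, 2, 3]))  # [21, 22, 23]
```

Note the ordering difference: `pipe` reads top to bottom in execution order, while `compose` lists the last step first.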
  • j

    Jacopo Tagliabue

    04/02/2021, 4:43 PM
    Ciao! I'm Jacopo from Coveo! I love Prefect and also Metaflow, so I thought I'd hack together a simple ShellTask allowing a Metaflow Flow to live as a task inside Prefect (for us, Metaflow typically runs after dbt and Great Expectations, so it's a DAG inside a DAG). Here are the code and a 3-minute video of me showing the flow working with the local agent:
    Open repo (WIP, don't read too much into it): https://github.com/jacopotagliabue/metaflow-as-prefect-task
    Screenshot overview here: https://drive.google.com/file/d/1XoG8UfPpiCSuXYvp9Y27zm6_E4D3C9NX/view
    Open to feedback, especially if you think it's a broad enough use case to deserve a much better polished task class (which can totally be done, and I'm happy to work together with others on it!)
    ❤️ 6
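Wrapping an external workflow as a single task boils down to running its CLI and failing when the command fails. The sketch below uses the stdlib `subprocess` module rather than Prefect's ShellTask, and the `my_metaflow_flow.py run` command in the comment is a hypothetical example, not taken from the repo:

```python
import subprocess
import sys
from typing import List, Optional


def run_external_flow(cmd: List[str], cwd: Optional[str] = None) -> str:
    """Run an external workflow's CLI and return its stdout.

    check=True raises CalledProcessError on a non-zero exit code, so whatever
    orchestrates this call sees the inner DAG's failure as its own failure.
    """
    completed = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True, check=True)
    return completed.stdout


if __name__ == "__main__":
    # Hypothetical real call: run_external_flow([sys.executable, "my_metaflow_flow.py", "run"])
    # Stand-in command so the sketch runs anywhere:
    print(run_external_flow([sys.executable, "-c", "print('inner DAG finished')"]))
```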
  • a

    Aaron Richter

    04/12/2021, 12:23 AM
    Hi everyone! Wanted to share this Prefect flow I wrote to check for COVID-19 vaccine appointments. I found myself checking vaccinespotter.org constantly, and the availability was changing super fast, so I thought “wait, I can automate this!” Prefect made it super easy for me to get it on a schedule and set up email notifications. I had it checking every 10 minutes, and I was able to find a slot within a day of having it running ⚡
    🦠 7
    :cool-llama: 9
    👍 12
    :marvin: 8
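Because availability changes quickly, a checker like this would typically email only about locations that became available since the previous run, rather than re-sending the same alert every 10 minutes. A hypothetical sketch of that de-duplication step; the location names and the set-of-locations input are illustrative, not the vaccinespotter.org data format or the code from the post:

```python
from typing import Iterable, Set, Tuple


def new_openings(previous: Set[str], current: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Compare two checks and return (newly_available, current_set).

    `previous` is the set of locations that had slots at the last check; a
    location is reported only when it was absent last time, so an unchanged
    opening does not trigger a second email.
    """
    current_set = set(current)
    return current_set - previous, current_set


if __name__ == "__main__":
    seen: Set[str] = set()
    # Hypothetical results of three consecutive 10-minute checks
    for check in (["Pharmacy A"], ["Pharmacy A", "Clinic B"], ["Clinic B"]):
        fresh, seen = new_openings(seen, check)
        if fresh:
            print("Notify:", sorted(fresh))  # a real flow would send the email here
```

Run as-is, this prints one notification for "Pharmacy A" and one for "Clinic B", and nothing for the third check. A location that disappears and later reappears is reported again.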
  • a

    Andrew Moist

    04/13/2021, 11:18 AM
    Hi everyone. Some really great content in this channel. I'm new to Prefect and was wondering if there are any open source code bases for real Prefect projects? The above project is nice - but are there larger ones that demonstrate using more Prefect features and best practices?
  • a

    ale

    04/16/2021, 12:05 PM
    Hey folks! At Cloud Academy we are true believers in Prefect & dbt, so we decided to combine them to build an analytics solution that is now used by our enterprise customers! Here’s the article: https://cloudacademy.com/blog/data-engineering-business-intelligence-an-exciting-quest/
    👏 14
  • m

    Maikel Penz

    04/27/2021, 9:18 PM
    Hey people! I created a GitHub repository that automates the AWS infrastructure deployment and spins up the Agent through GitHub Actions. I’ve also put a GH Action in place to register workflows from other repositories. Check out my article to learn more about it: https://towardsdatascience.com/introducing-a-dataflow-management-system-backed-up-by-prefect-aws-and-github-actions-3f8c0eef2eb2
    🚀 24
    👏 5
    :marvin: 13
  • f

    flavienbwk

    04/29/2021, 7:57 PM
    Hi community! My school launched a hackathon this week aimed at letting students experiment with big data technologies. After 4 days of development, I'm happy to present the work my team and I achieved. We used Prefect to ingest COVID-related data and news, and it was particularly useful for scheduling daily reports from various sources. We've created a GitHub repo presenting our project "Pandemic Knowledge": https://github.com/flavienbwk/Pandemic-Knowledge
    I'd also be happy to get feedback on how we could improve the way we write our flows. Here is a preview of what we got in the end:
    :marvin: 14
    🚀 10
    👏🏼 1
    👀 10
    👏 27
    🙌 3
  • d

    Dylan

    05/20/2021, 5:36 PM
    https://prefect-community.slack.com/archives/CL09KU1K7/p1621531419215900
    :marvin: 2
    :upvote: 4
  • c

    Chris White

    05/20/2021, 7:32 PM
    @Joe Schmid’s talks are always great and informative — highly recommend!!
    🙏 2
  • j

    Josiah Berkebile

    05/21/2021, 3:27 PM
    I kinda just couldn't resist throwing this meme in here:
    💯 7
    😂 6
  • n

    Nelson Griffiths

    05/21/2021, 9:02 PM
    Hey Everyone! I don't know how many other people use Prefect both for personal projects and at work, but I put together a small Python package that makes tracking different accounts and switching between them easier. It also has some basic functionality to track different configuration files if you use different settings for different accounts or tokens. Feel free to check it out! https://pypi.org/project/prefect-cloud-manager/0.1.1/
    :marvin: 3
    👍 8
  • k

    Kevin Kho

    05/24/2021, 4:41 PM
    Hello everyone, sharing our event with Heartex about using Prefect to preprocess images before using LabelStudio
    👏 1
    :marvin: 7
    🙌 1
    🚀 7
  • j

    Jason Prado

    05/31/2021, 10:59 PM
    I don’t have much to show publicly but just wanna say our rideshare cooperative startup is running a ton of jobs on Prefect and it’s going great 😄 https://www.nytimes.com/2021/05/28/technology/nyc-uber-lyft-the-drivers-cooperative.html
    🚀 12
    👍 8
  • k

    Kevin Kho

    06/23/2021, 3:32 PM
    Hey everyone, we have an event with Coiled next week where I will demo the new KV Store, and Jeremiah will give general updates about the company.
    👀 3
    👋 2
    :upvote: 10
    🚀 14
  • m

    matta

    06/25/2021, 9:39 PM
    This GitHub Action will install conda on the runner VM, install Prefect, then register a Flow (using the new git Storage to point it to code on a private repo):
    # This is a basic workflow to help you get started with Actions
    
    name: CI
    
    # Controls when the action will run. 
    on:
      # Triggers the workflow on push or pull request events but only for the main branch
      push:
        branches: [ main ]
      pull_request:
        branches: [ main ]
    
      # Allows you to run this workflow manually from the Actions tab
      workflow_dispatch:
    
    # A workflow run is made up of one or more jobs that can run sequentially or in parallel
    jobs:
      # This workflow contains a single job called "build"
      build:
        # The type of runner that the job will run on
        runs-on: ubuntu-latest
    
        # Steps represent a sequence of tasks that will be executed as part of the job
        steps:
          - uses: actions/checkout@v2
          - uses: conda-incubator/setup-miniconda@v2
            with:
              auto-update-conda: true
              python-version: 3.8
          - name: install prefect
            shell: bash -l {0}
            run: conda install -c conda-forge prefect -y
          - name: login to Prefect Cloud
            shell: bash -l {0}
            run: prefect auth login -t ${{secrets.PREFECT_TOKEN}}
          - name: Register flow
            shell: bash -l {0}
            run: prefect register --project tester -p test.py -n "run-cloud-fn"
    😎 3
  • m

    matta

    06/28/2021, 11:57 PM
    Handy Task that triggers a Google Cloud Function (assumes you have your GCP credentials saved as a Secret in Prefect):
    import prefect
    from prefect import task, Flow, Parameter
    from prefect.tasks.secrets.base import PrefectSecret
    from google.oauth2 import service_account
    from google.auth.transport.requests import AuthorizedSession
    
    @task
    def trigger_cloud_fn(
        secret: PrefectSecret, url: str, body: str
    ):
        credentials = service_account.IDTokenCredentials.from_service_account_info(
            secret, target_audience=url
        )
        authed_session = AuthorizedSession(credentials)
        response = authed_session.post(url=url, json=body)
        return response
    👏 8
  • k

    Kevin Kho

    07/02/2021, 2:08 PM
    Cross-posting so this doesn’t get lost
    🏀 4
    🚀 13
  • k

    Kevin Kho

    07/21/2021, 4:29 PM
    Check out this blog from our partner Slate
    :upvote: 14
  • m

    matta

    08/03/2021, 4:21 AM
    Made a new version of the Task that triggers Google Cloud Functions - this one will actually fail if the Cloud Function doesn't work (my last version failed silently)
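    import datetime
    import typing as t

    import prefect
    from prefect import task
    from prefect.engine import signals
    from google.oauth2 import service_account
    from google.auth.transport.requests import AuthorizedSession

    @task(max_retries=3, retry_delay=datetime.timedelta(seconds=30))
    def trigger_cloud_fn(secret: dict, url: str, body: t.Optional[t.Dict] = None):
        # `secret` receives the *value* of a PrefectSecret task (the service-account info as a dict)
        logger = prefect.context.get("logger")
        body = body or {}  # avoid a shared mutable default argument
        logger.info(body)
        credentials = service_account.IDTokenCredentials.from_service_account_info(
            secret, target_audience=url
        )
        # AuthorizedSession subclasses requests.Session, so `with` closes it even if post() raises
        with AuthorizedSession(credentials) as authed_session:
            response = authed_session.post(url=url, json=body)
        if not response.ok:
            # Fail loudly instead of silently returning an unsuccessful response
            raise signals.FAIL(f"Cloud Function returned HTTP {response.status_code}")
        return response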
    ❤️ 9
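The `max_retries=3, retry_delay=datetime.timedelta(seconds=30)` arguments ask Prefect to rerun the task when it fails. The stdlib sketch below mimics that behaviour with a decorator so the semantics are easy to test; it is a simplified illustration, not how Prefect implements retries, and `retry` is a hypothetical helper name.

```python
import functools
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(max_retries: int, delay_seconds: float) -> Callable[[Callable[..., T]], Callable[..., T]]:
    """Re-invoke the wrapped function up to `max_retries` extra times on any exception."""
    def decorator(fn: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs) -> T:
            for attempt in range(max_retries + 1):  # initial attempt + retries
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == max_retries:
                        raise  # out of retries: surface the failure
                    time.sleep(delay_seconds)
        return wrapper
    return decorator
```

Retries only kick in when the call raises. That is why the explicit `raise signals.FAIL(...)` matters here: the earlier version returned an unsuccessful response without raising, so there was nothing to retry and the failure went unnoticed.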
  • g

    Gleb Mezhanskiy (Datafold)

    08/19/2021, 4:17 PM
    I was curious whether one could put together a 100% open-source equivalent of what is considered the Modern Data Stack, and wrote a blog post to share my findings. Very curious to hear your thoughts, and whether I missed any great products (of course I didn’t miss :prefect:!)
    🚀 7
  • g

    Gareth Dwyer

    08/20/2021, 10:56 AM
    Hey all, we recently announced the first release of Open MLOps, an open source production machine learning framework that uses Prefect.
    • GitHub: https://github.com/datarevenue-berlin/OpenMLOps
    • Blog announcement: https://datarevenue.com/en-blog/open-mlops-open-source-production-machine-learning
    TL;DR: it is a set of Terraform scripts that sets up Prefect and a bunch of other tools on Kubernetes (currently focusing on EKS / AWS, but it should work on others too), as well as a set of basic guides and tutorials to get it up and running and to build and deploy a basic ML model. Let me know if you try it out 🙂
    👋 8
    👀 4
    🤩 1
    :cool-llama: 6
d

davzucky

08/20/2021, 11:46 AM
That's great. Thank you for sharing. We are building the same stack on our side, with Marquez for data lineage as well, even though it is not native in Prefect today.
c

Cooper Marcus

08/25/2021, 2:56 PM
@Berty Pribilovics nice?
r

Richard Pelgrim

09/27/2021, 2:01 PM
Thanks for sharing!