show-us-what-you-got
  • a

    Adisun Wheelock

    04/22/2020, 4:32 PM
    Hi everyone, I have a question about running async tasks.
    from prefect import task, Flow
    
    @task
    async def extract():
        return [1,2,3]
    
    @task
    async def transform(some_list):
        transformed = [x + 1 for x in some_list]
        return transformed
    
    @task
    async def load(transformed):
        print('load somewhere')
        
    with Flow('async testing') as flow:
        extract_nums = extract()
        transformed = transform(extract_nums)
        load(transformed)
    
    flow.run()
    This creates a flow and all, but how do I actually run this flow asynchronously?
    k
    • 2
    • 1
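For reference, a minimal sketch in plain asyncio of what awaiting these three steps looks like. It is independent of Prefect and makes no claim about how `flow.run()` treats coroutine tasks. The `main` coroutine, the use of `asyncio.run`, and the return value of `load` are illustrative assumptions that are not in the original snippet.

```python
import asyncio


async def extract():
    return [1, 2, 3]


async def transform(some_list):
    return [x + 1 for x in some_list]


async def load(transformed):
    print("load somewhere")
    # Returning the data is an addition for illustration; the original prints only.
    return transformed


async def main():
    # Each step depends on the previous one, so the coroutines are awaited in order.
    extracted = await extract()
    transformed = await transform(extracted)
    return await load(transformed)


if __name__ == "__main__":
    # asyncio.run creates an event loop, runs main() to completion, and closes the loop.
    print(asyncio.run(main()))
```

Because the steps form a strict chain, running them on an event loop gives no speedup by itself; the benefit would only appear if several independent coroutines were awaited together.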
  • p

    Philip Blankenau

    04/22/2020, 11:42 PM
    I want to use Prefect to stitch together a number of python functions and calls to executables for a satellite image processing workflow. This will all be done locally on a single machine. Some tasks will depend on files written by previous tasks. Will Prefect work for this use case? In the examples, it looks like data is passed in memory among the tasks and there aren't dependencies based on the existence of certain files.
    j
    • 2
    • 10
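One way to picture file-based dependencies, as a sketch under stated assumptions: each step writes its output to disk and returns the path, so the downstream step receives the file location as ordinary data. The function names, file names, and contents below are hypothetical stand-ins for the image-processing steps, and the plain functions would presumably be wrapped as tasks in an actual flow.

```python
import tempfile
from pathlib import Path


def write_scene(workdir: Path) -> Path:
    """Hypothetical upstream step: writes a file and returns its path."""
    out = workdir / "scene.txt"
    out.write_text("10\n20\n30\n")
    return out


def summarize(scene_path: Path) -> Path:
    """Hypothetical downstream step: requires the upstream file to exist."""
    if not scene_path.exists():
        raise FileNotFoundError(f"upstream output missing: {scene_path}")
    values = [int(line) for line in scene_path.read_text().split()]
    out = scene_path.with_name("summary.txt")
    out.write_text(str(sum(values)))
    return out


def run_pipeline(workdir: Path) -> Path:
    # Passing the path turns the file dependency into an explicit data dependency.
    return summarize(write_scene(workdir))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_pipeline(Path(tmp)).read_text())
```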
  • v

    Vitor Avancini

    04/23/2020, 1:20 PM
    Hello people, I've followed the instructions for running the UI the simplest way possible and registered a scheduled flow. It has been scheduled, but it never actually ran anything. I have two late flow executions right now. Any ideas on that?
    n
    l
    • 3
    • 15
  • h

    Hugh Cameron

    04/28/2020, 11:33 AM
    Hey there - enjoyed the PyData Denver talk - using agents clicked for me! I’ve set up a Docker agent but my flows aren’t executing. Here’s my agent output:
    ~# prefect diagnostics
    {
      "config_overrides": {},
      "env_vars": [
        "PREFECT__SERVER__UI__GRAPHQL_URL"
      ],
      "system_information": {
        "platform": "Linux-3.10.105-x86_64-with-glibc2.2.5",
        "prefect_version": "0.10.4",
        "python_version": "3.8.1"
      }
    }
    ~# prefect agent start docker --network prefect-server --label NAS --label Docker
    
     ____            __           _        _                    _
    |  _ \ _ __ ___ / _| ___  ___| |_     / \   __ _  ___ _ __ | |_
    | |_) | '__/ _ \ |_ / _ \/ __| __|   / _ \ / _` |/ _ \ '_ \| __|
    |  __/| | |  __/  _|  __/ (__| |_   / ___ \ (_| |  __/ | | | |_
    |_|   |_|  \___|_|  \___|\___|\__| /_/   \_\__, |\___|_| |_|\__|
                                               |___/
    
    [2020-04-28 11:29:17,055] INFO - agent | Starting DockerAgent with labels ['NAS', 'Docker']
    [2020-04-28 11:29:17,056] INFO - agent | Agent documentation can be found at <https://docs.prefect.io/orchestration/>
    [2020-04-28 11:29:17,056] INFO - agent | Agent connecting to the Prefect API at <http://localhost:4200>
    [2020-04-28 11:29:17,078] INFO - agent | Waiting for flow runs...
    Any tips to troubleshoot?
    j
    • 2
    • 1
  • j

    Jeff Brainerd

    04/29/2020, 12:05 PM
    SED podcast featuring Prefect: https://softwareengineeringdaily.com/2020/04/29/prefect-dataflow-scheduler-with-jeremiah-lowin/ 🙌
    😍 4
    🚀 2
    • 1
    • 1
  • d

    David Ojeda

    04/29/2020, 1:47 PM
    Hi! What resources would you guys at Prefect recommend to jump onto the GraphQL train? I have more experience with REST-based services and I wish I could be more independent when interacting with a GraphQL API such as the one by Prefect…
    n
    • 2
    • 3
  • m

    Matias Godoy

    04/29/2020, 2:23 PM
    Is there a way to install a "lite" version of the python Prefect Client without having to install the whole Core package? I need it to interact with my Prefect server from another app and I think that adding the whole prefect python package might be a little overkill.
    t
    n
    • 3
    • 13
  • y

    Yufei

    05/01/2020, 8:29 PM

    https://www.youtube.com/watch?v=q8BgLPq_9Zw

    ❤️ 2
    👍 6
  • y

    Yufei

    05/01/2020, 8:29 PM
    Another live stream is on now. Just FYI
  • j

    Josep Consuegra Navarrina

    05/11/2020, 3:01 PM
    Hi team! First message here 💪 I am having some issues with the prefect local server UI. I am registering flows in order to run them from the UI, but the given url sends me to a blank page, with no trace of the flow.
    k
    n
    d
    • 4
    • 31
  • r

    Roy Trostyanetski

    05/12/2020, 3:54 PM
    Hi, I am trying to create a big system. Part of it is that I want the system to be able to work with Python scriptable plugins: I want clients of my system to be able to submit their git repository containing their Python script, and then I want to be able to schedule each of those scripts to run either periodically or manually. I also don't want a case where the execution of one script hurts the execution of another script, system-wise and resource-wise. Is there a way to accomplish this with the open source prefect.io lib? I know it's a big question, but I would be really grateful if you could help me. Also, if you have any suggestions for the best architecture for a system like this, I will be more than happy to hear from you 😁
    a
    z
    • 3
    • 9
  • m

    Mark Baker

    05/18/2020, 6:14 PM
    I am trying to invite a user but am having difficulty. It seems that our Barracuda email security appliance rewrites the URLs in links to go through a protection layer and it is getting in the way. If I try to create the user, it says the user already exists. If I try to reset the password, a password reset email never comes through. Tried to use the support feature on the website, but at least in Chrome it does not seem to work.
    z
    • 2
    • 2
  • r

    Roy Segall

    05/21/2020, 4:49 AM
    Hi, I wanted to try this thing and installed it via Docker, but I understood I'm not the customer for this product. Now, every time I restart my Mac, prefecthq is running. How can I uninstall it?
    l
    • 2
    • 1
  • r

    Roy Trostyanetski

    05/25/2020, 6:18 AM
    Hi, Is there a way to install Prefect Core with the UI on something like AWS and add authentication on top of it so that only certain people can access it? And is it possible without using Prefect Cloud?
    n
    d
    +2
    • 5
    • 7
  • t

    Tuan Nguyen

    06/12/2020, 8:12 AM
    Been a long time since I last saw a team write about data. Thought I’d share them here with you guys: https://www.producthunt.com/posts/the-analytics-setup-guidebook
    💯 3
    👀 1
  • a

    Aamir Butt

    06/19/2020, 10:43 AM
    Hi All, I’m trying to understand the use cases for Prefect. For example, is it suitable for internal business workflows? Something as simple as, when a user registers, requiring them to confirm their email, getting them to fill in their user profile, et cetera. Or implementing the process for holiday submission and authorisation for staff... i.e. standard business SOPs.
    n
    • 2
    • 2
  • p

    Philip MacMenamin

    06/19/2020, 5:39 PM
    Hey, if you needed to create a flow that did some logic in Python, shelled out a number of tasks and waited for them to return, then did more logic in Python and ultimately created a set of final outputs which would be persisted, what would be the Prefect-ish way of doing this WRT communicating the locations of the files for each task? Is there a way to give each flow its own dir in /tmp, for example, and then refer to the files relatively, with each flow run looking in its own /tmp/dir? Or do you use result caching and pass around the file locations that way? I'm probably missing something obvious. If anybody can point out an example of a flow which has this kind of logic it would be really helpful!
    n
    • 2
    • 25
  • p

    Philip MacMenamin

    06/23/2020, 8:50 PM
    Is there anywhere I can get an example workflow, or get pointed to an example workflow which looks something like the following:
    • starts flow and creates a unique local directory for that run on disk
    • executes a task which creates a file - eg downloads a file from some URL using urllib, and saves it in $RUN_LOC/my_file
    • execs a task which runs a bash command - eg wc -l $RUN_LOC/my_file > file_len
    Obviously the tasks here are silly. My aim is to have an example of something which I can have a series of tasks do work on files. The task results are almost not important, the result might be nothing more than a RAN_OK/ NOT_OK. What matters to me is I can have a mechanism to operate on files, and shell out to other utils to operate on files and check their return status. I've been looking at https://docs.prefect.io/core/concepts/results.html#how-to-configure-task-result-persistence
    from prefect import task, Flow
    from prefect.engine.results import LocalResult
    
    
    @task(result=LocalResult(location="initial_data.prefect"))
    def root_task():
        return [1, 2, 3]
    
    @task(result=LocalResult(location="{date:%A}/{task_name}.prefect"))
    def downstream_task(x):
        return [i * 10 for i in x]
    
    with Flow("local-results") as flow:
        downstream_task(root_task)
    z
    • 2
    • 27
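The three steps listed above can be sketched with the standard library alone. This is only an illustration under assumptions, not a Prefect recipe: the function names are hypothetical, the "download" is replaced by writing fixed text so the sketch runs offline, and it assumes a `wc` executable is on the PATH. In a flow, each function would presumably become a task that passes the path downstream.

```python
import subprocess
import tempfile
from pathlib import Path


def make_run_dir() -> Path:
    # mkdtemp creates a new, uniquely named directory for this run.
    return Path(tempfile.mkdtemp(prefix="flow_run_"))


def create_file(run_dir: Path) -> Path:
    # Stand-in for downloading a file with urllib (assumption: fixed content).
    path = run_dir / "my_file"
    path.write_text("a\nb\nc\n")
    return path


def count_lines(path: Path) -> Path:
    # check=True raises CalledProcessError when the command exits non-zero,
    # which is one way to surface a RAN_OK / NOT_OK status.
    proc = subprocess.run(
        ["wc", "-l", str(path)], capture_output=True, text=True, check=True
    )
    out = path.parent / "file_len"
    # wc prints "<count> <filename>"; keep only the count.
    out.write_text(proc.stdout.split()[0] + "\n")
    return out


if __name__ == "__main__":
    run_dir = make_run_dir()
    print(count_lines(create_file(run_dir)).read_text())
```

Returning paths keeps every file relative to the run's own directory, so concurrent runs do not overwrite each other's outputs.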
  • i

    itay livni

    06/24/2020, 1:28 AM
    Hi - I am being lazy about re-configuring another docker build push interface. Is it possible to use prefect.environments.storage.docker as the mechanism to build and push containers while using Non-Docker Storage for Containerized Environments? (I ran into a bug deploying this, and wondering if this might be a reason)
    flow.storage = S3(
        bucket="s3-prefect-flow-storage",
        secrets=["AWS_CREDENTIALS"],
        )
    docker = Docker(
        registry_url=ecr_repo_url,
        python_dependencies=[
            "pandas",...],
        dockerfile=docker_flpth,
        image_name="annoying_docker",
        image_tag="latest",
        local_image=True
        )
    docker.build(push=True)
    j
    • 2
    • 13
  • p

    Peter B

    06/28/2020, 2:51 AM
    Hey all - I'm using Prefect for a side project to learn it and had a question. I've got a workflow that handles several thousand JSON-serializable objects that I need to put on S3, then process in a later step. Originally I was using a mapped S3Upload/S3Download task for uploading and downloading JSON at each stage. Then I realized that beyond the file size, S3 charges also happen per request (which I was making to check which files had been uploaded at each stage, then downloading files that were not yet processed). So rather than make potentially several thousand download requests to S3, I came up with a task that compresses a bunch of JSON serializable objects and uploads that compressed file to S3 instead. So it prevents the mapped S3Upload/S3Download and a lot of requests (and saves a lot of $$$). I'm happy with the workflow so no real problem at hand, but a few questions just out of curiosity:
    1. Is there anything anyone has done similar to this?
    2. Is there a Prefect task I'm missing that would handle this better?
    3. Anyone have a better way of doing this?
    Here's the task (which gets passed to an S3Upload task next in the flow)
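    import json
    import tarfile
    from contextlib import closing
    from io import BytesIO
    from pathlib import Path
    from tempfile import NamedTemporaryFile
    from typing import Any, Dict, List
    
    from prefect import task
    
    @task
    def compress_json_serializable_objects(
        json_serializable_objects: List[Dict[str, Any]],
        object_names: List[str],
        compression="xz",
    ):
        if len(json_serializable_objects) != len(object_names):
            raise ValueError(
                f"json_serializable_objects (len={len(json_serializable_objects)}) "
                f"and object_names (len={len(object_names)}) not equal"
            )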
        with NamedTemporaryFile() as tmp:
            with tarfile.open(tmp.name, f"w:{compression}") as tf:
                for obj, name in zip(json_serializable_objects, object_names):
                    with closing(BytesIO(json.dumps(obj).encode())) as fobj:
                        tarinfo = tarfile.TarInfo(name)
                        tarinfo.size = len(fobj.getvalue())
                        tf.addfile(tarinfo, fileobj=fobj)
            upload_data = Path(tmp.name).read_bytes()
        return upload_data
    a
    • 2
    • 1
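For the later processing step, the archive has to be unpacked again. A minimal round-trip sketch, assuming the same layout (one JSON member per object name in a compressed tar): `compress_objects` is an in-memory variant of the task above and `decompress_objects` is a hypothetical counterpart; neither is a Prefect API, and both are plain functions so the sketch runs without Prefect installed.

```python
import io
import json
import tarfile
from typing import Any, Dict, List


def compress_objects(
    objects: List[Dict[str, Any]], names: List[str], compression: str = "xz"
) -> bytes:
    """Pack JSON-serializable objects into a compressed tar archive in memory."""
    if len(objects) != len(names):
        raise ValueError("objects and names must have the same length")
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode=f"w:{compression}") as tf:
        for obj, name in zip(objects, names):
            payload = json.dumps(obj).encode()
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tf.addfile(info, fileobj=io.BytesIO(payload))
    return buffer.getvalue()


def decompress_objects(data: bytes) -> Dict[str, Any]:
    """Unpack an archive produced by compress_objects into {name: object}."""
    # "r:*" lets tarfile detect the compression scheme on its own.
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as tf:
        return {
            member.name: json.load(tf.extractfile(member))
            for member in tf.getmembers()
            if member.isfile()
        }
```

Working on a `BytesIO` buffer avoids the temporary file, at the cost of holding the whole archive in memory, which seems acceptable when the bytes are handed straight to an upload step anyway.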
  • a

    An Hoang

    07/07/2020, 2:55 PM
    Anyone used Prefect to do federated online learning? Having a centralized model, do prediction on individual devices without seeing the data, get feedback from the error of those predictions and then send back something like the gradient to update the centralized model? Any resources to read more about this issue in particular?
    👂 1
    j
    • 2
    • 4
  • j

    james.lamb

    07/20/2020, 3:19 PM
    I've been trying to figure out how to get a flow_group_id from Prefect Cloud, given a project name and flow name. As far as I understand from this thread, the combination of project name, flow name, and the tenant I'm auth'd as should be enough to uniquely identify a flow_group. This is the first time I've ever used GraphQL so if anyone has done this or has a better recommendation, I'd welcome it! This was my solution:
    from prefect.client import Client
    
    def get_flow_group_id(flow_name, project_name) -> str:
        """
        Get the `flow_group_id` for a flow with a given
        name, from a given Prefect Cloud project.
        """
        client = Client()
        query = """
            query {
              flow(
                where: {
                    name: { _eq: "%s" }
                }
              ) {
                id
                name
                flow_group_id
                project_id
              }
              project(
                  where: {
                    name: { _eq: "%s" }
                  }
              ) {
                  id
                  name
              }
            }
        """ % (flow_name, project_name)
        result = client.graphql(query)
        project_id = result["data"]["project"][0]["id"]
        flow_group_id = [
            flow for flow in
            result["data"]["flow"]
            if flow["project_id"] == project_id
        ][0]["flow_group_id"]
        return flow_group_id
    n
    j
    • 3
    • 8
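The response handling in the solution above indexes with `[0]`, which raises an `IndexError` when the project or flow is not found. A small sketch of the same selection logic as a pure function that returns `None` instead: `pick_flow_group_id` is a hypothetical helper, and the sample response is invented to mirror the shape the query asks for; it is not real Prefect Cloud output.

```python
from typing import Any, Dict, Optional


def pick_flow_group_id(result: Dict[str, Any]) -> Optional[str]:
    """Return the flow_group_id of the first flow in the matched project, if any."""
    data = result.get("data", {})
    projects = data.get("project", [])
    if not projects:
        return None  # no project with that name
    project_id = projects[0]["id"]
    for flow in data.get("flow", []):
        if flow["project_id"] == project_id:
            return flow["flow_group_id"]
    return None  # project exists, but holds no flow with that name


# Invented sample data in the shape requested by the query above.
sample = {
    "data": {
        "flow": [
            {"id": "f1", "name": "etl", "flow_group_id": "g1", "project_id": "p9"},
            {"id": "f2", "name": "etl", "flow_group_id": "g2", "project_id": "p1"},
        ],
        "project": [{"id": "p1", "name": "demo"}],
    }
}

if __name__ == "__main__":
    print(pick_flow_group_id(sample))
```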
  • m

    mithalee mohapatra

    07/21/2020, 7:15 PM
    Hi- I am trying to work on the same version of the flow I have uploaded to my S3 bucket. The issue I found is storage.flows() is empty and does not find my existing flow from S3 bucket. If I explicitly pass the flow name then I can access my flow through get_flow method. Else I get an error as "Flow not contained in the storage". Please let me know if I am missing anything.
    #define globally the flow name and flow location as storage.flows() is not able to find my flow from S3.
    dictFlows={'flows': {'ETL': 'etl/testflow'}, 'flow_location': 'etl/testflow'}
    
    def test_add_flow_to_S3():
        storage = S3(bucket="test",key="etl/testflow")
        f = Flow("ETL")
        f.name not in storage
        with Flow("ETL") as f:
            e = extract()
            t = transform(e)
            l = load(t)
        flow_location=storage.add_flow(f)
        f.name in storage
        storage.build()
    
    def test_get_flow_S3(dictFlows):
        print("i am in get flow")
        storage = S3(bucket="test", key="etl/testflow")
        storage.flows=dictFlows['flows']
        newflow=storage.get_flow('etl/testflow')
        print("S3 FLOW OUTPUT")
        newflow.run()
    :upvote: 1
    z
    • 2
    • 10
  • m

    Ming Fang

    07/27/2020, 1:11 PM
    I'd like to share my set of Terraform modules to run Prefect Server inside Kubernetes https://github.com/mingfang/terraform-k8s-modules/tree/master/examples/prefect
    :prefect: 2
    🎉 12
    a
    • 2
    • 1
  • a

    Anna Geller (old account)

    08/25/2020, 8:29 PM
    I created a tutorial on how to set up a serverless Kubernetes cluster on AWS as your Prefect agent - let me know if you have any questions. I hope it may be useful to somebody: https://towardsdatascience.com/distributed-data-pipelines-made-easy-with-aws-eks-and-prefect-106984923b30 It was inspired by the amazing YouTube tutorials from @Laura Lorenz - thanks a lot for them, Laura! looking forward to the next live streams!
    🚀 17
    💯 26
    ❤️ 6
    🤩 12
    👏 6
    l
    j
    j
    • 4
    • 3
  • a

    Anna Geller (old account)

    09/04/2020, 8:10 PM
    This time, I wrote a blog post on how to manage dependencies between data pipelines. I mainly use FlowRunTask here to trigger single flows in order. This helped my team to organize the flows between several layers of jobs in our data warehouse: staging area, business logic, data mart. I also compare how this can be done in Airflow vs. in Prefect, and the Prefect version proved to be much simpler. I hope this may be useful to some of you. Greetings from Berlin! 🙂 https://towardsdatascience.com/managing-dependencies-between-data-pipelines-in-apache-airflow-prefect-f4eba65886df
    ❤️ 4
    😍 2
    👋 6
    🚀 14
    👏 12
    h
    • 2
    • 3
  • a

    Avi A

    09/09/2020, 8:58 AM
    Hello! In our work we use Jupyter notebooks a lot, especially for generating reports at the end of a task (e.g. analyzing classifier performance across categories with confusion matrices and whatnot). So I wrote a Prefect JupyterTask for internal use, and was wondering if anyone would be interested in such a thing before I make the extra effort of publishing it in a package
    e
    a
    +3
    • 6
    • 10
  • m

    Michael Ludwig

    09/16/2020, 8:53 AM
    Hey all. I wrote up a bit about Prefect and Prefect Cloud in a blog post with an end-to-end example where I highlight how you get started and get a feel for Prefect. Maybe that is interesting for you 🙂 https://www.linkedin.com/posts/netlight-consulting_prefecta-modern-python-native-data-workflow-activity-6711916450665287680-Ui4f Direct-Link: https://makeitnew.io/prefect-a-modern-python-native-data-workflow-engine-7ece02ceb396
    ❤️ 9
    🚀 7
    👍 10
    j
    g
    • 3
    • 2
  • j

    james.lamb

    09/16/2020, 10:18 PM
    👋 hello from Chicago! At Saturn Cloud (https://www.saturncloud.io/s/), we offer a managed solution for data scientists and data engineers to easily provision and manage Dask clusters, so they can speed up their Python workflows. We've recently added an integration with Prefect Cloud, and I'm really excited to share it here! 🎉 The workflow we've designed is like this:
    • Provision a KubernetesAgent with a few clicks in the Saturn UI
    • Author your flow in a Saturn-managed instance of Jupyter Lab
    • Register the flow with Prefect Cloud. A library called prefect-saturn (https://github.com/saturncloud/prefect-saturn) adds an environment and storage to your flow that says "hey, run this on one of my Saturn Dask clusters"
    • Whenever your flow runs, it creates a Dask cluster for itself programmatically. All your dependencies, files, and necessary credentials will be available on all of the Dask workers, and all logs and task statuses get sent back to Prefect Cloud.
    More details about the architecture and links to sample code are available at https://www.saturncloud.io/docs/connecting/tools/prefect-cloud/. I want to thank the prefect team for the care and attention they've given to keeping prefect modular. We were able to write a client library like this (and another thin wrapper around KubernetesAgent) without needing to manage a lot of code, because of the clear separation of concerns in the prefect library. The maintainers have also been very responsive to my questions in this Slack and to my proposals in issues and pull requests. ❤️
    🙏 4
    🎉 14
    🤘 1
    :marvin: 6
    👏 2
  • j

    Jackson Maxfield Brown

    09/18/2020, 7:07 PM
    Hey all! I wanted to share a really weird (but neat imo) Prefect workflow and task implementation that we tried just to potentially spark some ideas and discussions. Introducing ACTK (Automated Cell Toolkit), a pipeline to process field-of-view (FOV) microscopy images and generate features and render-ready products for the cells in each field. https://github.com/AllenCellModeling/actk Under the hood, each task in the workflow is managed by a custom Prefect Task object that we wrote that allows for git style management of large (multi-TB) imaging datasets while also managing the workflow run itself. This custom Prefect task object allows for each individual task in the whole workflow to:
    1. run independently from the rest of the pipeline
    2. manage produced data upload
    3. manage produced data checkout
    4. manage upstream data dependency download
    The underlying data storage and management implementation was written prior to Prefect adding Result and Serializer classes and so we are already thinking about how to upgrade those and have written quite a bit on potential implementation details so great job Prefect team for getting to it so quickly 🎉
    🤩 2
    🔬 1
    🚀 5
    💯 3
    😲 6
    :party-parrot: 1
    a
    j
    • 3
    • 3
a

An Hoang

09/18/2020, 10:32 PM
This is super cool and interesting! I'm about to start an internship with people at the Carpenter lab at Broad Institute, the team who creates and maintains https://github.com/CellProfiler/CellProfiler. Do you guys already have a collaboration with them? Maybe we can work together!
j

Jackson Maxfield Brown

09/19/2020, 12:51 AM
Oh sweet! The Carpenter lab is awesome! We don't have any active collaborations with them but definitely cross circles and always have a great time. That should be a wonderful internship!
j

Jeremiah

09/20/2020, 2:50 PM
Very cool, thanks for sharing @Jackson Maxfield Brown!