Powered by Linen
prefect-community
    Madison Schott

    07/07/2021, 6:09 PM
    Does Prefect offer any solutions if you do not want to run on your own cloud service? Can Prefect provide any cloud infrastructure solution?
    wiretrack

    07/07/2021, 8:40 PM
    Hey guys, so continuing to learn from the architecture, I was wondering about scalability outside the cloud version. I've been running a few tests and the database grows really quickly, since it persists task_run_state, flow_run_state and other data that tends to grow fast when you have a relatively large number of flows (500+ / 1000+). I wouldn't think execution would be a problem, though: if I use the Kubernetes Agent and flows run as k8s jobs, I'm guessing this would give almost infinite scale on the execution side. But still, the scheduler has to query the database, and even though it should query a small table (`flow`), I was wondering if the large number of rows in the other tables will start to get in the way of frontend performance (and Hasura's, and Apollo's). Putting `state` in MongoDB or something should completely solve the challenge (not really sure if it's really a challenge), but it seems that this would be a huge change, since the code is really tightly tied together. I was wondering how you guys see scalability on the server, and I'm curious what approaches the cloud version uses to overcome potential scalability issues in the long term.
    Aric Huang

    07/07/2021, 9:55 PM
    Hi, I'm trying to use a KubernetesRun config with GitHub flow storage and getting a `Failed to load and execute Flow's environment: FileNotFoundError(2, 'No such file or directory')` error running the flow, when using the `job_template_path` option for KubernetesRun. I can successfully register the flow, and when running the flow it seems to be respecting the `template.yml` I passed in (I see my Kubernetes cluster running an appropriate pod based on my template) - but after pulling the image I get the `FileNotFoundError`. Thoughts on what is going on? My flow basically looks like this:
    with Flow("Test") as test_flow:
        ...
    
    test_flow.run_config = KubernetesRun(
        job_template_path="template.yml"
    )
    
    test_flow.storage = GitHub(
        repo="<path>",
        path="flows/test_flow.py",
        access_token_secret="GITHUB_ACCESS_TOKEN"
    )
    Joseph Loss

    07/07/2021, 10:02 PM
    Are you not able to have two agents listening for flows on the same computer, with different labels? I have a docker agent and a local agent, and all of a sudden I'm getting this error:
    Flow run 478adfa1-5f4f-4dec-a121-a2443bc0a253 has a `run_config` of type `LocalRun`, only `DockerRun` is supported
    I had registered the flow and previously used it with LocalRun; now all of a sudden it's failing, but it was working a few hours ago.
    Maria

    07/08/2021, 12:05 AM
    Hi, I have a pretty simple setup and flow (works fine locally), but I cannot make it run when I choose the storage configuration to be S3 and run from the cloud with an ECS agent and executor. I don't do anything unusual - a simple flow with a few simple tasks. The error I'm getting is `Failed to load and execute Flow's environment: FlowStorageError("An error occurred while unpickling the flow: ModuleNotFoundError("No module named 'transform'")")`. If I comment out transform, it will complain about the next module. A project setup example is in the thread.
    Mike Wochner

    07/08/2021, 6:24 AM
    Is it possible to fail a flow if the agent is not available at the scheduled time? I would prefer this behavior in my case instead of the flow being marked late and starting when the agent comes online.
    David Elliott

    07/08/2021, 8:48 AM
    Hey all, it was my understanding that if any task in a flow run fails, the flow will end up as failed; however, I've just had my flow return with state Success despite having 1 failed task run and 3 skipped task runs. Is this expected behaviour? Context: it's a very large static flow, ~1250 nodes. Prefect version 0.14.14, K8s + Dask execution.
    Christian Michelsen

    07/08/2021, 9:25 AM
    Hi! Yet another logging question… I am trying to add a file logging handler that captures everything, while the "prefect" logger only shows INFO and up. I have tried searching the website, GitHub and here, but with no luck. Right now I have the following MVE (see thread) and I am running prefect locally. Everything I do changes both the level of the prefect logger and the file logger at the same time. How do I untangle them? Also, it would be great if it could work with a local dask executor!
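The usual way to untangle this with plain stdlib logging (a sketch only, not Prefect-specific API; the handler setup and the "flow.log" filename are illustrative assumptions) is to leave the logger itself at DEBUG and set levels on the individual handlers, because a logger's own level filters records before any handler ever sees them:

```python
import logging

# Sketch using stdlib logging only: keep the logger permissive and
# restrict each handler separately. "flow.log" is an illustrative filename.
logger = logging.getLogger("prefect")   # Prefect logs under this name
logger.setLevel(logging.DEBUG)          # the logger must let DEBUG records through

console = logging.StreamHandler()
console.setLevel(logging.INFO)          # console: INFO and up only

file_handler = logging.FileHandler("flow.log")
file_handler.setLevel(logging.DEBUG)    # file: capture everything

logger.addHandler(console)
logger.addHandler(file_handler)

logger.debug("goes to the file only")
logger.info("goes to both the file and the console")
```

The entanglement usually comes from setting the level on the "prefect" logger itself to INFO: that discards DEBUG records before either handler runs, so both outputs appear to move together.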
    Michael Hadorn

    07/08/2021, 2:44 PM
    Hello all! Is there any way to add a simple help text (or possibly even HTML) to the parameters? My team would be overjoyed... I'm posting this specifically here because it would be easiest to have an argument directly on prefect.Parameter(), which is then displayed on the server (and could even be used on the command line). Thanks!
    Alain Prasquier

    07/08/2021, 3:37 PM
    Hello everyone, I'm trying to run my flow using AWS ECS Fargate (following the great step-by-step by Anna Geller: https://towardsdatascience.com/serverless-data-pipelines-made-easy-with-prefect-and-aws-ecs-fargate-7e25bacb450c#3e93). When running the flow, I'm getting an unpickling error: `Failed to load and execute Flow's environment: UnpicklingError("invalid load key, '{'.")`. I've read the Slack thread that referred to this error: https://prefect-community.slack.com/archives/CL09KU1K7/p1623777537484700?thread_ts=1623704787.418000&cid=CL09KU1K7 which seems to point to version compatibility issues. My setup:
    • Agent Prefect version 0.15.0
    • Running on Prefect Cloud ("core_version": "0.14.22+9.g61192a3ee")
    • My task is a simple hello-world log, deployed with `with Flow("s3_flow", storage=S3_STORAGE, run_config=ECSRun_CONFIG) as flow:`
    • `is_serializable(flow) = True`
    Should I be downgrading my agent to match the server version? Any help will be very welcome!
    Nicholas Chammas

    07/08/2021, 5:44 PM
    If I change the executor for an existing flow and reregister it on Prefect Cloud, shouldn't that trigger a new version to be registered? Or does the executor somehow get updated without a new version? It's hard to tell what's going on because the Cloud UI, as far as I can tell, doesn't show you what executor is configured for a given flow. So to be safe I've run `prefect register --force` to make sure the new executor config is being sent up to Prefect Cloud. But I don't know if that's necessary.
    Jeff Baatz

    07/08/2021, 6:17 PM
    I'm running a flow that uses two subflows via `StartFlowRun`, and the parent flow is throwing `ValueError: Failed to find current tenant None in result {'data': {'tenant'...`. The subflows are triggering and running correctly, but it looks as though the parent flow can't view their status? Does the agent running the parent flow need to have a user API key or something attached to it in order to query flow status, but not to submit a flow for execution?
    Matthias Roels

    07/08/2021, 6:52 PM
    I was playing around with Prefect 0.15.0 and I stumbled upon the `prefect build` command, which seems like an interesting command for my use case. However, I cannot find any documentation on this command. Is there any, and if so, where can I find it? Thanks!
    Ben Muller

    07/08/2021, 8:12 PM
    Hey, is there a graphql call that can be made to toggle off the schedule for a flow?
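For reference, Prefect's 1.x GraphQL API exposes paired schedule mutations; a sketch of the toggle-off call, assuming the `set_schedule_inactive` mutation name matches your server's schema (substitute a real flow ID for the placeholder):

```graphql
mutation {
  set_schedule_inactive(input: { flow_id: "<flow-id>" }) {
    success
  }
}
```

The matching `set_schedule_active` mutation turns the schedule back on.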
    Charles Liu

    07/08/2021, 9:42 PM
    If another user is developing in the same flow repository, do they have to start their own agent? We're running the cloud backend.
    itay livni

    07/08/2021, 10:40 PM
    Random: Prefect according to copilot 🙂 -- It gets more interesting but I thought this was neat. ....
    Tom Blake

    07/09/2021, 8:11 AM
    Hi there! 👋 I'm using Prefect v0.15.0 and following the docs to set up Slack notifications, and I'm getting an odd error: `Exception raised while calling state handlers: ValueError("Failed to find current tenant None in result {'data': {'tenant': [{'slug': 'myTenant', 'id': myTenantId}]}}")`. Any ideas as to what could be causing this?
    Hugo Shi

    07/09/2021, 11:30 AM
    Hello! If I have a flow that is successfully check-pointing results (with S3Result in my case), what's the best way to have the flow use the checkpoint in successive invocations as a cache?
    Bruno Murino

    07/09/2021, 12:45 PM
    Hi everyone — I’m trying to use the compiled “task run name” on my custom state handler, but I’m struggling to find it on the “obj”. Does anyone know if that’s possible?
    Paolo

    07/09/2021, 3:27 PM
    Hello folks! I'm just here to browse and carouse, as I'm still learning how prefect works. I'll be mostly a passive user, reading others' answers and silently judging everyone, but I'll get in touch if the need arises. Pack a towel!
    Nicholas Chammas

    07/09/2021, 4:22 PM
    If I pass a `Parameter` directly to a `DatabricksRunNow` task, Prefect detects the dependency and passes the information from the parameter to the task correctly, e.g.:
    path = Parameter("path")
    
    DatabricksRunNow(...)(
        databricks_conn_secret=SECRET,
        notebook_params={
            "path": path,
        },
    )
    However, if I plug the parameter into a formatted string, for example, the information is no longer passed from the parameter to the task correctly:
    path = Parameter("path")
    
    DatabricksRunNow(...)(
        databricks_conn_secret=SECRET,
        notebook_params={
            "path": f"{path}",
        },
    )
    In this case, the `Parameter` class instance is plugged into the string instead of the actual parameter value that we want. So the notebook is given a `repr()` of a `Parameter` class instance (which is unusable, of course) instead of the string value of the parameter that we actually want. Why is that, and is there a way around this?
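The root cause is evaluation time: at flow-build time a `Parameter` is a placeholder object, and an f-string stringifies that placeholder immediately instead of deferring to runtime. A minimal stand-in (plain Python, no Prefect; `FakeParameter` is a hypothetical class, not Prefect's implementation) shows the mechanism:

```python
# Stand-in demonstrating the build-time vs. runtime mismatch.
# FakeParameter is hypothetical, not Prefect's actual Parameter class.
class FakeParameter:
    def __init__(self, name: str):
        self.name = name

    def __repr__(self) -> str:
        return f"<Parameter: {self.name}>"

path = FakeParameter("path")

# Passed directly, the placeholder object survives until an engine could
# swap in the runtime value...
direct = {"path": path}

# ...but an f-string is evaluated immediately, freezing the repr() into
# a plain string before any runtime resolution can happen.
formatted = {"path": f"{path}"}

print(type(direct["path"]))     # still the placeholder object
print(formatted["path"])        # just the string "<Parameter: path>"
```

The usual workaround in Prefect 1.x is to do the formatting inside a task (for example a small `@task` that returns the f-string), so the string is built at runtime, after the parameter has been resolved.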
    Jacob Goldberg

    07/09/2021, 4:58 PM
    For a system executing flows in AWS ECR and orchestrated by Prefect Cloud, is there a way to redirect Prefect Cloud logs to CloudWatch?
    Vincent Chéry

    07/09/2021, 6:45 PM
    Hi all! I'm having a bit of trouble documenting my project with sphinx or pdoc. For sphinx: I found an archived conversation which addresses the question of how to make a task's docstring available to sphinx, but I'm having an issue elsewhere: if I manually add an `..autofunction:: my_beautiful_task` directive, sphinx will document it according to its docstring without any issue, but if I rely on `..automodule::` to automatically document all the members (functions, classes...) of my module, it only finds regular functions and classes, not prefect tasks and flows. pdoc: I just gave it a quick shot and the same happens; it documents functions and classes, but not prefect tasks and flows :( Any idea? Thx!
    Jan Vlčinský

    07/09/2021, 10:01 PM
    We want to use AWS ECS to run flows, using AWS Elastic File System (EFS) to store the data created during data processing. We are able to mount the EFS from an EC2 instance, but we are failing terribly when trying to do so with the `prefect.run_configs.ECSRun` runner. Our flow keeps starting and never completes - not even printing a single line of text. Is there any working example of an AWS ECS based prefect runner storing data on a mounted AWS EFS volume?
    Son Mai

    07/10/2021, 3:55 AM
    I start the local agent by opening PowerShell and running "prefect agent start" (Windows Server 2012). But I want to run the agent as a service that starts automatically if the server restarts. P.S.: My server can't install Docker because Docker doesn't support Windows Server 2012.
    Sayandip Sarkar

    07/10/2021, 4:32 PM
    Hi everyone! I wanted to understand if there is a way to execute two tasks on the same executor in order to reduce data transfer. Is this something that is taken into consideration internally while building the DAG, or is there some way to let the executor know that this is a requirement? Any help would be appreciated. Thanks in advance!
    dex

    07/10/2021, 6:36 PM
    Hello everyone, I'm new to prefect; I just started hacking today. I'm trying to use it to create dependencies for my databricks jobs, but I can't seem to figure out how to create a dependency between jobs, and I keep getting this error. (Is the imperative API the only way?) I'm running it with an agent in a VM, if that's relevant.
    dex

    07/11/2021, 8:26 AM
    I have the following setup, where `flows.py` has my flow definition and `utils.py` hosts some number of helper functions, and I'm using GitHub storage for the flow. Since GitHub storage only specifies the path of `flows.py`, I get a module-not-found error during execution. I wonder if GitHub storage does not support modules? I can't seem to find a good description in the documentation. Thanks in advance if anyone can give me a pointer.
    Mexson Fernandes

    07/11/2021, 2:57 PM
    Hello everyone. I am running a Helm-based deployment of Prefect. The setup is done with a k8s agent for job propagation. I am confused about how to load files into the platform. I see it supports GitLab storage, but is there a way to sync the code to Prefect with CI?
Scott Vermillion

07/11/2021, 10:15 PM
I have a fairly basic question… Let’s assume I have some Python app that uploads something to S3. Now I want to kick off my flow. So I add this to my aforementioned Python app:
from prefect import Client

client = Client()
client.create_flow_run(flow_id="<some flow_id>")
But lo and behold, someone comes along and re-registers the flow with Cloud. The previous flow_id gets archived and a new one is generated. Now I have to go and update my Python app with the new flow_id? Can this be done by name or something? Or can I pull the flow_id as a variable? Or? Thank you.

Kevin Kho

07/11/2021, 10:21 PM
Hi @Scott Vermillion, maybe you can use the StartFlowRun task, which takes in a project name and flow name. It uses the Client() under the hood, so authentication should be the same. You can just use `StartFlowRun(…).run()` to run it.

Scott Vermillion

07/11/2021, 10:24 PM
Wow, thanks so much Kevin! I was not expecting a Sunday response! And that looks like exactly what I’m looking for. Happy weekend!!
Hi Kevin (et al). You mention authentication in your response. I spent the remainder of my Sunday approaching that from different angles based on what I was seeing in the docs. Is it safe to say that if I want to distribute the aforementioned python app (it's a GUI to a bunch of Python scripts, really), then anyone installing it must also install the Prefect CLI and do 'prefect auth login'? That was all I was able to get working, in any event.

Kevin Kho

07/12/2021, 4:54 PM
Yes, that's right. Using the Client requires authentication, so either StartFlowRun or client.create_flow_run will need it, because they are creating new flow runs. If your UI is tied to new flow-run creation, then yes to the `prefect auth login`.

Scott Vermillion

07/12/2021, 5:27 PM
OK, that all seems reasonable, Kevin. I was kind of hoping there might be a way to embed auth into the UI in order to avoid the need of the separate auth, but I honestly can’t think of a way that could be done securely, so this all makes sense. Thanks very much once again!!