prefect-community
  • p

    Paul Gierz

    03/21/2022, 3:21 PM
    This may be (or rather, is) an entirely “opinion” question: I often have the situation where I need a modern substitute for a Makefile. Don’t get me wrong, the old-school GNU Make is great at what it does, but the syntax can be, ahem, the polite word is “painful”. I would be curious to gather some ideas: Prefect has something akin to dependency management, but there are other Python-based tools for this as well: Invoke, for example. Note that these are strictly limited to problems that will never need any sort of massive orchestration, so I would be curious what the community thinks: can Prefect solve the Makefile problem as well, or would that be bending a tool to do a task it is not designed for? And, have a nice start to the week!
    k
    5 replies · 2 participants
  • e

    Emma Rizzi

    03/21/2022, 3:52 PM
    Hi! I'm trying to implement a Python library to gather all the tasks my ETL flows have in common, and I'm looking for some insights on the best way to do it with Prefect. Sharing more in thread:
    k
    a
    8 replies · 3 participants
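    A hedged sketch of the kind of shared library described above (Prefect 1.x); the package, task, and flow names are hypothetical, not the actual code.
    # In practice the task definitions below would live in a shared, importable
    # package (e.g. a hypothetical my_etl_lib.tasks) and each ETL flow would
    # compose them.
    from prefect import Flow, task

    @task
    def extract(source: str):
        # placeholder: pull raw records from the given source
        ...

    @task
    def load(records, destination: str):
        # placeholder: write records to the given destination
        ...

    # one concrete flow built from the shared tasks
    with Flow("customers-etl") as flow:
        records = extract("customers-api")
        load(records, "warehouse.customers")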
  • s

    Samay Kapadia

    03/21/2022, 4:07 PM
    What’s up prefects - I have a pretty simple task
    import prefect

    @prefect.task()
    def get_backfill_or_scheduled_date():
        time = prefect.context.get("backfill_time") or prefect.context.get("scheduled_start_time")
        print(time)
        print(type(time))
    However, I see that this task creates a pod in Kubernetes with the image prefecthq/prefect:0.15.12. Is there a way that I can specify the Python version for this image? The mismatch between the build and run environments is causing an error.
    k
    5 replies · 2 participants
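    For the image question above, a hedged sketch (Prefect 1.x, assuming the flow uses a KubernetesRun run config): Prefect publishes Python-version-specific image tags, so pinning one keeps the build and run interpreters aligned. The flow body and the exact tag below are illustrative.
    from prefect import Flow, task
    from prefect.run_configs import KubernetesRun

    @task
    def get_backfill_or_scheduled_date():
        ...

    with Flow("backfill-example") as flow:
        get_backfill_or_scheduled_date()

    # pin a python-version-specific tag so the run environment matches the
    # interpreter the flow was built with
    flow.run_config = KubernetesRun(image="prefecthq/prefect:0.15.12-python3.8")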
  • f

    Florian Guily

    03/21/2022, 4:36 PM
    Hey, considering the new "Beta" stage of Orion, do you still recommend implementing 1.0 in production, or do you think the beta version of Orion is viable? We are currently building our data stack and we are considering Prefect.
    k
    3 replies · 2 participants
  • m

    Michael Moscater

    03/21/2022, 6:15 PM
    Hello all, in our development environment we are currently using an ECS agent and have a flow of flows that runs a large-ish batch of small pipelines concurrently. However, we are getting a mix of throttle errors (RegisterTaskDefinition, DeregisterTaskDefinition). My agent has the AWS_RETRY_MODE and AWS_MAX_ATTEMPTS env variables set to 'adaptive' and '25' respectively, but I'm still getting this error. I'm open to splitting this up into smaller calls, but I'm curious whether this is a situation anyone has come across and how it was solved. How did you implement the wait/backoff logic on the flow, and is there a way to iterate with waits to make fewer calls to the AWS API?
    k
    a
    16 replies · 3 participants
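    A hedged sketch related to the throttling above (Prefect 1.x): pointing ECSRun at a pre-registered task definition avoids a RegisterTaskDefinition/DeregisterTaskDefinition call per flow run, which is one way to cut ECS API traffic when many child flows launch at once. The ARN and flow name are placeholders, not a claim about how this stack is configured.
    from prefect import Flow, task
    from prefect.run_configs import ECSRun

    @task
    def small_pipeline():
        ...

    with Flow("child-pipeline") as flow:
        small_pipeline()

    # reuse an existing task definition instead of registering a new one per run
    flow.run_config = ECSRun(
        task_definition_arn="arn:aws:ecs:us-east-1:123456789012:task-definition/prefect-flow:1"
    )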
  • f

    FuETL

    03/21/2022, 6:19 PM
    Hey guys, I have 2 parameters of which at least one should be provided. Does Prefect have a built-in function to handle this case, or should I create a task and validate inside? Thanks!
    k
    3 replies · 2 participants
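    A hedged sketch of the validate-inside-a-task route mentioned above (Prefect 1.x); I'm not aware of a built-in "at least one of" check, so the guard lives in a small task. The parameter names are placeholders.
    from prefect import Flow, Parameter, task

    @task
    def require_at_least_one(a, b):
        # fail fast if neither parameter was supplied
        if a is None and b is None:
            raise ValueError("Provide at least one of the parameters 'a' or 'b'")
        return a if a is not None else b

    with Flow("at-least-one-parameter") as flow:
        a = Parameter("a", default=None)
        b = Parameter("b", default=None)
        value = require_at_least_one(a, b)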
  • h

    Hedgar

    03/21/2022, 7:24 PM
    I think I have a challenge. I made a significant change to an existing pipeline on the 18th, which was last Friday; after the change, when I looked at my Prefect Cloud dashboard, I observed it now has a new version. Fast forward to today: according to my schedule the flow ran today, but the csv file is stamped with Friday's date, i.e. 18-03-2022.csv instead of 21-03-2022.csv. As I have said before, my code is on an EC2 instance that a Lambda starts and stops at a certain time each day. What could be amiss?
    k
    10 replies · 2 participants
  • m

    Mohan kancherla

    03/21/2022, 7:56 PM
    Hello everyone, I am very new to Prefect and trying to run an ECS container through the ECS agent setup. I provided the task definition ARN, env variables, and ECS cluster name to ECSRun and registered the flow to the project. When I ran the flow, it gave the error below. Can anyone explain more about this error and how to tackle it?
    An error occurred (InvalidParameterException) when calling the RunTask operation: Override for container named flow is not a container in the TaskDefinition.
    k
    a
    80 replies · 3 participants
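    The error above says the agent's container override targets a container named "flow", so a hedged sketch (Prefect 1.x): the task definition referenced by the run config needs a container with exactly that name. All values below are placeholders.
    from prefect.run_configs import ECSRun

    run_config = ECSRun(
        task_definition={
            "family": "prefect-flow",
            "networkMode": "awsvpc",
            "cpu": "512",
            "memory": "1024",
            "containerDefinitions": [
                {
                    # the ECS agent applies its overrides to a container named "flow"
                    "name": "flow",
                    "image": "prefecthq/prefect:0.15.13",
                }
            ],
        },
    )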
  • j

    joshua mclellan

    03/21/2022, 7:58 PM
    The machine I was using to host the Prefect server failed over the weekend. When I restarted the machine and tried running
    prefect server start --expose --use-volume
    it's not using the existing data/configurations I set up, and in the logs I'm seeing the following messages:
    hasura_1    | {"type":"startup","timestamp":"2022-03-21T19:50:54.856+0000","level":"error","detail":{"kind":"catalog_migrate","info":{"path":"$","error":"Cannot use database previously used with a newer version of graphql-engine (expected a catalog version <=40, but the current version is 47).","code":"not-supported"}}}
    hasura_1    | {"path":"$","error":"Cannot use database previously used with a newer version of graphql-engine (expected a catalog version <=40, but the current version is 47).","code":"not-supported"}
    How do I go about debugging this?
    k
    a
    4 replies · 3 participants
  • a

    Anatoly Myachev

    03/21/2022, 8:33 PM
    Hello everyone! Is the EXTRA_PIP_PACKAGES environment variable supposed to work with KubernetesFlowRunner?
    k
    a
    +1
    14 replies · 4 participants
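    A hedged sketch against the Prefect 2.0 beta API being discussed (assuming the DeploymentSpec/flow-runner interfaces of that beta): EXTRA_PIP_PACKAGES is read by the prefecthq/prefect image entrypoint, so passing it through the flow runner's env should pip-install the listed packages at container start. The file path, package list, and deployment name are placeholders.
    from prefect.deployments import DeploymentSpec
    from prefect.flow_runners import KubernetesFlowRunner

    DeploymentSpec(
        name="k8s-extra-packages",
        flow_location="./my_flow.py",  # placeholder path to the flow script
        flow_runner=KubernetesFlowRunner(
            env={"EXTRA_PIP_PACKAGES": "pandas==1.4.1 s3fs"}
        ),
    )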
  • m

    Michael Aldridge

    03/21/2022, 9:08 PM
    I'm in the process of trying to deploy Prefect in a test environment and I'm at the point where the instructions in /getting-started tell me to run
    prefect server create-tenant --name default
    I get that when deploying as a standalone service you need to create the tenant; unfortunately, this command appears to expect Prefect to be reachable on localhost, which it is not. Is there some variable I was supposed to export to get the local CLI to see the remote Prefect server?
    ✅ 1
    k
    4 replies · 2 participants
  • c

    Chris Reuter

    03/21/2022, 9:35 PM
    Hey all 👋 we're bringing the 🍕 Pizza Patrol to Austin, TX! If you're going to Data Council or just live in the Lone Star State, we'd love to see you there. All are welcome for free pizza and drinks 🍺. More info on Meetup! https://prefect-community.slack.com/archives/C036FRC4KMW/p1647898467352239
    🍕 3
  • d

    Darshan

    03/21/2022, 11:41 PM
    Hello - in prefect 2.0, is there a way to provide the task name dynamically ? For example, if I have a function defined as a task which is being called multiple times from a flow, I want to append a dynamic suffix to the task name.
    m
    a
    7 replies · 3 participants
  • d

    davzucky

    03/22/2022, 12:46 AM
    With Orion, do you think it would be feasible to have a task that returns a pre-configured fsspec filesystem that is set up from Orion Storage? We are using different storage types depending on the environment. I would like to be able to remove a lot of the conditional code we have today with Prefect 1.0, and it looks like Orion with Storage may be really helpful for that.
    👍 1
    j
    2 replies · 2 participants
  • v

    Vadym Dytyniak

    03/22/2022, 8:53 AM
    Hello. What is the correct way to fail a task that has retry logic when I am sure that even 100 retries will not help?
    a
    9 replies · 2 participants
  • s

    Shrikkanth

    03/22/2022, 10:59 AM
    Hey all, is it possible to use AWS Lambda to trigger, by flow name, a Prefect flow that is registered with Prefect Cloud? Any suggestions?
    a
    2 replies · 2 participants
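    A hedged sketch of the Lambda route asked about above (Prefect 1.x client): the handler starts a flow run on Prefect Cloud from a flow ID and an API key taken from the Lambda environment; looking the ID up by flow name would need an extra GraphQL query. The environment variable names are placeholders.
    import os

    from prefect import Client

    def handler(event, context):
        # API key and flow ID are injected through the Lambda configuration
        client = Client(api_key=os.environ["PREFECT_API_KEY"])
        flow_run_id = client.create_flow_run(flow_id=os.environ["PREFECT_FLOW_ID"])
        return {"flow_run_id": flow_run_id}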
  • a

    andrr

    03/22/2022, 12:50 PM
    Hey all, 👋 We face several problems with flows that run in the Kubernetes cluster.
    • Pods often get stuck in the Running state with the last message in the logs being
    DEBUG - prefect.CloudFlowRunner | Checking flow run state...
    • The flow in Prefect Cloud gets stuck in the Cancelling state while the pod stays stuck in the Running state in the Kubernetes cluster.
    Context:
    • prefect version 0.15.13
    • Private Azure AKS cluster
    • We've tried setting PREFECT__CLOUD__HEARTBEAT_MODE to "thread", but it only got worse (more stuck pods in the Running state). Now we have PREFECT__CLOUD__HEARTBEAT_MODE set to "process" and tini -- prefect execute flow-run as PID 1 to handle zombie processes.
    It seems like the problem is with the heartbeat process detecting the change to the Cancelling or Cancelled state of the flow. I appreciate any help, thanks 🙂
    k
    a
    16 replies · 3 participants
  • f

    Florian Guily

    03/22/2022, 1:13 PM
    Hey, I'm currently trying to build a simple ETL flow that connects to an API and fetches some records (I'm quite new to Prefect). I have to provide an API key to this API as a query parameter. Is there a good practice regarding handling of this key? Can I write it to another file that the flow will read? I'm testing this locally, but this will go to a production environment in the future (I hope).
    s
    3 replies · 2 participants
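    A hedged sketch of one common pattern for the API-key question above (Prefect 1.x): read the key through PrefectSecret so it never sits in the flow code or the repo. Locally the secret can come from the PREFECT__CONTEXT__SECRETS__API_KEY environment variable; in Prefect Cloud it can be stored as a Secret. The secret name and URL are placeholders.
    import requests
    from prefect import Flow, task
    from prefect.tasks.secrets import PrefectSecret

    @task
    def fetch_records(api_key):
        # the key is passed as a query parameter, as described above
        resp = requests.get("https://api.example.com/records", params={"key": api_key})
        resp.raise_for_status()
        return resp.json()

    with Flow("etl-with-secret") as flow:
        api_key = PrefectSecret("API_KEY")
        records = fetch_records(api_key)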
  • j

    Jason Motley

    03/22/2022, 2:30 PM
    I know that you can set retries on individual task failures. Can you set a retry on a "flow of flows" if one flow fails?
    k
    14 replies · 2 participants
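    A hedged sketch for the flow-of-flows question above (Prefect 1.x): with wait=True, StartFlowRun fails when the child flow run fails, so max_retries/retry_delay on that task re-launch the child flow. Flow and project names are placeholders.
    from datetime import timedelta

    from prefect import Flow
    from prefect.tasks.prefect import StartFlowRun

    start_child = StartFlowRun(
        flow_name="child-flow",
        project_name="my-project",
        wait=True,  # surface the child's final state in the parent
        max_retries=2,
        retry_delay=timedelta(minutes=5),
    )

    with Flow("parent-flow") as flow:
        start_child()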
  • p

    Pedro Machado

    03/22/2022, 4:43 PM
    Hi everyone. I'd like to understand how memory is managed in a flow. I have a long-running flow that calls an API to get data. The flow works roughly like this:
    • get the list of URLs to retrieve (about 180k URLs)
    • break the URLs into groups of 150 (a list of lists)
    • a mapped task receives a list of 150 URLs and calls the API
    • another mapped task receives the API output for 150 URLs and saves the output to S3
    I am using S3 results caching for the data-intensive tasks (tasks 2 and 3 above) and Prefect Results for the rest of the tasks. I am seeing that the memory utilization keeps increasing until the container runs out of RAM (this is running on ECS Fargate). It seems to be keeping the data retrieved from the API in memory, even after it's saved to S3. I can increase the container RAM but am trying to understand how I could write the flow so that it does not run out of RAM. This is what the memory utilization chart looks like. Eventually the container dies and Prefect Cloud restarts it. Any suggestions?
    k
    5 replies · 2 participants
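    A hedged sketch related to the memory question above (Prefect 1.x), not the actual flow: fetching and uploading inside one mapped task and returning only the small S3 key keeps the large API payload out of the task results that are held in memory. The bucket, URL, and batch size are placeholders.
    import hashlib
    import json

    import boto3
    import requests
    from prefect import Flow, task

    @task
    def get_url_batches():
        # placeholder: the real flow builds ~180k URLs and splits them into lists of 150
        urls = [f"https://api.example.com/item/{i}" for i in range(450)]
        return [urls[i:i + 150] for i in range(0, len(urls), 150)]

    @task
    def fetch_and_save(url_batch):
        # fetch and upload in one step; only the key is returned, so the large
        # payload is not kept around as a task result
        payload = [requests.get(u).json() for u in url_batch]
        key = "api-dumps/" + hashlib.md5("".join(url_batch).encode()).hexdigest() + ".json"
        boto3.client("s3").put_object(Bucket="my-bucket", Key=key, Body=json.dumps(payload))
        return key

    with Flow("api-extract") as flow:
        batches = get_url_batches()
        keys = fetch_and_save.map(batches)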
  • r

    Rajan Subramanian

    03/22/2022, 5:55 PM
    Hello, any possibility of adding a -r or --run flag to
    prefect deployment create deployment_name
    that runs the deployment immediately if an agent is already listening on the work queue, and otherwise schedules it to run once when the work queue is created?
    a
    20 replies · 2 participants
  • h

    Hedgar

    03/22/2022, 6:32 PM
    What's the best way to set up a cron job on a local machine with Prefect code inside a virtual environment? Currently I put
    pipenv shell prefect agent local start
    inside a shell script which I hand over to crontab like this:
    07 18 * * 1-5 bash startagent.sh
    This works for some time but then suddenly stops, saying it can't find or recognize those commands.
    k
    4 replies · 2 participants
  • d

    David Beck

    03/22/2022, 7:37 PM
    Hi again, so I'm running a flow with a KubernetesRun config in Prefect Cloud with a SqlServerFetch task that is receiving this error:
    Error during execution of task: OperationalError('08001', '[08001] [Microsoft][ODBC Driver 17 for SQL Server]Client unable to establish connection because an error was encountered during handshakes before login. Common causes include client attempting to connect to an unsupported version of SQL Server, server too busy to accept new connections or a resource limitation (memory or maximum allowed connections) on the server. (26) (SQLDriverConnect)')
    The task is defined first outside the scope of the flow and is mapped with the queries generated from another task. See the rough pattern below with some values obscured. The strange thing is that this pattern runs on my local machine and successfully pulls from our database. Even stranger, another flow uses SqlServerFetch with a KubernetesRun config in Prefect Cloud successfully, though that job is mapped in a different way. Any thoughts on this? The only thing that comes to mind is some issue with the driver selected when mapping the task.
    k
    20 replies · 2 participants
  • s

    Sean Talia

    03/22/2022, 7:47 PM
    Hi All, any AWS + ECS experts here who have figured out how to monitor the CPU + memory utilization of ECSRun flows that execute on Fargate? I've noticed a couple of times that my ECS task will run out of memory and kill my flow, but when I go to the ECS (or CloudWatch) console, there's no way for me to actually examine those metrics for the individual ECS task that was kicked off via my ECSAgent. From my digging around, it seems like it might have something to do with the fact that it's not actually the ECS service that my ECS task is tied to that's "responsible" for launching the ECS task (as evidenced by the fact that the ECS dashboard says it's currently running 0 tasks even though the flow and the underlying ECS task are clearly running), but I'm not sure. I'm just trying to get better insight into the actual memory/CPU requirements of my flow without having to, say, briefly move its execution to EC2, monitor it there, and then move it back to ECS Fargate... Many thanks in advance for any tips!
    a
    12 replies · 2 participants
  • d

    Daniel Chapsky

    03/22/2022, 9:32 PM
    Yo all. Ran into an Orion issue and wondering if anyone has seen something similar. Basically I'm trying to create a flow + DeploymentSpec, then register it with Orion from within the same Python file (currently have the Orion server running locally). @Darren was able to forward along this code snippet, but I've been getting an error when trying to get it to work:
    from prefect import flow
    from prefect.deployments import DeploymentSpec, create_deployment_from_spec
    
    
    @flow
    def hello_world(name):
        print(f"Hello {name}!")
    
    
    spec = DeploymentSpec(
        flow=hello_world,
        name="inline-deployment",
        parameters={"name": "Marvin"},
        tags=["foo", "bar"],
    )
    
    create_deployment_from_spec(spec=spec)
    🙌 1
    m
    v
    60 replies · 3 participants
  • s

    Shiyu Gan

    03/23/2022, 2:51 AM
    Meta question about Discourse: is Slack or Discord the more active community for discussion?
    k
    a
    2 replies · 3 participants
  • s

    Shiyu Gan

    03/23/2022, 3:24 AM
    When the DaskExecutor is used, does Prefect delegate the scheduling of the task DAG entirely to the Dask scheduler?
    k
    t
    24 replies · 3 participants
  • a

    Architha Rao

    03/23/2022, 6:41 AM
    Hi. I have Prefect Server set up. I see the scheduled flows are submitted, but the agent never moves them to a Running state. Any idea why, and what the fix could be?
    a
    1 reply · 2 participants
  • t

    Tomer Cagan

    03/23/2022, 7:31 AM
    Hi, not sure whether I am doing something wrong or it is a bug, but I am trying to run a mapped task that returns two values inside my flow and I am getting an error (code and trace inside).
    p
    a
    8 replies · 3 participants
  • p

    Paul Gierz

    03/23/2022, 8:59 AM
    Hi there, this might rather be an issue in the form of a feature request, but in the HPC world the preferred containerisation technology is Singularity rather than Docker. There are Python libraries for handling Singularity in a similar way to how Docker is handled: https://github.com/singularityhub/singularity-cli. I could try to mimic the Docker task templates that are provided in Prefect with Singularity instead; would such a feature be useful for the community?
    a
    1 reply · 2 participants
a

Anna Geller

03/23/2022, 9:50 AM
It's the first time I've seen such a request, so it's definitely not a common issue. And given how widespread and popular Docker is, it's basically the default. But if you would like to contribute an integration with Singularity, doing it via something like a SingularityFlowRunner in Prefect 2.0 would make sense. Flow runners in Orion seem to be the right place for that.