prefect-community

    alexandre kempf

    01/07/2020, 1:39 PM
    Hello guys! I have a weird bug when I'm using Prefect. I reduced the example to the most minimal one I could find. This works:
    from prefect import task, Parameter, Flow
    
    @task
    def load_data(c, b):
        return c
    
    @task
    def bugtask(c):
        return c
    
    with Flow("training") as flowModel:
        init = Parameter("c")
        data = load_data(init, b={"f": 4})
        # data = load_data(init, b={"f": bugtask})
    
    state_model = flowModel.run(c=5)
    Now if you use the commented line instead of the original
    load_data
    call (basically, if you have a task among your arguments, even nested in another structure and never executed), there is an error. Is this expected? I have the feeling that it tries to run all the tasks, even if they are never called! @josh @Dylan, this is a problem for subflows, since I must give the configuration of the subflow as an argument of my run_flow task :s

    Kamil Okáč

    01/07/2020, 2:01 PM
    I'm having trouble working with tasks declared in external files (using the Dask executor). Is this how it's supposed to work? main.py:
    from prefect import Flow
    from prefect.engine.executors.dask import DaskExecutor
    import mytask
    
    mt = mytask.MyTask()
    with Flow("Flow") as flow:
        t1 = mt(1)
    
    executor = DaskExecutor(address='tcp://....:8786')
    flow.run(executor=executor)
    mytask.py:
    from prefect import Task
    
    class MyTask(Task):
        def run(self, x):
            return x
    This leads to an error on the worker: "`ModuleNotFoundError: No module named 'mytask'`" If I use the @task decorator instead of subclassing, there's no problem.
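    (A hedged workaround sketch, assuming the module simply isn't importable on the Dask workers: ship mytask.py to the workers with dask.distributed's upload_file, or install the module on every worker. The scheduler address placeholder is the same as above.)
    from distributed import Client
    
    # Push mytask.py to every connected worker so MyTask can be unpickled
    # there; installing the module on the workers achieves the same thing.
    client = Client("tcp://....:8786")
    client.upload_file("mytask.py")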

    John Ramirez

    01/07/2020, 4:09 PM
    When you use
    task.copy()
    , the upstream dependencies are removed, but are the task results copied as well?

    matt_innerspace.io

    01/07/2020, 5:17 PM
    Stumbled upon Prefect while debugging a few Jenkins issues, and it appears to be a great tool that could replace a lot of what I'm doing in Jenkins (and was hoping to migrate to Airflow). The only drawback I can see, having poked around for an hour, is the lack of a UI in the open-source version. While I'd love to use the cloud, the size of my workloads and the sensitivity of the data make running in Prefect Cloud a problem. Just wanted to give some feedback, as I've seen others ask the same question.

    Jake Schmidt

    01/07/2020, 7:58 PM
    What’s the Prefectest way to use the value of a
    prefect.Parameter
    in a call to a
    prefect.ShellTask
    ? Would I use a
    StringFormatter
    task?
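    (A minimal sketch of the StringFormatter approach; the template, parameter name, and import path are assumptions and may vary by Prefect 0.x version.)
    from prefect import Flow, Parameter
    from prefect.tasks.shell import ShellTask
    from prefect.tasks.templates import StringFormatter
    
    # Render the command string from the parameter at runtime, then hand
    # the rendered string to ShellTask as its command.
    format_cmd = StringFormatter(template="echo {name}")
    shell = ShellTask()
    
    with Flow("shell-param") as flow:
        name = Parameter("name")
        cmd = format_cmd(name=name)
        shell(command=cmd)
    
    flow.run(name="world")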

    matt_innerspace.io

    01/07/2020, 9:05 PM
    One simple thing I'm not grasping here - how do I deploy a flow that exists in a
    hello_world.py
    script? There are lots of examples of what can happen inside the Python script, but I'm missing how it would be deployed as a flow. I can see that the flow runs inside a Docker container, on an agent. I can see how the
    prefect
    CLI tool can run a flow, but it seems the deployment and running are linked somehow? For comparison, I use OpenFaaS Python functions, which also run in Docker containers and are deployed to wherever has room to run them, but there is an explicit
    faas-cli push ...
    to deploy to the cloud/cluster when you're ready.
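    (For reference, a hedged sketch of the registration step in Prefect 0.x, roughly the analogue of faas-cli push. The registry URL is a placeholder, and exact signatures vary by version.)
    from prefect import Flow, task
    from prefect.environments.storage import Docker
    
    @task
    def say_hello():
        print("hello world")
    
    with Flow("hello-world") as flow:
        say_hello()
    
    # Build the flow into a Docker image, push it, and register the flow
    # with the backend; a running agent then picks up flow runs.
    flow.storage = Docker(registry_url="registry.example.com/myteam")
    flow.register()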

    Sean

    01/07/2020, 11:17 PM
    I have a Postgres database that can only be accessed by establishing an SSH tunnel. If possible I would like to establish the tunnel as a task, then use the existing Postgres tasks to query the database via the tunnel. However, I believe that a
    CreateSshTunnel
    task would not work in general, as it is stateful and local - i.e. I don't think it would work when using the Dask executor (for example), as the tunnel task could execute on a different node than the Postgres task. So what would be a correct way to structure this?
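    (One structure that sidesteps the locality problem, as a hedged sketch: open the tunnel and run the query inside a single task, so both always happen on the same worker. Hostnames, credentials, and the sshtunnel/psycopg2 dependencies are assumptions.)
    from prefect import task
    from sshtunnel import SSHTunnelForwarder
    import psycopg2
    
    @task
    def query_via_tunnel(sql):
        # The tunnel lives only for the duration of this task run, on
        # whichever worker executes it.
        with SSHTunnelForwarder(
            ("bastion.example.com", 22),
            remote_bind_address=("db.internal", 5432),
        ) as tunnel:
            conn = psycopg2.connect(
                host="127.0.0.1",
                port=tunnel.local_bind_port,
                dbname="mydb",
                user="me",
            )
            try:
                with conn.cursor() as cur:
                    cur.execute(sql)
                    return cur.fetchall()
            finally:
                conn.close()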

    Jeff Brainerd

    01/09/2020, 2:22 AM
    Hi all — I’m totally new to Prefect and Dask, but I ran into an issue today that burned a few hours and I wanted to make others aware. I inadvertently upgraded the dependency
    msgpack
    to
    v1.0.0rc1
    which broke Prefect on Dask. The symptom is errors like this when a flow finishes:
    distributed.protocol.core - CRITICAL - Failed to deserialize
    Traceback (most recent call last):
      File "/Users/jeff/.local/share/virtualenvs/jellyfish-u52nBq9x/lib/python3.7/site-packages/distributed/protocol/core.py", line 106, in loads
        header = msgpack.loads(header, use_list=False, **msgpack_opts)
      File "msgpack/_unpacker.pyx", line 195, in msgpack._cmsgpack.unpackb
    ValueError: tuple is not allowed for map key
    Reverting to
    msgpack 0.6.x
    solved the issue. Easy to repro on
    prefect 0.8.1
    and
    dask 2.8.1
    and
    dask 2.9.1
    . Not sure if this is something wonky in my environment or something real…
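    (A hedged guard against this, assuming a pip-style requirements file: pin msgpack below 1.0 until the Dask stack supports it.)
    # requirements.txt
    msgpack>=0.6,<1.0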

    matt_innerspace.io

    01/09/2020, 3:16 PM
    I know it's available through GraphQL, but it would be handy to allow Python to remove flows as well, so Python can fully manage flows.
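    (In the meantime, a hedged sketch of scripting the deletion from Python via the GraphQL client; the delete_flow mutation name is an assumption about the Cloud schema of this era, and the flow ID is a placeholder.)
    from prefect import Client
    
    client = Client()
    # Issue the deletion mutation directly through the Python client.
    client.graphql(
        """
        mutation {
          delete_flow(input: {flow_id: "your-flow-id"}) {
            success
          }
        }
        """
    )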

    matt_innerspace.io

    01/09/2020, 4:45 PM
    Similar flows - I have dozens of flows that execute the same code on a schedule (every X seconds, forever), but with different parameters (i.e. different customers, deployments, etc.). As far as I know, I need to register the same flow for each set of parameters under a unique name, which then requires a unique Docker container for each. Wondering if there is a better way to do this?
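    (For comparison, a minimal sketch of the single-parameterized-flow shape; the customer names are made up, and how each run gets triggered on its own schedule is exactly the open question.)
    from prefect import Flow, Parameter, task
    
    @task
    def process(customer):
        print(f"processing {customer}")
    
    # One flow definition serves every customer; only the parameter changes.
    with Flow("per-customer") as flow:
        customer = Parameter("customer")
        process(customer)
    
    for c in ["acme", "globex"]:
        flow.run(customer=c)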

    yuvipanda

    01/09/2020, 6:28 PM
    hello! prefect seems really cool and a great fit for our use case, particularly around running cron-like schedules.

    yuvipanda

    01/09/2020, 6:28 PM
    However, my understanding is that to have schedules (https://docs.prefect.io/core/concepts/schedules.html) work, the python process that calls that code needs to continue to run

    yuvipanda

    01/09/2020, 6:28 PM
    or we need to use prefect cloud

    yuvipanda

    01/09/2020, 6:28 PM
    is that accurate?

    yuvipanda

    01/09/2020, 6:29 PM
    A better way to phrase the question might be - is there a way to do cron-like flow executions on a Kubernetes cluster with just the open-source version? My understanding is 'no' (which is fine!), but I just wanted to check.
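    (For context, a minimal sketch of an open-source scheduled flow: flow.run() blocks, so whatever process hosts it, e.g. a long-running Kubernetes Deployment, must stay alive for the schedule to fire.)
    from datetime import timedelta
    from prefect import Flow, task
    from prefect.schedules import IntervalSchedule
    
    @task
    def heartbeat():
        print("tick")
    
    # The schedule is attached to the flow; flow.run() then loops forever,
    # executing the tasks on every tick.
    with Flow("cron-like", schedule=IntervalSchedule(interval=timedelta(minutes=1))) as flow:
        heartbeat()
    
    flow.run()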

    Alexander Kolev

    01/09/2020, 9:44 PM
    Hey guys, just found out about Prefect and I'm reading through the docs; it seems like a good match for our use case (our main pain point with Airflow is scheduling latency). I have 2 questions though: 1. Is there anyone already using this in production, or is it still in active development, with the only stable bits being hosted by the creating company (Prefect Cloud)? 2. Can anyone confirm they've used Prefect for streaming-like data processing, where DAGs need to be executed on demand and more frequently (vs. long-running, cron-scheduled runs)?

    Arsenii

    01/10/2020, 2:09 AM
    Hi all, I wonder if setting up a schedule for a flow is possible via a Parameter. I have a piece of code that I would like to run with different helper files on different schedules, and I don't know if doing something like
    flow.schedule = daily_schedule if daily else monthly_schedule
    is acceptable inside the Flow itself, or if it is better to do something outside it instead. Thanks!
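    (A minimal sketch of the outside-the-flow variant: the daily/monthly switch is an ordinary build-time flag rather than a runtime Parameter, since a schedule is flow-level metadata, not a task output. The interval values are made up.)
    from datetime import timedelta
    from prefect import Flow
    from prefect.schedules import IntervalSchedule
    
    daily = True  # build-time switch, known before the flow is constructed
    schedule = IntervalSchedule(interval=timedelta(days=1 if daily else 30))
    
    with Flow("switched-schedule", schedule=schedule) as flow:
        ...  # tasks go here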

    David Mellor

    01/10/2020, 4:02 PM
    Are there any conversion utilities available ... say, to convert from Control-M to Prefect?

    alexandre kempf

    01/13/2020, 3:53 PM
    Hello guys 🙂 Once again I need your (wonderful) customer service, but this time more for advice. I would like my flows to be fully reproducible once they are saved on my disk. Right now I have the feeling that if a function A in my flow has behavior A when I save the flow, and I then change function A to have behavior B, reload the saved flow, and run it, I will get behavior B. Question 1: Is there an easy way to replay the flow exactly as it was when I saved it (with behavior A, even after I changed function A)? Question 2: If not, do you recommend any tool to do it? I feel like forcing the user to commit and saving the commit ID is a bit too much for the users I'm targeting. Thanks in advance 🙂

    josh

    01/13/2020, 10:10 PM
    Hey everyone, we’re looking to release Prefect
    0.9.0
    sometime tomorrow or the day after, and I’m giving advance notice because there are some breaking changes around the default functionality of Result Handlers. If you have been registering flows with
    Docker
    storage then you should not see any difference in behavior 🙂 There are also some new features and a bunch of enhancements! The full changelog is available here https://github.com/PrefectHQ/prefect/blob/master/CHANGELOG.md#unreleased

    Bryan Whiting

    01/14/2020, 9:12 PM
    When building a pipeline with multiple, but different, steps (ETL, modeling, scoring), is it recommended to have one long flow or multiple smaller flows? If the latter, is it common practice to import flows from other files into a “main.py” where I’d run all my flows at once? Some flows are long, some are short. Curious how you think about this. Maybe I’m not coding it well enough, because if I understood Prefect well enough I’d be able to leverage all the features that make it easy to run my flows from specific starting points.
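    (For the multiple-flows variant, a hedged sketch of the “main.py” pattern; the stage module names are made up, and each module is assumed to build and expose its own Flow object.)
    # main.py: run the stage flows in order, stopping on the first failure.
    from etl import flow as etl_flow
    from model import flow as model_flow
    from score import flow as score_flow
    
    for f in (etl_flow, model_flow, score_flow):
        state = f.run()
        if not state.is_successful():
            break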

    Scott Brownlie

    01/14/2020, 9:22 PM
    Hello folks, I've just started researching Prefect as a possible alternative to Luigi and Airflow. One thing I like about Luigi is the ability to save the output of a task to disk, to be used by another task downstream. It appears that Prefect's
    LocalResultHandler
    does a similar thing, however I can't seem to get it to work as I expected. I have implemented the following toy example:
    from prefect.engine.result_handlers import LocalResultHandler
    from prefect import task, Flow
    
    results_dir = 'prefect_results/'
    
    @task(checkpoint=True)
    def set_input():
        return 10
    
    @task(checkpoint=True)
    def square(x):
        return x**2
    
    with Flow("test", result_handler=LocalResultHandler(dir=results_dir)) as flow:
        task1 = set_input()
        task2 = square(task1)
    
    flow.run()
    I would expect the output from each task to be saved to the specified directory automatically but it's not. Is that not what is supposed to happen?
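    (One hedged thing to check for Prefect 0.x: checkpointing is disabled by default for local runs, so result handlers never fire unless the flows.checkpointing config flag is enabled, for example via an environment variable.)
    import os
    
    # Enable checkpointing for local runs; this must be set before prefect
    # is imported, since the config is read at import time.
    os.environ["PREFECT__FLOWS__CHECKPOINTING"] = "true"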

    Bryan Whiting

    01/15/2020, 6:30 PM
    What’s Prefect’s replacement for Airflow’s backfill? With checkpointing enabled, am I able to run a job from a specific task so that all downstream dependencies will execute?

    Zach

    01/15/2020, 7:50 PM
    I am having trouble understanding exactly at what level Prefect is able to parallelize tasks. I was looking at the Map/Reduce part of the Prefect docs and saw that you can map a task over a number of inputs in parallel, but what if each of those tasks takes up a lot of processing power? Then things will be very slow. I want to use Prefect as my workflow manager, but I am confused about how things get parallelized. If I run Prefect hooked up to my Kubernetes cluster, do all the workflow steps happen inside a single container in a single Pod? If I want tasks to have access to more than the resources of the Pod that the Prefect flow is running in, do I have to make each task kick off its own Kubernetes job? Sorry for all the questions; it just isn't clear to me how to run a workflow on 50 inputs at once, or if that even makes sense (let me know if I should really just be running 50 separate workflows).
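    (For the 50-inputs case, a hedged sketch of mapping onto a distributed Dask cluster; the scheduler address is a placeholder. With a multi-worker cluster, mapped task runs fan out across the workers' pods rather than staying inside one container.)
    from prefect import Flow, task
    from prefect.engine.executors import DaskExecutor
    
    @task
    def heavy(x):
        # stand-in for an expensive per-input computation
        return x * x
    
    with Flow("fan-out") as flow:
        results = heavy.map(list(range(50)))
    
    # Each mapped child becomes a unit of work scheduled onto Dask workers.
    flow.run(executor=DaskExecutor(address="tcp://dask-scheduler:8786"))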

    alexsis

    01/16/2020, 11:58 AM
    Hey there! We are experimenting with Prefect by trying to use it to manage workflows within microservices. We use the Nameko framework for our microservices, and Nameko uses Eventlet to manage tasks internally. For this case I thought we could use Prefect with a local executor. The problem we ran into is that Prefect tries to fork a process, without success:
    File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/context.py", line 238, in get_context
        return super().get_context(method)
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/context.py", line 193, in get_context
        ctx._check_available()
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/context.py", line 306, in _check_available
        raise ValueError('forkserver start method not available')
    Has anybody had experience using Prefect in Eventlet-based frameworks?
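    (One hedged thing to try, assuming the failure comes from a multiprocessing-backed executor: pass the single-process LocalExecutor explicitly, so Prefect never touches multiprocessing, which eventlet's monkey-patching tends to break. Here `flow` is assumed to be the Flow built elsewhere.)
    from prefect.engine.executors import LocalExecutor
    
    # Run every task in the calling process; no forking involved.
    flow.run(executor=LocalExecutor())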

    Jackson Maxfield Brown

    01/17/2020, 7:26 PM
    Question: when I run
    flow.run(executor=DaskExecutor(address=dask_scheduler_address))
    does it serialize the flow / computation graph and send it to the scheduler to hold onto? Basically, if I have a Dask scheduler spun up and I am working on my local machine, can I call
    flow.run(...)
    and then shut down my machine?

    Sebastian

    01/18/2020, 4:45 AM
    Are there any examples available of running a Flow concurrently? The examples folder has nothing, and it is mentioned that flows may run with concurrency, so I think it would be nice to show an example of this (maybe, for instance, one for when we receive new data in a stream?)

    Ryan Connolly

    01/18/2020, 8:38 PM
    Is there an example of using the imperative API and adding a switch statement to the flow?
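    (For reference, a hedged sketch of the functional API's switch in Prefect 0.x; replicating it imperatively means wiring the equivalent condition tasks and edges by hand.)
    from prefect import Flow, task
    from prefect.tasks.control_flow import switch
    
    @task
    def choose_branch():
        return "a"
    
    @task
    def handle_a():
        print("took branch a")
    
    @task
    def handle_b():
        print("took branch b")
    
    # switch() inserts condition tasks so only the matching branch runs;
    # the other branch is skipped.
    with Flow("switch-example") as flow:
        switch(choose_branch(), dict(a=handle_a, b=handle_b))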

    Ryan Connolly

    01/19/2020, 4:17 PM
    We have a use case where we might have to do pretty large backfills over time. When that happens, we need to be able to define dependencies that go across cycles. There is an open-source project called Cylc that defines this problem space well: https://cylc.github.io/ Using the imperative API, this type of "cycling" can be accomplished. It likely could also be accomplished using the
    LOOP
    mechanism from the functional API, but I was having a harder time grokking that concept. I'll attach an example of the prefect-core code if anyone is interested.
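    (For anyone curious, a minimal sketch of the LOOP signal pattern from the Prefect 0.x docs; the payload keys and the per-cycle work are made up. A task re-raises LOOP to run again, carrying state forward through task_loop_result.)
    import prefect
    from prefect import Flow, task
    from prefect.engine.signals import LOOP
    
    @task
    def run_cycles(n_cycles):
        # Pick up where the previous iteration left off.
        payload = prefect.context.get("task_loop_result", {"cycle": 0})
        current = payload["cycle"]
        if current >= n_cycles:
            return current
        # ... this cycle's work would go here ...
        raise LOOP(message=f"finished cycle {current}", result={"cycle": current + 1})
    
    with Flow("cycling") as flow:
        run_cycles(3)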

    Ryan Connolly

    01/19/2020, 4:18 PM
    Untitled.py