prefect-community
  • c

    Chris White

    07/10/2019, 4:16 PM
    This is a pretty cool idea, so I am very open to modifying the code to support this - would you mind opening an issue for it actually?
    👍 1
    d
    1 reply · 2 participants
  • b

    Brett Naul

    07/10/2019, 5:48 PM
    hi all, quick dask q: it looks like the maximum allowed version is a bit behind latest, is there a reason for that or is it just an oversight? cf. https://github.com/PrefectHQ/prefect/pull/1181
    c
    9 replies · 2 participants
  • a

    Adam Roderick

    07/11/2019, 2:57 AM
    Hi everyone. Discovered Prefect. Looking forward to it
    :marvin: 4
  • c

    Chris White

    07/11/2019, 3:08 AM
    Welcome @Adam Roderick! Let us know if any questions come up as you start kicking the tires!
  • a

    Adam Roderick

    07/11/2019, 12:59 PM
    Thanks! It's all looking good. I have some questions about top-level orchestration. It looks like Cloud is what I am really after, where I can attach a schedule to a flow and Cloud handles kicking it off at the right time(s). If I am self-hosting using Core, would you suggest I kick off a process that will be always on and will just wait until the scheduled time?
    c
    5 replies · 2 participants
  • a

    Adam Roderick

    07/11/2019, 12:59 PM
    Some other approach?
  • a

    Adam Roderick

    07/11/2019, 1:01 PM
    Other question, I want to accommodate ad hoc, parameterized flow runs. In the current system, these are triggered by some agent putting a message in an SQS queue. Would you suggest a top-level flow that monitors the queue on a schedule and runs flows or tasks as messages are received? Or some non-prefect listener that will kick off the flow runs as messages are received? I'd like to have the runs be individual for later visualization and troubleshooting
    c
    2 replies · 2 participants
  • d

    David Ojeda

    07/11/2019, 4:19 PM
    Ah sorry I had a question but I haven’t finished typing it… Here it is:
  • d

    David Ojeda

    07/11/2019, 4:27 PM
    I have some questions regarding the caching mechanism in Core when using the Dask executor:
    1. Where are the inputs/parameters/results actually cached? In the Dask scheduler, the Dask worker, the Python code that launches the flow, or elsewhere?
    2. Related to my first question, what is the lifetime of the cache (other than the duration set by the cache_for task parameter)?
    3. When are the inputs/parameters/results cached?
    I came up with these questions because I am trying to leverage the cache for my case and I cannot manage to see a cache hit. I have some map tasks that are sequential to each other, with about 500 elements per map. After waiting until about 200 tasks have finished on the first map, I purposely kill my workers, expecting them to pick the work up again and profit from the cache, but they still redo each mapped task one by one.
    I ended up understanding what my problem was as I finished writing this (🦆): we were rolling our own kind of “cache” with our data backend and we raised a state that derives from Skipped, but result caching does not occur when the resulting state is Skipped. Even though I solved my problem, I am still posting these questions to be sure I understand the mechanism.
    c
    m
    +1
    16 replies · 4 participants
  • a

    Adam Roderick

    07/11/2019, 8:15 PM
    Design question. I'm in a multi-tenant situation, so I could use a single flow and maps. Or I could use a parameterized flow and run it once per tenant. What are some of the tradeoffs of either approach?
    c
    m
    7 replies · 3 participants
  • k

    Kalani Murakami

    07/12/2019, 6:22 AM
    What's a good example of when someone might use "retries" in their system? This sounds really dumb; I know I need them, but I need some inspiration
    j
    1 reply · 2 participants
  • y

    Yoni Davidson

    07/12/2019, 11:06 PM
    Hi, where can I find a list of companies using Prefect Core?
    c
    1 reply · 2 participants
  • m

    Michael Reeves

    07/12/2019, 11:46 PM
    I have a quick question, something I really like about airflow is the webserver UI that comes with it. Is this something that will be in the prefect open source version in the near future?
    j
    4 replies · 2 participants
  • y

    Yoni Davidson

    07/13/2019, 12:29 AM
    Is it possible to use the cloud version anytime soon? We would like to try it for evaluation
    c
    1 reply · 2 participants
  • j

    jeetu

    07/14/2019, 11:12 AM
    Hi. Just read abt slack few days back and tried running few examples. It makes things intuitive 🙂
    😄 1
  • j

    jeetu

    07/14/2019, 11:17 AM
    I have set up flask-redis-celery stack wherein a user can request a long running background task from flask frontend. The message is conveyed to celery worker which starts the chain of tasks in the background. I can get the status of tasks in flask frontend using asyncresult of celery. I want to replace celery with prefect. Any pointers for doing the same?
    c
    3 replies · 2 participants
  • j

    John Ramirez

    07/15/2019, 2:12 PM
    Hey everyone! I heard about Prefect on the DataEngineering podcast and decided to check it out. I'm very impressed!
    👋 3
  • j

    John Ramirez

    07/15/2019, 5:53 PM
    Hey, does anyone have any recommendations for putting the Prefect Core engine into a production-like setting?
    a
    2 replies · 2 participants
  • s

    Sherman

    07/15/2019, 11:15 PM
    Basic question: I am trying to get Parameters to work, and it seems that no matter what I do, I get an error saying "Flow.run received the following unexpected parameters:". Here's the code:
    -.txt
    c
    m
    6 replies · 3 participants
  • j

    Jeff Quinn

    07/17/2019, 2:35 PM
    Hello, does prefect have a UI similar to airflow? Was poking around to look for a video demo or something but could not find one
    c
    1 reply · 2 participants
  • j

    Jason Damiani

    07/17/2019, 3:52 PM
    Hey all, I'm attempting to use a state handler to update a database for Flow state transitions to Running, Failed, and Successful. I need to use a Flow parameter result to make the update, which I can retrieve from the state result for Failed and Successful, but it's not clear to me how to do this for the Running state, which has NoResultType
    c
    j
    +1
    11 replies · 4 participants
  • a

    Alex Cano

    07/17/2019, 10:11 PM
    Hi all, not necessarily a question, but hopefully some guidance. I’ve been using Airflow for about 6 months now, and I feel like I’ve “bent” to the framework a ton. For example, manually saving data to intermediate storage, then passing the location through to the next step of the pipeline since you can’t pass data directly through. I was hoping if anyone had some best practices/tips on what to try to control in Prefect vs letting Prefect control for you. Thanks!
    :watching: 1
    j
    c
    11 replies · 3 participants
  • h

    Hendrik Pauthner

    07/18/2019, 2:35 PM
    Hey everyone! I just started having a look into Prefect and so far I am very impressed. I am just a little bit confused right now about the scheduling part and what is possible with just Prefect Core. Frankly, I am quite new to workflow tools as a whole, so I don't fully understand what the differences in scheduling functionality between the Core and Cloud versions are (and whether scheduling can even be performed in a useful manner with Prefect Core). Can anybody explain and maybe provide a use case for which you would need the functionality of the Cloud version?
    j
    2 replies · 2 participants
  • j

    Joe Schmid

    07/18/2019, 5:57 PM
    Hi @Chris White and team (and Prefect community), at the risk of opening a can of worms, I wanted to get any thoughts or recommendations you might have on developing Prefect flows in Jupyter notebooks. Our first Prefect use cases are focused on automating some of our data science workflows. We can certainly define task functions and flows in notebooks (following Prefect example code) but it feels like there might be an opportunity to do something more tailored to data science workflows. (Specifically, I'm thinking about whether it's possible to preserve some of the interactivity of notebooks while still defining tasks and flows, e.g. be able to run a single task independent of a flow, knowing that we may need to pass it appropriate parameters, etc.) It's definitely not required for us to do anything fancy, but since we're at the start of this journey I thought I'd at least ask before we start down any particular path.
    c
    3 replies · 2 participants
  • j

    Joe Schmid

    07/18/2019, 5:59 PM
    (And if this risks opening up a "Joel Grus I don't like notebooks" war we can always take this offline!)
    😂 1
  • d

    David Ojeda

    07/18/2019, 6:18 PM
    Hi there, good job on your 0.6.0 version… It solved many of the cache problems that I had to work around. One comment though: now, if I have a task that is totally unrelated to the cache mechanism, the logs have some warning pollution, for example:
    2019-07-18 20:12:28 ixion.local prefect.TaskRunner[22036] WARNING Task 'SlackTask': can't use cache because it is now invalid
    … in my opinion, the message is misleading (there is no invalid cache; I have not set up any cache_for for this task) and maybe the warning level is too high for this case. While I was reading the related code:
    if self.task.cache_for is not None:
        candidate_states = prefect.context.caches.get(
            self.task.cache_key or self.task.name, []
        )
        sanitized_inputs = {key: res.value for key, res in inputs.items()}
        for candidate in candidate_states:
            if self.task.cache_validator(
                candidate, sanitized_inputs, prefect.context.get("parameters")
            ):
                candidate._result = candidate._result.to_result()
                return candidate

    self.logger.warning(
        "Task '{name}': can't use cache because it "
        "is now invalid".format(
            name=prefect.context.get("task_full_name", self.task.name)
        )
    )
    I wondered if the problem is just as simple as this: the logging instruction was supposed to be indented one level deeper, so that it fires only when 1) the cache is enabled and 2) there are cache candidates but all fail to hit.
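The two conditions David describes can be illustrated with a small standalone sketch. This is a toy model, not Prefect's TaskRunner: the function name find_cached_state, its arguments, and the logger name are all hypothetical, and the explicit check on candidates is one possible way to express his second condition.

```python
import logging

logger = logging.getLogger("toy.TaskRunner")  # hypothetical logger name


def find_cached_state(cache_for, candidates, is_valid, task_name="SlackTask"):
    """Return the first valid cached candidate, or None (hypothetical helper)."""
    if cache_for is not None:
        for candidate in candidates:
            if is_valid(candidate):
                return candidate
        # Warn only when caching is enabled AND candidates existed but none validated.
        if candidates:
            logger.warning(
                "Task '%s': can't use cache because it is now invalid", task_name
            )
    return None


# Collect emitted warnings in a list so the behaviour is easy to inspect.
warnings_seen = []


class _ListHandler(logging.Handler):
    def emit(self, record):
        warnings_seen.append(record.getMessage())


logger.addHandler(_ListHandler())
logger.propagate = False

find_cached_state(None, [], lambda c: True)        # caching disabled: silent
find_cached_state(60, [], lambda c: True)          # enabled, no candidates: silent
find_cached_state(60, ["stale"], lambda c: False)  # candidates, none valid: warns
```

With the warning nested under the cache_for check, a task that never configured caching produces no log noise; only the third call above adds an entry to warnings_seen.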
    c
    2 replies · 2 participants
  • r

    Romain

    07/19/2019, 8:17 AM
    Hi everyone, I am trying to use a conditional flow with mapped results. Here is an example of what I would like to do:
    conditions = is_true.map(input_data)
    a_results = do_a.map(input_data)
    b_results = do_b.map(input_data)
    ifelse(conditions, a_results, b_results)
    It does not work because the ifelse function expects the condition to be a boolean, not a list of booleans. Is there a way to do it differently?
    c
    2 replies · 2 participants
  • j

    Jie Lou

    07/19/2019, 8:08 PM
    Hi everyone! I just started using Prefect and find it amazing. A quick question is: it seems like multiple assignment is not supported. For example:
  • j

    Jie Lou

    07/19/2019, 8:09 PM
    Untitled.txt
  • j

    Jie Lou

    07/19/2019, 8:09 PM
    Correct me if I am wrong. I wonder why this is not supported and maybe it's in the future development?
    j
    c
    13 replies · 3 participants
j

Jeremiah

07/19/2019, 8:11 PM
Hi @Jie Lou, you’re correct: the multiple assignment you’re attempting is not supported. That’s because when you’re building your flow, Prefect has no way of knowing what’s going to be returned from the task. It’s building the computational graph, but not executing it, so we don’t know (yet) that there are two items to assign.
However, I would suggest a slightly different variant of your solution. As proposed, you are going to run your task two times and then grab the first and second result of those two runs, respectively.
Instead, try:
task_result = task_function(...)
result1 = task_result[0]
result2 = task_result[1]
This represents a single execution of the task, and then two indexes (which are secretly tasks themselves) of that single result
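To make the idea of "indexes that are secretly tasks" concrete, here is a toy model in plain Python. It is only a sketch under stated assumptions: the Deferred class, its run method, and the split_name function are hypothetical and are not Prefect's actual classes or API. It shows why unpacking cannot work while the graph is being built, and why indexing a single deferred result runs the underlying function only once.

```python
class Deferred:
    """Toy stand-in for a task in a lazily built graph (not Prefect's real class)."""

    def __init__(self, fn, *upstream):
        self.fn = fn
        self.upstream = upstream
        self.calls = 0  # how many times this node actually executed

    def __getitem__(self, index):
        # Indexing does not run anything; it adds a new node that depends on this one.
        return Deferred(lambda value: value[index], self)

    def __iter__(self):
        # Unpacking needs a length, which is unknown before the graph runs.
        raise TypeError("cannot unpack: number of results unknown until run time")

    def run(self, cache):
        # Each node runs at most once per cache, so shared upstream work is reused.
        if id(self) not in cache:
            inputs = [node.run(cache) for node in self.upstream]
            self.calls += 1
            cache[id(self)] = self.fn(*inputs)
        return cache[id(self)]


def split_name():
    return ("Ada", "Lovelace")


task_result = Deferred(split_name)  # build time: nothing has executed yet
result1 = task_result[0]
result2 = task_result[1]

cache = {}
values = (result1.run(cache), result2.run(cache))
```

After both index nodes have run, task_result.calls is 1: the function executed a single time and each index node read from that one result.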
j

Jie Lou

07/19/2019, 8:12 PM
ahh, that makes sense.
j

Jeremiah

07/19/2019, 8:12 PM
does that make sense?
:yes:
j

Jie Lou

07/19/2019, 8:12 PM
thanks for the suggestion! you are right
j

Jeremiah

07/19/2019, 8:12 PM
I’m glad you’re having a good experience otherwise, and definitely keep asking any questions you have!
j

Jie Lou

07/19/2019, 8:13 PM
😀
A new question: what if a function returns (let's say) 20 objects, and then I need to write 20 lines of code to declare them... is there an efficient way to solve this? thx
c

Chris White

07/19/2019, 10:17 PM
Hi @Jie Lou sorry just seeing this question --> could you explain your use case for this a little more? If you need to process each item individually, maybe take a look at Task Mapping: https://docs.prefect.io/guide/core_concepts/mapping.html
j

Jie Lou

07/22/2019, 1:34 PM
Thank you, Chris. I get what you meant 🙂