prefect-community #prefect-community

hi all, quick dask q: it looks like the maximum allowed version is a bit behind latest, is there a reason for that or is it just an oversight? cf. https://github.com/PrefectHQ/prefect/pull/1181

Adam Roderick

07/11/2019, 2:57 AM

Hi everyone. Discovered Prefect. Looking forward to it

marvin 4

Chris White

07/11/2019, 3:08 AM

Welcome @Adam Roderick! Let us know if any questions come up as you start kicking the tires!

Adam Roderick

07/11/2019, 12:59 PM

Thanks! It's all looking good. I have some questions about top-level orchestration. It looks like Cloud is what I am really after, where I can attach a schedule to a flow and Cloud handles kicking it off at the right time(s). If I am self-hosting using Core, would you suggest I kick off a process that will be always on and will just wait until the scheduled time?
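To make the "always-on process" idea concrete, here is a rough standard-library sketch, not Prefect's scheduler. The names `next_run_after` and `run_forever`, the fixed-interval schedule, and the `max_runs` escape hatch are all hypothetical, chosen only for illustration:

```python
import time
from datetime import datetime, timedelta


def next_run_after(now, interval, anchor):
    """Return the first scheduled time strictly after `now`.

    The schedule is assumed to be `anchor + k * interval` for integer k.
    """
    periods = (now - anchor) // interval + 1
    return anchor + periods * interval


def run_forever(run_flow, interval, anchor,
                sleep=time.sleep, now=datetime.now, max_runs=None):
    """Always-on loop: wait until the next scheduled time, then run.

    `run_flow` is any zero-argument callable (hypothetical stand-in for
    whatever actually starts a flow run). `max_runs` exists only so the
    loop can be stopped in tests.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        target = next_run_after(now(), interval, anchor)
        # Never sleep a negative amount if the clock moved past the target.
        sleep(max(0.0, (target - now()).total_seconds()))
        run_flow()
        runs += 1
```

Something like `run_forever(start_my_flow, timedelta(hours=1), datetime(2019, 7, 11))` would then block and trigger the callable once per hour.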

Adam Roderick

07/11/2019, 12:59 PM

Some other approach?

Adam Roderick

07/11/2019, 1:01 PM

Other question, I want to accommodate ad hoc, parameterized flow runs. In the current system, these are triggered by some agent putting a message in an SQS queue. Would you suggest a top-level flow that monitors the queue on a schedule and runs flows or tasks as messages are received? Or some non-prefect listener that will kick off the flow runs as messages are received? I'd like to have the runs be individual for later visualization and troubleshooting

David Ojeda

07/11/2019, 4:19 PM

~~Ah sorry I had a question but I haven’t finished typing it…~~ Here it is:

David Ojeda

07/11/2019, 4:27 PM

I have some questions regarding the caching mechanism in Core when using the Dask executor:

1. Where are the inputs/parameters/results actually cached? In the Dask scheduler, the Dask worker, the Python code that launches the flow, or elsewhere?
2. Related to my first question, what is the lifetime of the cache (other than the duration set by the cache_for task parameter)?
3. When are the inputs/parameters/results cached?

I came up with these questions because I am trying to leverage the cache for my case and I cannot manage to see a cache hit. I have some map tasks that run sequentially, with about 500 elements per map. After I wait until about 200 tasks have finished on the first map, I purposely kill my workers so that the work has to be redone and can profit from the cache, but they still redo each mapped task one by one.

I ended up understanding what my problem was as I finished writing this ( 🦆 ): we were rolling our own kind of “cache” with our data backend and we raised a state that derives from Skipped, but result caching does not occur when the resulting state is Skipped. Even though I solved my problem, I am still posting these questions to be sure I understand the mechanism.

Adam Roderick

07/11/2019, 8:15 PM

Design question. I'm in a multi-tenant situation, so I could use a single flow and maps. Or I could use a parameterized flow and run it once per tenant. What are some of the tradeoffs of either approach?

Kalani Murakami

07/12/2019, 6:22 AM

What's a good example of when someone might use "retries" in their system? This sounds really dumb; I know I need it, but I need some inspiration.

Yoni Davidson

07/12/2019, 11:06 PM

Hi, where can I find a list of companies using prefect core?

Michael Reeves

07/12/2019, 11:46 PM

I have a quick question, something I really like about airflow is the webserver UI that comes with it. Is this something that will be in the prefect open source version in the near future?

Yoni Davidson

07/13/2019, 12:29 AM

Is it possible to use the cloud version anytime soon? We would like to try it for evaluation

jeetu

07/14/2019, 11:12 AM

Hi. Just read about slack a few days back and tried running a few examples. It makes things intuitive 🙂

😄 1

jeetu

07/14/2019, 11:17 AM

I have set up flask-redis-celery stack wherein a user can request a long running background task from flask frontend. The message is conveyed to celery worker which starts the chain of tasks in the background. I can get the status of tasks in flask frontend using asyncresult of celery. I want to replace celery with prefect. Any pointers for doing the same?

John Ramirez

07/15/2019, 2:12 PM

Hey everyone! I heard about Prefect on the DataEngineering podcast and decided to check it out. I'm very impressed!

👋 3

John Ramirez

07/15/2019, 5:53 PM

Hey, does anyone have any recommendations for putting the prefect core engine into a production-like setting?

Sherman

07/15/2019, 11:15 PM

Basic question: I am trying to get Parameters to work, and it seems that no matter what I do, I get an error saying "Flow.run received the following unexpected parameters:". Here's the code:

-.txt

Jeff Quinn

07/17/2019, 2:35 PM

Hello, does prefect have a UI similar to airflow? Was poking around to look for a video demo or something but could not find one

Jason Damiani

07/17/2019, 3:52 PM

Hey all, I'm attempting to use a state handler to update a database for Flow state transitions to Running, Failed, and Successful. I need to use a Flow parameter result to make the update, which I can retrieve from the state result for Failed and Successful, but it's not clear to me how to do this for the Running state, which has NoResultType.

Alex Cano

07/17/2019, 10:11 PM

Hi all, not necessarily a question, but hopefully some guidance. I’ve been using Airflow for about 6 months now, and I feel like I’ve “bent” to the framework a ton. For example, manually saving data to intermediate storage, then passing the location through to the next step of the pipeline since you can’t pass data directly through. I was hoping if anyone had some best practices/tips on what to try to control in Prefect vs letting Prefect control for you. Thanks!

watching 1

Hendrik Pauthner

07/18/2019, 2:35 PM

Hey everyone! I just started having a look into Prefect and so far I am very impressed. I am just a little bit confused right now about the scheduling part and what is possible with Prefect Core alone. Frankly, I am quite new to workflow tools as a whole, so I don't fully understand what the differences in scheduling functionality between the Core and Cloud versions are (and whether scheduling can even be performed in a useful manner with Prefect Core). Can anybody explain and maybe provide a use case for which you would need the functionality of the Cloud version?

Joe Schmid

07/18/2019, 5:57 PM

Hi @Chris White and team (and Prefect community), at the risk of opening a can of worms, I wanted to get any thoughts or recommendations you might have on developing Prefect flows in Jupyter notebooks. Our first Prefect use cases are focused on automating some of our data science workflows. We can certainly define task functions and flows in notebooks (following Prefect example code) but it feels like there might be an opportunity to do something more tailored to data science workflows. (Specifically, I'm thinking about whether it's possible to preserve some of the interactivity of notebooks while still defining tasks and flows, e.g. be able to run a single task independent of a flow, knowing that we may need to pass it appropriate parameters, etc.) It's definitely not required for us to do anything fancy, but since we're at the start of this journey I thought I'd at least ask before we start down any particular path.

Joe Schmid

07/18/2019, 5:59 PM

(And if this risks opening up a "Joel Grus I don't like notebooks" war we can always take this offline!)

😂 1

David Ojeda

07/18/2019, 6:18 PM

Hi there, good job on your 0.6.0 version… It solved many of my cache problems that I had to work around. One comment though, now, if I have a task that is totally unrelated with a cache mechanism, the logs now have some warning pollution, for example:

2019-07-18 20:12:28 ixion.local prefect.TaskRunner[22036] WARNING Task 'SlackTask': can't use cache because it is now invalid

… in my opinion, the message is misleading (there is no invalid cache; I have not set up any cache_for for this task) and maybe the warning level is too high for this case. While I was reading the related code:

if self.task.cache_for is not None:
    candidate_states = prefect.context.caches.get(
        self.task.cache_key or self.task.name, []
    )
    sanitized_inputs = {key: res.value for key, res in inputs.items()}
    for candidate in candidate_states:
        if self.task.cache_validator(
            candidate, sanitized_inputs, prefect.context.get("parameters")
        ):
            candidate._result = candidate._result.to_result()
            return candidate

self.logger.warning(
    "Task '{name}': can't use cache because it "
    "is now invalid".format(
        name=prefect.context.get("task_full_name", self.task.name)
    )
)

I wondered whether the problem is as simple as this: the logging call was supposed to be indented one level deeper, so that it fires only when 1) the cache is enabled and 2) there are cache candidates but all of them fail to hit.
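A minimal sketch of what that would look like, in plain Python rather than Prefect's actual TaskRunner. The names `check_cache`, `caches`, and `validator` are hypothetical stand-ins, and the extra `if candidates:` guard is an assumption added to cover condition 2:

```python
import logging

logger = logging.getLogger("cache_sketch")


def check_cache(task_name, cache_for, caches, inputs, validator):
    """Return a matching cached candidate, or None if the task must run.

    Hypothetical stand-in for the quoted logic; not Prefect's API.
    """
    if cache_for is not None:
        candidates = caches.get(task_name, [])
        for candidate in candidates:
            if validator(candidate, inputs):
                return candidate
        # Warn only when caching is enabled AND candidates existed
        # but none of them validated against the current inputs.
        if candidates:
            logger.warning(
                "Task '%s': can't use cache because it is now invalid",
                task_name,
            )
    return None
```

With this shape, a task that never set cache_for (such as the SlackTask above) takes the silent path and logs nothing.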

Romain

07/19/2019, 8:17 AM

Hi everyone, I am trying to use a conditional flow with mapped results. Here is an example of what I would like to do:

conditions = is_true.map(input_data)
results_a = do_a.map(input_data)  # renamed from "as", which is a reserved word in Python
results_b = do_b.map(input_data)
ifelse(conditions, results_a, results_b)

It does not work because the ifelse function expects the condition to be a boolean, not a list of booleans. Is there a way to do it differently?
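One possible direction, sketched in plain Python and not verified against Prefect itself: fold the decision into a single per-element function, so that no list of booleans ever reaches the conditional. In a flow, `branch` would presumably be a task that is mapped over input_data. The bodies of `is_true`, `do_a`, and `do_b` below are hypothetical placeholders:

```python
def is_true(x):
    # Hypothetical condition: even numbers take the "a" branch.
    return x % 2 == 0


def do_a(x):
    # Placeholder for the real "a" work.
    return ("a", x)


def do_b(x):
    # Placeholder for the real "b" work.
    return ("b", x)


def branch(x):
    # Decide per element, then run only the selected branch for it.
    return do_a(x) if is_true(x) else do_b(x)


input_data = [1, 2, 3, 4]
results = [branch(x) for x in input_data]
```

The tradeoff is that the two branches no longer show up as separate tasks, since each element's branch runs inside one function.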

Jie Lou

07/19/2019, 8:08 PM

Hi everyone! I just started using Prefect and find it amazing. A quick question is: it seems like multiple assignment is not supported. For example:

Jie Lou

07/19/2019, 8:09 PM

Untitled.txt

Jie Lou

07/19/2019, 8:09 PM

Correct me if I am wrong. I wonder why this is not supported and maybe it's in the future development?