m

Matthias

02/21/2022, 10:42 AM
Re-introducing myself as I switched companies (and no longer have access to my previous company email). I've been working in ML for 5 years now and following this community for the past year. Where I currently work, we are heavily invested in k8s, so it made sense for us to choose Argo as our orchestration framework. I must admit that I like it for several reasons:
• it is k8s native, so you can treat flows like any other deployment
• every task runs in a separate container, so you can easily isolate tasks that have strict security requirements
However, there are several drawbacks too. Most notably:
• because every task runs in a separate container, you end up making tasks less atomic so as not to waste too much time on pod creation etc.
• since it is k8s, it is really hard for devs to use. Hence the responsibility falls into the hands of the data platform team (who manage the k8s cluster), which is the wrong thing to do imo.
For these reasons, I would like to introduce Prefect as an alternative to Argo. Therefore, I really look forward to a production-ready release of Orion! 💪
👋 14
👍 2
j

Joshua Greenhalgh

02/21/2022, 10:45 AM
Out of interest is there a reason why you have decided to wait for Orion? Are there particular game changing drawbacks to the current state of Prefect?
:upvote: 1
a

Anna Geller

02/21/2022, 11:05 AM
Welcome back @Matthias! 👋 Great to hear you're looking forward to Orion. We've recently released the `KubernetesFlowRunner` - you can already deploy Orion to your Kubernetes cluster and begin your PoC! Here is a tutorial that shows how you can do that. If you have any questions along the way, we are happy to help. Also, if you want to deploy your flows to Kubernetes using Prefect 1.0, this is also very straightforward and I can point you to the right resources such as this doc (there's more!) and help.
:thank-you: 1
m

Matthias

02/21/2022, 12:10 PM
@Joshua Greenhalgh Actually, there are! The primary reason is the experience of flow-of-flows which is much better in Orion. Another big reason is that we also use kedro to define our flows and they are pretty big. So we want to use radar for visualisation
👀 1
Thanks @Anna Geller! I have already experimented with Prefect in the past and plan to do it in the future too 😄
👍 1
j

Joshua Greenhalgh

02/21/2022, 12:20 PM
@Matthias thanks for the response - never heard of kedro will have a look - so you would end up using prefect more for orchestration than structure?
The structure coming from kedro?
a

Anna Geller

02/21/2022, 12:40 PM
@Joshua Greenhalgh I think Matthias meant Argo rather than Kedro 🙂 but regarding Kedro, there's this doc explaining how to run a Kedro pipeline with Prefect
j

Joshua Greenhalgh

02/21/2022, 12:43 PM
Thanks @Anna Geller - sorry for hijacking the thread - I am just trying to understand to what extent starting to build stuff out using current non-orion prefect may lead to complications down the road
I assume you will continue to maintain the current version for a while to come?
e

Evan Sutherland

02/21/2022, 12:51 PM
Welcome back @Matthias 👋 appreciate you sharing your perspectives!
a

Anna Geller

02/21/2022, 12:51 PM
@Joshua Greenhalgh We will. From the FAQ page: "Prefect 1.0 will remain fully supported by the Prefect team for at least one year after Orion's release."
j

Joshua Greenhalgh

02/21/2022, 1:18 PM
Thanks @Anna Geller - not gonna lie that scares me quite a bit!
a

Anna Geller

02/21/2022, 1:25 PM
No need to worry - look at our docs with the friendly “Don’t Panic” 🙂 As long as you use basic building blocks with no unusual customizations, migrating should be straightforward and deploying to Orion will mainly involve a change from run configs/storage to `DeploymentSpecs`, and from executors to task runners. And even if you have some complex workflow logic, we are working on providing plenty of examples for migrating various use cases to Orion. Some of them are already available here on Discourse.
:upvote: 1
j

justabill

02/21/2022, 2:22 PM
Welcome @Matthias, thanks for sharing your use case! It is absolutely a design goal of ours to make Prefect easy to use with k8s. Stay on the lookout for more progress on Orion and please keep us posted as you explore making the transition.
👍 1
m

Matthias

02/21/2022, 2:36 PM
@Joshua Greenhalgh: just to clarify. We are currently using kedro for structuring our pipelines in Python and Argo for orchestration. I want to switch to Prefect Orion because I believe that with Prefect Orion, we can replace both Kedro and Argo in one go! I would rather use one tool for both structure and orchestration than two separate tools as this adds unnecessary complexity. Don't get me wrong, both Kedro and Argo are great tools, but I believe that Prefect could cover our needs just fine. As a bonus (if we decide to use Prefect cloud), we no longer have to host and maintain an orchestrator (Argo) ourselves 😄.
:upvote: 3
🙏 3
💯 2
@Anna Geller: I am well aware that you can run a kedro pipeline using Prefect (I might look into that as an intermediate step in our migration process).
👍 1
e

Evan Curtin

02/21/2022, 6:16 PM
Hey I’m also an ML practitioner who used Argo extensively 🙂
👋 1
j

Jeremiah

02/21/2022, 8:03 PM
Hey there @Matthias! Welcome aboard and super excited to hear that Orion is hitting the spot!
👍 1
k

Kevin Kho

02/21/2022, 8:38 PM
I see some users using Prefect + Kedro, but honestly Kedro can be a bit constraining for not a lot of benefit. @Khuyen Tran had a demo with Hydra + Prefect and it seems like there is more synergy
👍 1
k

Khuyen Tran

02/21/2022, 8:48 PM
@Matthias I was looking for a way to integrate Kedro and Prefect since I love Kedro's parameters and data catalogs. However, the integration was quite complicated. Now I used Prefect for orchestration and Hydra for parameter management, including data catalog and parameters. You might be interested in that. Here is the demo.
🙌 2
:upvote: 2
m

Matthias

02/22/2022, 11:54 AM
@Khuyen Tran looks super interesting. I do wonder how to combine Hydra with all of Prefect's features (such as different run configs, flow storage options, executors,…).
k

Khuyen Tran

02/22/2022, 3:11 PM
@Matthias yeah I figured out how to use Hydra to access the config files inside a flow and a task. However, I haven't figured out how to use Hydra to configure the outputs of tasks
m

Matthias

02/22/2022, 6:41 PM
Maybe, we should take this offline, but can you provide a minimal working example? From there, we can try to extend it (e.g. adding local dask executor, see how we can use it in a flow of flows, …)
a

Anna Geller

02/22/2022, 6:44 PM
Please keep it in the thread if possible 🙂 I'm sure others can benefit from the discussion and I can post a transcript of it on Discourse in case others ask about Hydra/Kedro
👍 2
k

Khuyen Tran

02/22/2022, 6:48 PM
Hi @Matthias, yes of course. In this file, I use Hydra to access the `main.yaml` file under `config`. I then use `config.<some-parameter>` inside a flow to access a certain parameter
:upvote: 2
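For readers who have not seen the pattern, here is a minimal standard-library sketch of the `config.<some-parameter>` style of access described above. It is not Hydra's API and not the code from the linked file: `to_namespace`, `process_data`, and the config keys are hypothetical names, and a plain dict stands in for the parsed `main.yaml`. With Hydra, the config object would come from the library instead.

```python
from types import SimpleNamespace


def to_namespace(value):
    """Recursively convert nested dicts into objects with attribute access."""
    if isinstance(value, dict):
        return SimpleNamespace(**{k: to_namespace(v) for k, v in value.items()})
    if isinstance(value, list):
        return [to_namespace(v) for v in value]
    return value


# Stand-in for the parsed contents of config/main.yaml (hypothetical keys).
raw_config = {
    "process": {"drop_columns": ["id"], "test_size": 0.2},
    "raw_path": "data/raw/input.csv",
}
config = to_namespace(raw_config)


def process_data(cfg):
    """Stand-in for a flow body that reads parameters off the config."""
    return f"read {cfg.raw_path}, hold out {cfg.process.test_size:.0%}"


summary = process_data(config)
print(summary)
```

Keeping parameters in one config object means the flow code only names the parameter it needs, and the values can change without touching the flow.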
m

Matthias

02/22/2022, 7:45 PM
Hi @Khuyen Tran, can you also run it using the `prefect run` cmd? I also noticed there is a `main.py`, does that even work with the two hydra decorators (both in main and e.g. process_data)?
Omit as in, remove them from the file?
k

Khuyen Tran

02/22/2022, 8:09 PM
@Matthias You will not be able to use `prefect run -p src/process_data.py`, but you can use `python src/process_data.py`. I'm still trying to figure out how to use `prefect run` with hydra
👍 1
@Matthias Let me know if you figure something out
👍 1
m

Matthias

02/25/2022, 10:53 PM
@Khuyen Tran I was playing around with Prefect and Hydra today and I found a way to use hydra with `prefect run`. Unfortunately, you do lose some of Hydra’s features such as the hydra CLI. But I figured you won’t miss that since you have the full power of Prefect at your disposal. On top of that, I was able to create a POC to use hydra to configure task outputs. The sample flow, which runs locally, can be found here: https://github.com/MatthiasRoels/prefect-examples/blob/main/hydra-integration/flow.py
❤️ 2
:upvote: 1
Compared to Kedro’s data catalog functionality, there is still a little bit of additional coding required to make it work as nicely as Kedro’s, but that shouldn't be that hard I guess 😅. But on the plus side: this should work in Orion too (depending on how task outputs will be implemented)! Anyway thanks for pointing me to this library!
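The idea of configuring task outputs from a config, similar in spirit to Kedro's data catalog, can be sketched with the standard library alone. This is not the code from the linked repository and uses neither Prefect nor Hydra: `catalog`, `persist_output`, and `clean_data` are hypothetical names, and the dict stands in for a config section that maps output names to file locations.

```python
import functools
import json
import tempfile
from pathlib import Path

# Stand-in for a config section mapping task outputs to file locations.
output_dir = Path(tempfile.mkdtemp())
catalog = {"clean_data": output_dir / "clean_data.json"}


def persist_output(name):
    """Decorator that writes a function's return value to its configured path."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            # Look up the destination in the catalog rather than hard-coding it.
            catalog[name].write_text(json.dumps(result))
            return result
        return wrapper
    return decorator


@persist_output("clean_data")
def clean_data(rows):
    """Drop rows with a missing value."""
    return [row for row in rows if row["value"] is not None]


cleaned = clean_data([{"value": 1}, {"value": None}, {"value": 3}])
stored = json.loads(catalog["clean_data"].read_text())
```

Changing where an output lands then only requires editing the mapping, not the task code.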
k

Khuyen Tran

02/25/2022, 11:09 PM
@Matthias ah nice! I didn’t know about the `initialize` function of Hydra. Thank you for the examples. I’m trying this out. Will let you know how it goes