# prefect-community
s
Hi all! I'm new here and just getting started with Prefect. I'd like to ask if someone could give some advice or point me to documentation (I haven't found any): I'm using Prefect 2.0 (beta), and I'd like to create a deployment for a flow with multiple tasks, each of which should use its own Docker image. In Prefect v1 I used CreateContainer from the prefect.tasks.docker library, but in Prefect v2 I can't find any replacement. Any advice?
b
This isn’t going to be the company line, but if you want free advice, I would suggest using Prefect 1.0 for production.
:gratitude-thank-you: 1
a
why would you need each task to run in a separate image? this makes things so much more difficult - e.g. you can't pass data between tasks in memory this way, since each of them runs in a different process. curious to hear more about your use case
there is the orchestrator pattern, and in Q3/Q4 we will work on supporting subflow-level infrastructure (I don't want to spoil too much, but this is likely what you're looking for) https://discourse.prefect.io/t/how-to-create-a-flow-run-from-deployment-orchestrator-pattern/803
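For reference, a minimal sketch of that orchestrator pattern, assuming a Prefect 2 release where `get_client` is available; the deployment name `extract-flow/prod` is made up for illustration:

```python
from prefect import flow
from prefect.client import get_client


@flow
async def orchestrator():
    # The orchestrator runs as one lightweight "parent" flow; the child flow run
    # it creates is picked up by an agent and executed on whatever infrastructure
    # its deployment is configured with.
    async with get_client() as client:
        deployment = await client.read_deployment_by_name("extract-flow/prod")
        await client.create_flow_run_from_deployment(deployment.id)
```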
regarding Bob's advice on what's ready for production or not - I understand the sentiment, but I believe everyone should consider the trade-offs and make a judgment based on their use case and their appetite for beta software - this post may help https://discourse.prefect.io/t/should-i-start-with-prefect-2-0-orion-skipping-prefect-1-0/544
👍 1
s
@Anna Geller, thank you a lot! Seems like I picked the wrong approach for this.
The reason we need to run each task in its own container is that we'd like to have a deployment with an ETL like this:
1. extract data from sources and save it into some storage (a database or AWS S3 - it doesn't matter here)
2. transform the data in that storage according to some patterns
3. load the transformed data into the final destination storage
Though that's a pretty common description, the extract and transform steps have pretty complex logic (sometimes with different package version requirements), so it's much easier to put each of them into its own Docker image. So the goal is not to pass result data between tasks, but to have one deployment that includes all the required steps/tasks, with the option to restart each task independently. Subflows don't look like what we've been looking for.
a
It does! :) I don't want to spoil too much, but already today you can deploy each subflow to a different infrastructure or container, configured with the flow runner on your deployment. In the future this will get even easier, since you will be able to configure that even at a per-subflow level - this is all in the design phase and not ready yet, but Michael is working on something really cool here. So, TL;DR: subflows are the solution, but not yet fully ready. For now, you would need the orchestrator pattern shared in the Discourse link above.
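For example, something along these lines - a rough sketch assuming a 2.0 beta release where `DeploymentSpec` and `DockerFlowRunner` are the current API; the flow names and images are made up:

```python
from prefect.deployments import DeploymentSpec
from prefect.flow_runners import DockerFlowRunner

from my_project.flows import extract_flow, transform_flow  # hypothetical flows

# One deployment per subflow, each with its own image, so every step
# can ship its own dependencies and be restarted independently.
DeploymentSpec(
    flow=extract_flow,
    name="extract",
    flow_runner=DockerFlowRunner(image="my-org/extract:latest"),
)

DeploymentSpec(
    flow=transform_flow,
    name="transform",
    flow_runner=DockerFlowRunner(image="my-org/transform:latest"),
)
```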