Hey everyone, Sorry about the noob question - I se...
# ask-community
g
Hey everyone, sorry about the noob question. I see from the documentation that a Kubernetes flow can only be configured to run in a single container (one Docker image for the entire flow). Is it possible to set up a flow in which each task runs within its own container?
k
Hey @Gabi Pi, all questions are welcome! This is not possible at the moment. You could maybe use the `RunNamespacedJob` task? Are you trying to avoid bloating the size of the container?
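To make the `RunNamespacedJob` idea concrete, here is a minimal sketch in which each step runs as its own Kubernetes Job with its own image. The helper `job_manifest`, the image names, the job names, and the `default` namespace are all hypothetical placeholders. It also assumes Prefect 1.x and that passing `kubernetes_api_key_secret=None` makes the task fall back to in-cluster or kubeconfig credentials, so check this against the version you run.

```python
def job_manifest(name, image, command=None):
    """Return a minimal Kubernetes Job manifest that runs a single container."""
    container = {"name": name, "image": image}
    if command:
        container["command"] = list(command)
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 0,  # do not retry failed pods at the Kubernetes level
            "template": {
                "spec": {"containers": [container], "restartPolicy": "Never"}
            },
        },
    }


def build_flow():
    """Build a flow whose two steps each run in a separate container image."""
    # Imported here so the manifest helper can be used without Prefect installed.
    from prefect import Flow
    from prefect.tasks.kubernetes import RunNamespacedJob

    # Hypothetical images owned by different teams.
    extract = RunNamespacedJob(
        name="extract",
        body=job_manifest("extract-job", "registry.example.com/team-a/extract:latest"),
        namespace="default",
        kubernetes_api_key_secret=None,  # assumption: use in-cluster/kubeconfig auth
    )
    transform = RunNamespacedJob(
        name="transform",
        body=job_manifest("transform-job", "registry.example.com/team-b/transform:latest"),
        namespace="default",
        kubernetes_api_key_secret=None,
    )

    with Flow("k8s-jobs-flow") as flow:
        extract_done = extract()
        # Start the second job only after the first one has finished.
        transform(upstream_tasks=[extract_done])
    return flow
```

Keeping the manifest in a small helper limits the per-task boilerplate, which is the part that tends to make this approach feel cumbersome.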
g
Thanks for the prompt reply @Kevin Kho. Actually, I am POCing Prefect in order to use it as a data pipeline orchestration tool for our team. One of the use cases is running workflows consisting of different containerised tasks, where each task is built from its own image, created by different teams with different tech stacks. Is there a workaround to implement such behaviour? Maybe create a flow that triggers other flows? It seems that using `RunNamespacedJob` will make flow development a bit cumbersome...
a
Hi @Gabi Pi, it’s definitely possible to accomplish that in Prefect. It depends on how you design your flows and tasks, and how much visibility and orchestration you need. There is a documentation page that describes the Anatomy of a Prefect Task. Overall, Prefect encourages small tasks, but if you wish, your task can be an entire application packaged in a container. From a technical perspective, one possibility is, as you described, a flow of flows: each flow can have its own dependencies packaged into a container image, and you can orchestrate those containers from a parent flow using the tasks `create_flow_run`, `wait_for_flow_run`, and `get_task_run_result`. Those are described here, and in the API reference.
g
Thank you @Anna Geller. So with your suggested approach, if I want to model 2 dependent containerised tasks, I have to convert them into 6 tasks (`create_flow_run`, `wait_for_flow_run`, `get_task_run_result`), 3 for each original step. Am I right?
a
@Gabi Pi You certainly don’t have to, but you can 🙂
• `wait_for_flow_run` is an additional task, needed only if you want to do something after the child flow run and must make sure it has completed before moving on to the next task (that applies if you have multiple tasks in a flow, and I understood you don’t want that).
• `get_task_run_result` is another additional task that you can optionally use to work with the results of a task in the child flow run. You don’t need it if you just want to trigger your containerized child flows.
As an example, you could have one flow that triggers flow runs for multiple child flows in parallel. Here is how you could do that:
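```python
from prefect import Flow, unmapped
from prefect.tasks.prefect import create_flow_run
from prefect.executors import LocalDaskExecutor

# LocalDaskExecutor lets the mapped create_flow_run tasks run in parallel.
with Flow("your-flow-name", executor=LocalDaskExecutor()) as flow:
    # One child flow run is created per entry in flow_name.
    mapped_flows = create_flow_run.map(
        flow_name=["flow_name_1", "flow_name_2", "flow_name_3"],
        # unmapped() passes the same project name to every mapped call.
        project_name=unmapped("your-project-name-where-child-flows-are-registered"),
    )

if __name__ == "__main__":
    flow.run()
```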
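For the dependent case asked about earlier, where one containerised step must finish before the next one starts, a minimal sketch could chain the child flows with `wait_for_flow_run`. The flow and project names are hypothetical placeholders. It assumes a Prefect 1.x version recent enough to support the `raise_final_state` argument, and that the child flows are already registered with a Prefect backend. Note that it needs two tasks per step rather than three, because `get_task_run_result` is left out.

```python
# Hypothetical names: replace with your registered child flows and project.
CHILD_FLOWS = ["extract-flow", "transform-flow"]
PROJECT_NAME = "your-project-name"


def build_parent_flow(child_flows=CHILD_FLOWS, project_name=PROJECT_NAME):
    """Build a parent flow that runs the child flows one after another."""
    # Imported here so this module can be loaded without Prefect installed.
    from prefect import Flow
    from prefect.tasks.prefect import create_flow_run, wait_for_flow_run

    with Flow("parent-flow") as flow:
        previous_wait = None
        for name in child_flows:
            # Trigger the child flow run only after the previous child has finished.
            run_id = create_flow_run(
                flow_name=name,
                project_name=project_name,
                upstream_tasks=[previous_wait] if previous_wait is not None else None,
            )
            # Block until the child flow run finishes. With raise_final_state=True
            # (assumed to be available), a failed child fails this task as well.
            previous_wait = wait_for_flow_run(run_id, raise_final_state=True)
    return flow


if __name__ == "__main__":
    build_parent_flow().run()
```

Each child flow can use its own image through its own run configuration, so the parent flow only handles ordering.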
g
Thank you @Anna Geller, this is really helpful. I will try to implement it as you suggested 🙏