# ask-community
b
Hi all, I really like using Prefect except for one missing concept, which is the orchestration of deployments. If I want to run a DAG of flows, each on its own deployment (because I want each to have its own container and deployment configuration), there is not much to help me. I need a parent flow that calls run_deployment, tracks the state of each run, and pulls and passes results around, potentially with some asyncio on top to run things in parallel when possible. I wish we could use the same awesome API as for Tasks. Am I completely missing a key architecture that would solve my problem, or is it just the way it is with Prefect?
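For readers who have not written this pattern by hand, here is a minimal sketch of what the message above describes: a parent coroutine that triggers one deployment, checks its state, and then fans out to two others with asyncio. It is only an illustration. `fake_run_deployment`, the deployment names, and the shape of the returned dict are all assumptions so that the sketch runs without a Prefect server; in real code that call would presumably be Prefect's `run_deployment`.

```python
import asyncio


# Hypothetical stand-in for a real deployment trigger. It only simulates a
# remote flow run; the returned dict shape is invented for this sketch.
async def fake_run_deployment(name: str, parameters: dict) -> dict:
    await asyncio.sleep(0.01)  # pretend the remote run takes some time
    return {"name": name, "state": "COMPLETED", "result": f"{name}-output"}


async def parent() -> dict:
    # Step 1: run the upstream deployment and check its final state by hand.
    extract = await fake_run_deployment("extract/prod", {"day": "2024-01-01"})
    if extract["state"] != "COMPLETED":
        raise RuntimeError("upstream deployment did not complete")

    # Step 2: two independent downstream deployments run concurrently,
    # each receiving the upstream result as a parameter.
    clean, stats = await asyncio.gather(
        fake_run_deployment("clean/prod", {"source": extract["result"]}),
        fake_run_deployment("stats/prod", {"source": extract["result"]}),
    )
    return {"clean": clean["result"], "stats": stats["result"]}


print(asyncio.run(parent()))
```

Even for three deployments, the dependency structure is buried in control flow, which is the verbosity being complained about here.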
👀 1
k
I hear ya on this one. I think one thing we could do to make this easier is provide some kind of utility that shortcuts a lot of the verbose code writing it takes to get a dag-like structure of deployments, so you can worry about defining your dependencies and parameters and little else. Does that feel like it'd suit what you're looking for?
b
I see two directions: 1. Extend the subflow concept to make it work with deployed flows. 2. Develop a yaml representation of the DAG, à la Argo Workflows.
k
yeah, I threw something like this together: I wrote some in-Python dictionaries to define my DAG of deployments, which then gets sorted, and the deployments are run via concurrent tasks. the outcome is nice but the interface feels very not-prefect if that makes sense
"subflow" is a confusing proposition to someone who wants a DAG and gets an in-process thing
b
In-process and sequential, without .submit(). I've felt deceived by the subflow concept. 🥲
This absence of a DAG for deployments is a critical gap for my team. I'm pretty sure that Prefect will be rejected because of it, in the end. It's too bad, because on the other hand it is really nice to use for each individual flow. It's just that putting them together is a hassle.
k
I think I'm about to pour a nontrivial amount of energy into making this a thing
in my example, the definition looks like this:
tasks = [
    {
        "task": "Task1",
        "depends_on": [],
        "flow": "flow1",
        "deployment": "deployment1",
        "work_pool": "work_pool1",
    },
    {
        "task": "Task2",
        "depends_on": ["Task1"],
        "flow": "flow2",
        "deployment": "deployment2",
        "work_pool": "work_pool2",
    },
    {
        "task": "Task3",
        "depends_on": ["Task1"],
        "flow": "flow3",
        "deployment": "deployment3",
        "work_pool": "work_pool3",
    },
    {
        "task": "Task4",
        "depends_on": ["Task2", "Task3"],
        "flow": "flow4",
        "deployment": "deployment4",
        "work_pool": "work_pool4",
    },
]
and running it gets you this. (it isn't actually calling run_deployment but let's pretend)
is that at all close to what you're looking for?
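The runner itself is not shown in the thread, so the following is only a guess at its general shape, not k's actual code. It takes a list of dictionaries with the same keys as the definition above, orders them with the standard library's `graphlib.TopologicalSorter`, and starts each node as soon as its dependencies have finished. `trigger` is a hypothetical placeholder that, like the example in the thread, does not really call run_deployment; `node` and `run_dag` are also names invented for this sketch.

```python
import asyncio
from graphlib import TopologicalSorter


def node(i: int, deps: list[str]) -> dict:
    # Compact helper that builds entries with the same keys as the thread's list.
    return {"task": f"Task{i}", "depends_on": deps, "flow": f"flow{i}",
            "deployment": f"deployment{i}", "work_pool": f"work_pool{i}"}


dag = [node(1, []), node(2, ["Task1"]), node(3, ["Task1"]),
       node(4, ["Task2", "Task3"])]


async def trigger(entry: dict) -> str:
    # Hypothetical placeholder: a real version would trigger the deployment
    # and wait for the flow run to finish.
    await asyncio.sleep(0.01)
    return f'{entry["flow"]}/{entry["deployment"]}'


async def run_dag(entries: list[dict]) -> list[str]:
    by_name = {e["task"]: e for e in entries}
    sorter = TopologicalSorter({e["task"]: e["depends_on"] for e in entries})
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    running: dict[asyncio.Task, str] = {}
    finished: list[str] = []
    while sorter.is_active():
        # Start every node whose dependencies are all done.
        for name in sorter.get_ready():
            running[asyncio.create_task(trigger(by_name[name]))] = name
        done, _ = await asyncio.wait(running, return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            name = running.pop(task)
            task.result()  # re-raise here if the node failed
            finished.append(name)
            sorter.done(name)  # unblocks nodes that depend on this one
    return finished


print(asyncio.run(run_dag(dag)))
```

With this shape, Task2 and Task3 run concurrently once Task1 completes, and Task4 starts only after both have finished. Failure handling is deliberately naive: the first exception aborts the whole run.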
b
It's getting there indeed. But it needs some input management, and ideally output management as well.
k
yup, it's definitely missing that
but doable
b
Outputs are secondary as we can declare the output location in the input, but would be nice to have.
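One possible way to add the input management discussed here, sketched under assumptions: each entry gets a `parameters` dict, where a value of the form `{"from": "<upstream name>"}` is replaced by that upstream's result before the deployment is triggered, and the output location is simply declared as an ordinary input. The `parameters`, `from`, and `output_path` keys, the bucket paths, and the `resolve_parameters` helper are all invented for illustration; none of them come from Prefect or from the thread.

```python
def resolve_parameters(entry: dict, results: dict) -> dict:
    """Build the concrete parameters for one node from upstream results."""
    resolved = {}
    for key, value in entry.get("parameters", {}).items():
        if isinstance(value, dict) and "from" in value:
            upstream = value["from"]
            # Only allow references to declared dependencies, so the DAG
            # definition stays the single source of truth for ordering.
            if upstream not in entry["depends_on"]:
                raise ValueError(f"{entry['task']} does not depend on {upstream}")
            resolved[key] = results[upstream]
        else:
            resolved[key] = value  # literal value, passed through unchanged
    return resolved


# Hypothetical entry: Task2 reads what Task1 produced and declares where
# its own output should go as a plain input.
task2 = {
    "task": "Task2",
    "depends_on": ["Task1"],
    "parameters": {
        "source": {"from": "Task1"},
        "output_path": "s3://example-bucket/task2/",
    },
}

# Results of finished nodes, keyed by node name (here: an output location).
results = {"Task1": "s3://example-bucket/task1/"}
print(resolve_parameters(task2, results))
```

Passing locations rather than payloads keeps the values small enough to travel as deployment parameters, which matches the point that outputs can be handled through inputs.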
But you have to find another name than Task, I think! 😄
k
haha I put this together late at night
but I'll try to get a repo with a good example of it out there within a few days. how/if it actually manifests in Prefect itself is a whole other story, but I think it's good to have in case you find it useful
👍 1
b
To be honest, I'm surprised that Prefect got this far without this. It made me wonder how others were doing it, or whether we were doing something very exotic (but no, because other orchestrators have this concept).
😂 1
k
the design of prefect encourages less constrained thinking when it comes to writing your workflows, because nothing strictly needs to be a DAG. but running things across many containers becomes less surface-level as a result 🤷
b
The alternative I identified was to use tasks running in containers via coiled functions, but that adds a bunch of complexity and dependencies that my team wouldn't like, I'm sure.
k
I was actually going to suggest exactly that
imo it's pretty awesome
but you should be able to choose