# prefect-community
w
Hi - we are building bioinformatics pipelines related to infectious disease. Prefect looks interesting. I am wondering about task grouping (a.k.a. nesting or sub-dags). Each step in our pipeline reads inputs from GCS and writes outputs to GCS. Without task grouping, this will get messy. For example, suppose we have steps 1, 2, and 3, each of which reads one GCS input and writes a GCS output. That yields 9 tasks (3 GCS download, 3 compute, and 3 upload), but we would like to group them into pipeline steps because that’s the essential unit of work. Is there a way to model this in Prefect?
c
Hi @Walter Gillett! Apologies if I’m misunderstanding the use case, but it sounds like you only need 3 Prefect Tasks? What is the benefit you hope to achieve by “grouping” tasks without them being realized as true Prefect Tasks?
w
Hi @Chris - likely I am misunderstanding how Prefect works. Yes, I want only 3 Prefect Tasks. But if I want to use Prefect machinery to conveniently download from GCS, that's a task (prefect.tasks.google.storage.GCSDownload), same for upload, so I get 9 Tasks, yes? Conceptually there are 3 pipeline steps so I would like the workflow structure to reflect that. I am thinking of this as being like SubDAGs in Airflow (https://www.astronomer.io/guides/subdags/), where aggregating low-level details makes it possible to have a workflow with a higher level of granularity.
I see related discussion here: https://docs.prefect.io/core/PINs/PIN-05-Combining-Tasks.html and https://github.com/PrefectHQ/prefect/issues/980 . But not sure what the recommendation coming out of that is.
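One way to read the "only 3 Tasks" idea is to fold the download, compute, and upload into a single function per pipeline step, so the workflow graph has one node per step. The sketch below is plain Python, not Prefect code: `FAKE_BUCKET`, `download`, `upload`, and `make_step` are hypothetical names, and a dict stands in for GCS. In Prefect, each resulting `step_n` function would presumably be the thing wrapped as a Task.

```python
from typing import Callable, Dict

# In-memory stand-in for a GCS bucket (assumption: real code would use a GCS client).
FAKE_BUCKET: Dict[str, bytes] = {"raw/reads.txt": b"acgt"}


def download(path: str) -> bytes:
    """Hypothetical helper: read one object from the fake bucket."""
    return FAKE_BUCKET[path]


def upload(path: str, data: bytes) -> None:
    """Hypothetical helper: write one object to the fake bucket."""
    FAKE_BUCKET[path] = data


def make_step(compute: Callable[[bytes], bytes]) -> Callable[[str, str], str]:
    """Wrap a compute function so download + compute + upload form one unit of work."""

    def step(input_path: str, output_path: str) -> str:
        data = download(input_path)
        result = compute(data)
        upload(output_path, result)
        # Downstream steps only need to know where the output landed.
        return output_path

    return step


# Three pipeline steps, each a single callable (placeholder compute functions).
step_1 = make_step(bytes.upper)
step_2 = make_step(lambda b: b[::-1])
step_3 = make_step(lambda b: b + b"!")

out_1 = step_1("raw/reads.txt", "step1/out.txt")
out_2 = step_2(out_1, "step2/out.txt")
out_3 = step_3(out_2, "step3/out.txt")
```

The trade-off is that the download and upload are no longer visible (or retryable) as separate nodes, which is exactly the detail that grouping is meant to hide.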
c
Yea, I think I understand better what you’re referring to now - thanks for that link; correct me if I’m wrong here, but the Airflow notion of SubDAG is an API convenience in the UI for seeing task groupings, which makes sense. I don’t think I see any functional difference in the way the DAG behaves between the fully expanded representation and the SubDAG representation. In Prefect, you can certainly create multiple flows and then link them together using some combination of `flow.update` / `flow.set_dependencies` / `flow.root_tasks()` / `flow.terminal_tasks()`, but ultimately we haven’t yet exposed an analogous first-class “sub Flow” concept
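To illustrate the linking idea in the message above, here is a toy model in plain Python. It does not use Prefect: `MiniFlow` and `chain` are hypothetical, and the method names only mirror the ones mentioned in the thread. The sketch merges two small graphs and makes every root task of the second depend on every terminal task of the first.

```python
from typing import Dict, Set


class MiniFlow:
    """Toy DAG (not Prefect): maps each task name to its upstream task names."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.upstream: Dict[str, Set[str]] = {}

    def add_task(self, task: str, *upstream: str) -> None:
        self.upstream.setdefault(task, set()).update(upstream)
        for u in upstream:
            self.upstream.setdefault(u, set())

    def root_tasks(self) -> Set[str]:
        # Tasks with no upstream dependencies.
        return {t for t, ups in self.upstream.items() if not ups}

    def terminal_tasks(self) -> Set[str]:
        # Tasks that nothing else depends on.
        has_downstream = {u for ups in self.upstream.values() for u in ups}
        return set(self.upstream) - has_downstream

    def update(self, other: "MiniFlow") -> None:
        # Copy the other flow's tasks and edges into this one.
        for task, ups in other.upstream.items():
            self.upstream.setdefault(task, set()).update(ups)


def chain(first: MiniFlow, second: MiniFlow) -> MiniFlow:
    """Combine two flows so that `second` starts only after `first` finishes.

    Assumes task names are unique across the two flows.
    """
    combined = MiniFlow(f"{first.name}+{second.name}")
    terminals = first.terminal_tasks()
    roots = second.root_tasks()
    combined.update(first)
    combined.update(second)
    for root in roots:
        combined.upstream[root].update(terminals)
    return combined


step_1 = MiniFlow("step_1")
step_1.add_task("download_1")
step_1.add_task("compute_1", "download_1")
step_1.add_task("upload_1", "compute_1")

step_2 = MiniFlow("step_2")
step_2.add_task("download_2")
step_2.add_task("compute_2", "download_2")
step_2.add_task("upload_2", "compute_2")

pipeline = chain(step_1, step_2)
```

Each pipeline step stays a self-contained flow that can be built and tested on its own, and the combined graph is still the fully expanded one, which matches the point that grouping changes presentation rather than execution.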
w
Thanks @Chris, good to know; rolling up flows could be the answer for now. Adding a first-class subflow concept to Prefect would be helpful, but nesting adds complexity, so it would have to be done carefully - more is not always better. As a side note re Airflow SubDAGs, the article I linked to says: "Astronomer highly recommends staying away from SubDags. Airflow 1.10 has changed the default SubDag execution method to use the Sequential Executor to work around deadlocks caused by SubDags".
c
very interesting; yea I agree this seems like a really convenient abstraction - we’ll definitely look into it! I’ll actually use our bot to archive this thread as a GitHub issue that we can use to track it
@Marvin archive “First class Sub-flow concept”