Gotcha! Yeah, I think breaking them up into different containers would do what you’re looking for here. You’d have one Prefect task per container, and each one could run and be monitored on its own. This is the pattern we currently use at work as well, saving all of our intermediate data in GCS buckets.
So in your case, if you adopted a similar design pattern of saving the data to intermediate locations (you could delete it after the flow runs, or whenever you need to), you could get the same results as your current monolith-style container. I’d actually argue you might even be better off: if a transformation fails, you’d still have the database output saved successfully and wouldn’t need to re-query the database.
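Here’s a minimal sketch of what that could look like, assuming Prefect 1.x and the google-cloud-storage client. The bucket name, blob paths, and task bodies are all hypothetical stand-ins for your real ones:

```python
from prefect import task, Flow
from google.cloud import storage

BUCKET = "my-intermediate-bucket"  # hypothetical bucket name


def _blob(path):
    return storage.Client().bucket(BUCKET).blob(path)


@task
def query_database() -> str:
    rows = "...query results as CSV..."  # stand-in for your real query
    path = "intermediate/raw.csv"
    _blob(path).upload_from_string(rows)
    return path  # downstream tasks only pass the GCS path around


@task
def transform(raw_path: str) -> str:
    raw = _blob(raw_path).download_as_text()
    transformed = raw.upper()  # stand-in for your real transformation
    path = "intermediate/transformed.csv"
    _blob(path).upload_from_string(transformed)
    return path


with Flow("split-etl") as flow:
    raw = query_database()
    out = transform(raw)
```

Since each task only hands the next one a GCS path, any task can be retried on its own without rerunning everything upstream of it.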
However, this definitely comes with the overhead of then needing to manage those intermediate files. I think it would be relatively easy to add a task that runs after the rest of them and deletes the intermediate files (assuming you don’t want to keep them around).
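For example, a hedged sketch of that cleanup task, continuing the example above (the `all_finished` trigger just means it runs even when an upstream task failed):

```python
from prefect import task
from prefect.triggers import all_finished
from google.cloud import storage


@task(trigger=all_finished)  # run cleanup even if an upstream task failed
def delete_intermediates(prefix: str = "intermediate/"):
    # delete every blob under the intermediate prefix in the bucket
    bucket = storage.Client().bucket("my-intermediate-bucket")
    for blob in bucket.list_blobs(prefix=prefix):
        blob.delete()


# inside the flow block: delete_intermediates(upstream_tasks=[out])
```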
Specifically on your point of having an R task, I think you could make that work (it would probably be pretty easy to take inspiration from the ShellTask), but the one thing I don’t know enough about is passing objects between R and Python themselves.
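One way to sidestep that is to not pass objects at all and just exchange files, same as the intermediate-data pattern above. A sketch, assuming Prefect 1.x’s ShellTask, an `Rscript` binary on the PATH, and a hypothetical `transform.R` script:

```python
from prefect import Flow
from prefect.tasks.shell import ShellTask

# return_all=True captures every line of the command's output in the task result
run_r = ShellTask(return_all=True)

with Flow("python-to-r") as flow:
    # transform.R reads the input CSV and writes the output CSV itself, so
    # Python and R only ever share files (or GCS paths), never live objects
    logs = run_r(command="Rscript transform.R /tmp/input.csv /tmp/output.csv")
```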