w
Hi, I've got a question about running flows using the ECS Agent. I've read through the documentation and can't seem to find a way to run different tasks within a single flow using different Docker images / task definitions (or with different installed dependencies). Is this not possible? My use case would be e.g. a basic ETL flow where you pull from multiple services (using different installed libraries), combine the data, and write to a data warehouse or similar. Is the only option to include all required dependencies for all tasks in the flow?
k
Hey @Will, I don’t think this is possible unless you use the CreateContainer, StartContainer family of tasks in the task library. In short, you would need to start the containers yourself, and I think the limitation there is getting data in and out of them. The containers are isolated from each other, so if you have a DataFrame in one container, I don’t think another container can readily use it (they are like different machines). I would imagine packaging all dependencies into one image would be easier, but you can do something like StartContainer1 -> Persist Results1 -> StartContainer2 -> Persist Results2 -> StartContainer3 to load and combine Results1 and Results2. You can also split this up into subflows orchestrated by a main flow. Just persist to a location like S3 for the other flows to grab.
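A rough sketch of that persist-then-combine hand-off, in plain Python rather than actual Prefect or Docker calls. Everything here is hypothetical: the function names, the result keys, and the temporary local directory that stands in for a shared location like S3. Each extract step would really run in its own container and only pass a key downstream, never the data itself.

```python
import json
import tempfile
from pathlib import Path


def persist(store: Path, key: str, rows: list) -> str:
    """Write rows to the shared store and return the key for downstream steps."""
    (store / f"{key}.json").write_text(json.dumps(rows))
    return key


def load(store: Path, key: str) -> list:
    """Read rows that an earlier, isolated step persisted under this key."""
    return json.loads((store / f"{key}.json").read_text())


def extract_service_a(store: Path) -> str:
    # Hypothetical step 1: would run in container 1 with its own libraries.
    return persist(store, "results1", [{"id": 1, "source": "a"}])


def extract_service_b(store: Path) -> str:
    # Hypothetical step 2: would run in container 2 with different libraries.
    return persist(store, "results2", [{"id": 2, "source": "b"}])


def combine(store: Path, keys: list) -> list:
    # Hypothetical step 3: loads every persisted result and merges them.
    combined = []
    for key in keys:
        combined.extend(load(store, key))
    return combined


# Assumption: a temp directory plays the role of the shared S3 location.
store = Path(tempfile.mkdtemp())
keys = [extract_service_a(store), extract_service_b(store)]
combined = combine(store, keys)
```

The point of the design is that only small keys cross the container boundary, so the steps never need to share memory or installed dependencies.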
w
Ok great, thanks Kevin. One question for my own understanding then: between Prefect tasks within a workflow, are ECS tasks reused? E.g. if I initiated a map step, how would that work? I'm guessing multiple ECS tasks would be started? Or would Prefect attempt to share the work across a smaller number of ECS tasks / a single one? (I probably need to go and read the code for the ECS agent!)
k
The ECS agent makes one “ECS Task” per Flow Run. A “Prefect Mapped Task” runs inside that “ECS Task”, so a map step does not by itself start more ECS tasks. The word “task” is overloaded here lol