I'm dipping my toe back into prefect after last playing with Prefect Community #ask-community

Christopher

10/16/2022, 2:59 PM

I'm dipping my toe back into prefect after last playing with 1.x. I'm trying to get my head around the deployment model. Previously I had a prefect agent running persistently in ECS, which would spin up new containers on demand to handle prefect jobs. I used "docker" storage to do that, which appears not to be an option currently. I can't tell if that's because it's not needed to achieve what I want to do... I'm open to using other storage options like S3, but I guess that's difficult if my flow code has external dependencies? I'm also not much of a python dev, so I don't have much of an intuition for how bundling works in the python world!

✅ 1

Henning Holgersen

10/16/2022, 3:50 PM

Docker storage in prefect 1 had some clear advantages - there was a lot we didn’t have to think about. Using S3 storage, you basically decouple the machine from the code. I assume that in your prefect 1 setup you have the flow code, a Dockerfile and a requirements.txt? With S3 (or similar), write your code as before, but build and push the docker image independently of the flow code - just make sure the Dockerfile installs all the dependencies you need. The upside is that flows take a lot less time to deploy, because the image doesn’t have to be rebuilt every time.

Christopher

10/16/2022, 4:12 PM

Ah I see, so build yourself a docker "environment" image, then push the flow code to S3, and when executing the flow I guess I can specify my image as the base into which the flow code will get loaded.
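As a rough illustration of that split (this is not Prefect's actual loader), the sketch below uses only the Python standard library. A temporary directory stands in for the S3 bucket, and the running Python process stands in for a container started from the prebuilt "environment" image. The names `upload_flow`, `run_flow_from_storage`, and `hello_flow.py` are hypothetical and chosen only for this example.

```python
import importlib.util
import tempfile
from pathlib import Path

# Flow code that would normally live in your repository.
FLOW_SOURCE = '''
def hello(name):
    return f"Hello, {name}!"
'''


def upload_flow(storage_dir: Path, filename: str, source: str) -> Path:
    """'Deploy' step: push only the flow code to storage; no image build involved."""
    path = storage_dir / filename
    path.write_text(source)
    return path


def run_flow_from_storage(storage_dir: Path, filename: str, entrypoint: str, *args):
    """'Run' step: inside the already-built environment, load the code and call it."""
    spec = importlib.util.spec_from_file_location("flow_module", storage_dir / filename)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, entrypoint)(*args)


# A temporary directory plays the role of the remote bucket (an assumption of this sketch).
with tempfile.TemporaryDirectory() as tmp:
    storage = Path(tmp)
    upload_flow(storage, "hello_flow.py", FLOW_SOURCE)
    result = run_flow_from_storage(storage, "hello_flow.py", "hello", "ECS")
    print(result)
```

The point of the design is that changing `FLOW_SOURCE` only requires a new upload, while the environment (here, the interpreter and its installed packages) stays untouched.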

Henning Holgersen

10/16/2022, 5:39 PM

Exactly. In prefect 2, it is an infrastructure env “override” or a reference to an infrastructure “block”, there are a few different ways to specify it. There are some conceptual differences to prefect 1 here that I for one had to spend a little time getting comfortable with.

👍 2

Anna Geller

10/16/2022, 8:30 PM

Also check this repository template for ECS setup; it will build the image for you from CI/CD: https://github.com/anna-geller/dataflow-ops The README has a blog post and a video with an explanation.

Christopher

10/16/2022, 8:42 PM

Looks great, thanks Anna!

🙌 1

Christopher

10/18/2022, 2:16 PM

Hi Anna, I've been digging through this today and I'm still not totally clear on what S3 is adding here. It looks like your github action is building a Docker image which includes the flow code. So what's the point in the S3 storage block?

Khuyen Tran

10/18/2022, 3:56 PM

Hi @Christopher, using an S3 storage block is one of the options to store your code. If you’d like to use the flow code stored on GitHub, you can use GitHub storage for that.

Christopher

10/18/2022, 3:56 PM

Okay, but if I'm using an ECS task (with Docker image) then I don't need to use S3?

Khuyen Tran

10/18/2022, 3:58 PM

No, you don’t need to, as long as you tell the agent where to pull the data from through a storage block.

Christopher

10/18/2022, 4:08 PM

Okay. I'm still trying to run a hello world flow, so there's no "data" per se other than the flow code (which is embedded in the Docker image). So my current understanding is that I don't need a storage block at all.

Khuyen Tran

10/18/2022, 6:01 PM

Yup. A storage block is not needed if the code already exists in the infrastructure.
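To make the two setups discussed in this thread concrete, here is a small standard-library sketch of the difference. `DeploymentSketch` and `plan_run` are hypothetical names and are not Prefect classes or functions; the image names and the bucket path are placeholders as well. The sketch only models which steps happen for a flow run, depending on whether a storage location is configured.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DeploymentSketch:
    """Hypothetical stand-in for a deployment; not a Prefect class."""
    image: str                     # Docker image the ECS task runs
    entrypoint: str                # flow file and function name
    storage: Optional[str] = None  # remote code location, or None if the code is in the image


def plan_run(deployment: DeploymentSketch) -> List[str]:
    """List the steps a runner would take for one flow run."""
    steps = [f"start container from {deployment.image}"]
    if deployment.storage is not None:
        # Only needed when the flow code is NOT already inside the image.
        steps.append(f"pull flow code from {deployment.storage}")
    steps.append(f"run {deployment.entrypoint}")
    return steps


# Case 1: flow code baked into the image, so no storage is configured.
baked_in = DeploymentSketch(image="example/flows:latest", entrypoint="flows/hello.py:hello")

# Case 2: dependency-only image plus flow code kept in a bucket.
with_s3 = DeploymentSketch(
    image="example/env:latest",
    entrypoint="hello.py:hello",
    storage="s3://example-bucket/flows",
)

print(plan_run(baked_in))
print(plan_run(with_s3))
```

The trade-off is the one described earlier in the thread: baking the code into the image means one artifact to manage but a rebuild on every code change, while separate storage keeps deployments fast at the cost of a second moving part.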
