Mark McDonald

    1 year ago
    Hi - I'm currently upgrading to version 0.14, using ECSRun config and S3 storage. I used to use Docker storage. With Docker storage, I was able to define where my flows were located and executed from within my image. It seems like with S3 storage, I lose control over where the flow files are executed from, because you all take care of downloading them into the image. From what I can tell, with S3 storage, the flows are being executed from inside of "/tmp" (example: /tmp/prefect-b0r890j3). Is my understanding of "/tmp" correct? Is there a way to override this location and have you download the flow files elsewhere?
    When I develop locally, at the root of my project I have a directory called "src", where I store my flow files. Within "src" I also have a sub-directory called "helpers". Inside of "helpers" I store non-flow-definition supporting code. If all my code were located in a single flow file, I wouldn't be concerned with where the flow is being executed from. However, because I'm working with helper code/files (like the example below), it's a challenge to not be able to control where the flow is executed from. Any advice on this?
    path = os.path.join(os.getcwd(), "helpers/query_info.yaml")
    with open(path, "r") as stream:
        data_loaded = yaml.safe_load(stream)
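One workaround for the snippet above is to resolve the helpers directory from an explicit environment variable instead of os.getcwd(), so the lookup no longer depends on where Prefect happens to execute the flow. HELPERS_DIR is an assumed variable name you would set in your Docker image, not anything Prefect provides; this is only a sketch of the idea:

```python
import os

def resolve_helper_path(filename):
    """Resolve a helper file relative to an explicit base directory.

    HELPERS_DIR is a hypothetical environment variable baked into the
    Docker image; it falls back to the current working directory so
    local development keeps working unchanged.
    """
    base = os.environ.get("HELPERS_DIR", os.getcwd())
    return os.path.join(base, "helpers", filename)
```

With this, the flow script can be downloaded anywhere (including /tmp/prefect-xxxxxxxx) and still find helpers installed at a fixed location in the image.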
    Michael Adkins

    1 year ago
    Hi @Mark McDonald — just wondering, why’d you switch to using S3 storage? You should be able to set your ECSRun image to be an image you’ve set up to contain the additional code you require.
    Mark McDonald

    1 year ago
    yea - we may have to revert back to that, if this is a limitation of s3 storage. I don't recall what we thought the advantage of s3 storage was
    Michael Adkins

    1 year ago
    If you version your flows and helpers separately, then you can have your ECSRun image contain the helpers; the flow can still be stored in S3 and executed in the helper image.
    There are a lot of patterns here and no clear winner yet — we’re continuously trying to assess what the best way to package flows/helpers together like this is.
    Mark McDonald

    1 year ago
    is it correct that with each flow run, you download the flow script from s3 to a new location in tmp? In which case it's not like I can copy my helpers into this tmp location
    I guess my main concern is that I want to have control over the flow execution location because I want my local development experience to mimic how it will be executed in prefect cloud. Otherwise, it's going to be confusing for my company's Prefect users
    Michael Adkins

    1 year ago
    I’m not sure off the top of my head what you can override, but you may be able to write a class that inherits from S3Storage and add an implementation of get_flow that pulls down the requirements -- this is a bit of a hack though.
    Ah, it also looks like you could provide a custom task_definition that runs arbitrary commands before executing the flow run.
    My understanding of the process:
    1. Flow registered with ECSRun type
    2. User runs flow in UI
    3. ECSAgent notices a flow is ready and pulls configuration values from the flow run config to determine what the ECS task should look like
    4. An ECS task is created which runs on a docker image and has an entry point of “prefect execute flow-run” and the flow run id to execute in the context
    5. The flow storage metadata is looked up in the ECS task
    6. The flow is downloaded from storage into the docker image
    7. The flow is executed
    It seems like you could:
    • Store your flow in docker storage, which the ECS task will use as its image instead, and install your helpers there
    • Create a base image for all your flows to run on, install your helpers there, and use any other file-based storage for your flows
    • Customize one of the steps to download requirements into the ECS task before the flow run is executed
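The custom task_definition route could look roughly like this: a hand-written ECS task definition whose container command runs a setup step before handing off to Prefect's "prefect execute flow-run" entrypoint. The image name, bucket, and destination path below are made-up placeholders, not real values:

```python
# Sketch of a custom ECS task definition (as a Python dict) that copies
# helper files into place before the normal Prefect entrypoint runs.
# The S3 bucket, paths, and image name are illustrative assumptions.
task_definition = {
    "family": "prefect-flow-run",
    "containerDefinitions": [
        {
            "name": "flow",
            "image": "my-account.dkr.ecr.us-east-1.amazonaws.com/my-flow-image:latest",
            # Run an arbitrary setup command, then delegate to Prefect.
            "command": [
                "/bin/sh",
                "-c",
                "aws s3 sync s3://my-helpers-bucket/helpers /opt/app/helpers "
                "&& prefect execute flow-run",
            ],
        }
    ],
}
```

The trade-off is that the setup logic now lives in infrastructure config rather than in the image build, so it runs on every flow run.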
    Mark McDonald

    1 year ago
    Thanks for the feedback. So yea, basically I do build a custom docker image that contains my helper code along with all my dependencies. I then create an ECS task definition which contains the image's repository location. I then store the flow scripts in s3 using s3 storage. The CI/CD flow kind of looks like this:
    # step 1: docker build/push/tag image
    # step 2: create ecs task definition 
    
    # step 3:
    flow.storage = S3(
            bucket=S3_BUCKET,
            key=s3_key,
            stored_as_script=True,
            local_script_path="/path/to/flow.py",
        )
    # step 4:
    flow.run_config = ECSRun(
            task_definition_arn=task_definition_arn, run_task_kwargs=run_task_kwargs
        )
    # step 5:
    flow_id = flow.register(
            labels=['dev'],
            project_name=PROJECT_NAME,
        )
    Basically, I think the idea of S3 storage is appealing because if only flow code changes (not dependencies or helpers), then I can skip steps 1 and 2 during my CI/CD. I just have to call S3 storage on the single flow script that's changed, register the flow, and you all take care of it from there.
    Subclassing S3Storage seems like it might work, but I agree that it doesn't feel right. I would imagine that other Prefect S3 storage users would want the ability to define the flow's location within their image as well. I think this should be configurable. Docker storage offers this configuration through the prefect_directory argument. Can I propose that this arg be added to S3 storage? https://github.com/PrefectHQ/prefect/blob/c8d9b9b7a6d11b9487901cd795b8f1509f355845/src/prefect/storage/docker.py#L108-L109
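To make the proposal concrete, the behavior being asked for is roughly this. The prefect_directory name mirrors Docker storage's argument; S3 storage has no such option in 0.14, so this is only the shape of the idea, not real Prefect code:

```python
import os
import tempfile

def choose_download_dir(prefect_directory=None):
    """Pick where a flow script gets downloaded to.

    If the (hypothetical) prefect_directory argument is given, use it;
    otherwise fall back to an auto-generated directory like the
    /tmp/prefect-xxxxxxxx paths observed today.
    """
    if prefect_directory:
        os.makedirs(prefect_directory, exist_ok=True)
        return prefect_directory
    return tempfile.mkdtemp(prefix="prefect-")
```

With a configurable download directory, the flow script would land next to the helpers baked into the image, and relative paths would behave the same locally and on ECS.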