# best-practices
v
Hi all, I was wondering what the best practice is to run a flow with S3 Storage and an ECS Agent when the flow depends on other scripts or files. Currently I get `ModuleNotFoundError("No module named 'my-python-script'")` when I run my flow in Prefect Cloud. With Docker storage we managed to solve this by providing `env_vars = {"PYTHONPATH": "$PYTHONPATH:/path/in/Docker"}` and building the Docker image with all of the dependent modules. How could I achieve this with S3 Storage? We changed from Docker to S3 Storage because we would get "Cannot provide `task_definition_arn` when using `Docker` storage". We use a task definition ARN because the task definition is created with Terraform. Maybe we're going about this the wrong way? Thanks in advance!
a
v
Thanks for the fast reply. I've been trying it out since then, but I'm having trouble understanding the steps I have to take. What I've tried so far is the following:
1. create a folder called `flow_utilities`
2. add all scripts to that folder
3. import `flow_utilities` in `my-flow.py` and call its functions in a task
4. create `setup.py` and `__init__.py` (see the sketch below)
5. create a Dockerfile with everything
6. build the Docker image, tag it, and push it to ECR
7. register the flow and start the ECS agent
After all of that I still get `ModuleNotFoundError("No module named 'flow_utilities'")`. What am I missing? Thanks again.
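A minimal `setup.py` for step 4 might look like the sketch below; the version is illustrative, and `find_packages()` assumes `flow_utilities/__init__.py` exists. The Dockerfile from step 5 would then run `pip install .` so the package is importable from anywhere in the image, regardless of the working directory:
```python
# setup.py -- a minimal sketch; name and version are illustrative
from setuptools import setup, find_packages

setup(
    name="flow_utilities",
    version="0.1.0",
    packages=find_packages(),  # finds flow_utilities/ via its __init__.py
)
```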
a
v
Maybe I'm a bit confused about how Prefect tasks and storage work. I have the following questions: 1. are task dependencies stored in Storage, or only the functions marked with `@task`? 2. if only the task functions' code is in Storage, am I supposed to manually copy my entire repo into the Docker image used by the Agent?
a
1. the entire flow code, incl. tasks, if the tasks are defined in the same Python script
2. you may do that, yes; there are tradeoffs in all the approaches, just to call this out
I totally understand your frustration, and I'm actively working on improving the process with respect to providing more official and proven deployment recipes, incl. CI/CD and infrastructure deployment for agents
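To illustrate point 1: with script-based storage such as S3, the whole flow file is what gets uploaded, so a task defined in it travels with the flow, while anything imported from another module must already be importable inside the run's container. This is a sketch, not code from the thread:
```python
# flow.py -- everything in this file is stored together with the flow
from prefect import task, Flow

@task
def say_hello():
    # Defined in the same script, so it ships with the flow via S3 Storage.
    print("hello from the flow file")

with Flow("example") as flow:
    say_hello()
```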
s
I don't know if it would help you, but I can tell you how we solved something similar. We have a Docker image with Prefect and Dask. We create all the AWS infrastructure with IaC: we use ECR to host our images and ECS to pull them and run our cluster (which includes a scheduler and hundreds of spot instances as workers). Once the cluster is running it gets a public IP, and we then run the Prefect flow with a DaskExecutor pointed at the scheduler's public IP. This works great. It will take you time to get all the pieces to play together, but it works; we have been using it for the last half year. Final note: this is all with Prefect 1.0. We are now playing with Prefect 2.0. All the other solutions (Storage, ECS, dask-cloud-provider) were not working for us when we tried to do similar things with them.
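For context, pointing a flow at an external Dask scheduler in Prefect 1.x looks roughly like this; the address is a hypothetical placeholder for the cluster's public IP:
```python
from prefect.executors import DaskExecutor

# Hypothetical scheduler address; 8786 is Dask's default scheduler port.
flow.executor = DaskExecutor(address="tcp://203.0.113.10:8786")
```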
v
Thanks, I'd have to check out DaskExecutor to see if it would work for us. With `sys.path` I saw that "/opt/prefect" wasn't on the path, so the flow couldn't run. I actually got it to work by adding `env={"PYTHONPATH": "$PYTHONPATH:/opt/prefect"}` to `ECSRun` and making sure that wherever I referenced the modules (in main.py or in my other modules) I prefixed the imports with the package name, e.g. `import flow_utilities.my_module`. Really glad that it worked 😁
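The fix described above, as a sketch; the ARN is a placeholder since the real value isn't given in the thread:
```python
from prefect.run_configs import ECSRun

run_config = ECSRun(
    # Placeholder ARN; the real task definition is created with Terraform.
    task_definition_arn="arn:aws:ecs:region:account:task-definition/my-task",
    # /opt/prefect is where the dependent modules live in this image,
    # per the thread; appending it lets `import flow_utilities...` resolve.
    env={"PYTHONPATH": "$PYTHONPATH:/opt/prefect"},
)
```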
🙌 1