# prefect-community
I have the code structure as follows:
```
     | __init__.py
     | module1.py
     | module2.py
     | config.yaml
     | sample_flow.py
```
module1 and module2 use configs from the config.yaml file, and sample_flow imports the code from those modules, declares them as tasks, and wraps them in a flow. I have some other custom in-built dependencies as well. Some of the modules need to run on a GPU. I want to run this on AWS EC2.
• What is the best way to package the code?
• What is the way to run an agent that uses a GPU on EC2?
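As a concrete (and entirely hypothetical) illustration of the config-sharing pattern described above - real modules would more likely use PyYAML's `yaml.safe_load`, but this stdlib-only sketch handles flat `key: value` pairs like a simple config.yaml might contain; the keys and values are made up:

```python
# Hypothetical stdlib-only sketch of how module1/module2 might share
# settings from config.yaml; real code would likely use yaml.safe_load.
import tempfile
from pathlib import Path

def load_flat_yaml(path):
    """Parse flat `key: value` lines; ignores comments and blank lines."""
    config = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if ":" in line:
            key, _, value = line.partition(":")
            config[key.strip()] = value.strip()
    return config

# Illustrative config.yaml contents (keys are made up)
sample = "model_name: resnet50  # used by module1\nbatch_size: 32\n"
cfg_path = Path(tempfile.mkdtemp()) / "config.yaml"
cfg_path.write_text(sample)

cfg = load_flat_yaml(cfg_path)
print(cfg)  # {'model_name': 'resnet50', 'batch_size': '32'}
```

Both modules reading from one such file is what makes it important that config.yaml travels with the code when the flow is packaged.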
I am thinking of using a Dockerfile for all the dependency management, using Docker as storage, and then using a Docker agent to run the code. In this case, does the Docker image need to be built with GPU support?
The best way to package code with custom non-Python dependencies, such as your custom config files, would be a Docker image, since you can bake all those files into your image. If you need some examples of how to do it, check:
• https://medium.com/the-prefect-blog/the-simple-guide-to-productionizing-data-workflows-with-docker-31a5aae67c0a
• https://github.com/anna-geller/packaging-prefect-flows/
For the GPU question, this topic gives some hints.
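To make the "bake everything into the image" idea concrete, a minimal Dockerfile might look like the sketch below. This is illustrative, not from the thread: the base image tag, the working directory, and the existence of a requirements.txt are all assumptions you would adapt to your project.

```dockerfile
# Hypothetical Dockerfile baking the flow, modules, and config.yaml
# into one image (base tag, paths, and file names are assumptions)
FROM prefecthq/prefect:1.2.0-python3.9

WORKDIR /opt/prefect/flows

# Install Python dependencies first so Docker layer caching helps
COPY requirements.txt .
RUN pip install -r requirements.txt

# Bake the flow code and the non-Python config file into the image
COPY __init__.py module1.py module2.py sample_flow.py config.yaml ./
```

Because config.yaml is copied in alongside the modules, the relative paths the modules use to read it keep working inside the container.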
What is the way to run an agent that uses a GPU on EC2?
You would need to use a local agent and match the agent label with your flow's run configuration
Thank you for this 🙏
> You would need to use a local agent and match the agent label with your flow's run configuration
I am using run_config as DockerRun with the custom image. I can only use DockerAgent then, right? (It fails with LocalAgent when tested in a local setup.) I want to understand whether a container is created from the image provided in run_config and the agent runs that. In that case, I have to build the run_config image with GPU support, right? In the link you shared, there is nothing mentioned about run_config.
You're right - DockerRun is intended to be used with a Docker agent, not a local agent. This is an understandable source of confusion and something that will get easier in Prefect 2.0 - you can read more about that here. To use a GPU, it would be easiest to start a local agent on the VM with GPU resources; this way you can leverage the GPU for your workflows. With a Docker agent this would be more complicated, as you would need a more elaborate setup to ensure your container images can leverage the GPU (installing drivers within the container image, etc.).
Yeah, LocalAgent is easier than DockerAgent. Then how do I make it compatible with the custom dependencies? For custom dependencies, it is easy to use a Docker image, right? Is it possible to use DockerRun for dependency management and then LocalRun for using the GPU?
For a local agent, you could package your dependencies as part of a virtual environment - e.g. you can create a virtual environment that contains all your required modules and then start your local agent within that virtual environment. If you run it with a supervisor, you can create a supervisord configuration file (conventionally named `supervisord.conf`) with this content:
```
[unix_http_server]
file=/tmp/supervisor.sock

[supervisord]

[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

[supervisorctl]
serverurl=unix:///tmp/supervisor.sock ; use a unix:// URL for a unix socket

[program:prefect-agent]
command=/Users/YOU/opt/anaconda3/envs/your_venv/bin/prefect agent local start --label gpu
```
Or, if your dependencies are in a custom directory, you could start your agent from a specific project directory to help Prefect find your extra modules:
```
prefect agent local start --label gpu -p /Users/your_username/path/to/your_modules
```
Makes sense. But the issue is that the agent is running on a different server, and the custom dependency code can keep changing. It would be difficult to keep updating the server environment with the latest dependencies. I think the better way would be to build the Docker image with GPU capability. Though that is hard, once it is built, the dependency problem - the part that changes most often - becomes easier to manage.
That's a valid point, and you can absolutely use the Docker agent + DockerRun run config to achieve that, as long as your dependencies and CUDA drivers are installed in your image.
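As a sketch of what such an image could look like: starting from an NVIDIA CUDA base image means the CUDA runtime libraries are already inside the container, and the EC2 host then only needs the NVIDIA driver plus the NVIDIA Container Toolkit. Everything below (the base tag, paths, and requirements.txt) is an assumption to adapt:

```dockerfile
# Hypothetical GPU-capable image; base tag and paths are illustrative
FROM nvidia/cuda:11.4.3-cudnn8-runtime-ubuntu20.04

RUN apt-get update && apt-get install -y python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /opt/prefect/flows

# requirements.txt would pin prefect plus your GPU libraries
COPY requirements.txt .
RUN pip3 install -r requirements.txt

COPY . .
```

The container must then be started with GPU access (e.g. `docker run --gpus all ...`), which is what requires the NVIDIA Container Toolkit on the EC2 host.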
Sorry to hijack this thread, but I have a similar issue, in Prefect 2.0b2. I have the server and agent running in GKE, using GCS storage, with a custom Docker image containing all needed dependencies, and a deployment spec pointing at my flow. However, when I try to create the deployment, it errors saying that it can't find my custom modules:
```
$ prefect deployment create test_orion/workflows/orion_test.py
Loading deployments from python script at 'test_orion/workflows/orion_test.py'...
    from test_orion.tasks.utils import Request
ModuleNotFoundError: No module named 'test_orion'
Encountered exception while loading specifications from 'test_orion/workflows/orion_test.py'
```
Is there a GitHub repo or other example of how to package and deploy Prefect 2.0 code with custom modules?
You could use something like this: https://github.com/anna-geller/packaging-prefect-flows/blob/master/setup.py - you can create a Python package and then install it in your Docker image.
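The mechanics behind the error above can be shown with the standard library alone: the Prefect CLI imports your deployment script, so every module that script imports must be resolvable on `sys.path` or installed into the environment/image. A small illustrative check (the missing package name below is made up):

```python
# Stdlib-only illustration of why `import test_orion` fails when the
# package is neither installed nor on sys.path.
import importlib.util

# find_spec returns None when the import system cannot locate a module -
# the same condition that raises ModuleNotFoundError on a real import.
spec = importlib.util.find_spec("some_package_that_is_not_installed_xyz")
print(spec)  # None

# Installing your package (e.g. `pip install .` with a setup.py, as in
# the linked repo) makes the spec resolvable; stdlib modules always are:
print(importlib.util.find_spec("json") is not None)  # True
```

This is why building the package into the Docker image, as suggested above, resolves the ModuleNotFoundError: the image's Python environment can then locate `test_orion` like any installed package.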