# ask-community
v
Hello Everyone. We want to run lightweight Prefect flows using LocalExecutor on ECS as a separate task. It successfully creates a new task and runs the flow, but the problem is that I have to install additional dependencies for the flow, and I can't find a way to run pip install before running the flow. Can someone tell me whether it's possible to implement something like this?
a
@Vadym Dytyniak you would need to build a Docker image and push it to ECR (or some other container registry). We have some tutorials and examples that you could use as a guide/template:
• https://medium.com/the-prefect-blog/the-simple-guide-to-productionizing-data-workflows-with-docker-31a5aae67c0a
• This entire repo shows various deployment options, incl. a Dockerfile and build commands: https://github.com/anna-geller/packaging-prefect-flows/
v
We already use a Docker image to create new tasks in ECS, but we wanted to use just the Prefect image and install the needed deps after that.
I found that you can specify the EXTRA_PIP_PACKAGES env var, but as far as I can see it doesn't work.
a
I'm not sure it works the same way across all agents; I've only seen it used with Docker storage and the Docker agent:
```python
from prefect.run_configs import DockerRun

flow.run_config = DockerRun(env={"EXTRA_PIP_PACKAGES": "my-extra-package1 my-extra-package2"})
```
But even if it works with ECSRun and KubernetesRun, do you really want your flow to install all of its dependencies every time it starts running? That adds a lot of unnecessary (and expensive) latency to each run. If you instead bake all your dependencies into an image, the only latency is pulling the image, not building it.
So for production use cases I would avoid EXTRA_PIP_PACKAGES, but you can use it if you want to. With ECSRun it would be:
```python
from prefect.run_configs import ECSRun

flow.run_config = ECSRun(env={"EXTRA_PIP_PACKAGES": "scikit-learn pandas"})
```
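A note on the value format: EXTRA_PIP_PACKAGES is one space-separated string (as in "scikit-learn pandas" above), so a specifier written with spaces, such as "pandas >= 1.3", would presumably be split into separate pip arguments. The sketch below is a hypothetical helper (format_extra_pip_packages is not part of Prefect) that joins a list of specifiers and rejects any containing whitespace; it assumes the image's entrypoint passes the value to pip unquoted, so the shell word-splits it.

```python
def format_extra_pip_packages(specifiers):
    """Join pip requirement specifiers into one EXTRA_PIP_PACKAGES value.

    Hypothetical helper, not part of Prefect. Assumes the container
    entrypoint word-splits the value, so each specifier must be a
    single whitespace-free token.
    """
    cleaned = []
    for spec in specifiers:
        spec = spec.strip()
        if not spec:
            continue  # skip empty entries
        if any(ch.isspace() for ch in spec):
            raise ValueError(
                f"specifier {spec!r} contains whitespace; "
                "write it without spaces, e.g. 'pandas>=1.3'"
            )
        cleaned.append(spec)
    return " ".join(cleaned)


# The resulting dict could then be passed as the env argument of ECSRun.
env = {"EXTRA_PIP_PACKAGES": format_extra_pip_packages(["scikit-learn", "pandas>=1.3"])}
print(env)
```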
v
I am trying to use UniversalRun:
```python
from prefect.run_configs import UniversalRun

self.run_config = UniversalRun(
    env={
        "EXTRA_PIP_PACKAGES": "my-lib==0.1.3"
    }
)
```
a
do you deploy to ECS with UniversalRun?
v
yes
it's confusing for me
but it looks like UniversalRun knows how to work with the ECS agent
and it automatically creates a new task to run the flow
a
Typically, you would use ECSRun with an ECSAgent, and I'd definitely recommend that. It gives you lots of options for adding custom arguments, including changing the image through the UI before the run: https://docs.prefect.io/api/latest/run_configs.html#ecsrun
Yes, sure: UniversalRun works with all agents because it only matches flows with agents by labels and optionally adds env variables, but it doesn't let you override any ECS-specific run-task arguments, such as the image to use.
v
But in that case I have to specify a lot of options manually; as far as I can see, UniversalRun currently creates the same task to run the flow.
Oh, let me read the articles you provided to better understand all these concepts.
thanks
a
Regarding the manual work: if your custom dependencies don't change often, you won't have to rebuild the image often either. If they do change frequently, you could build the image automatically as part of CI/CD.
v
So the best practice is to build one Docker image with the deps for all the flows I have, and create the flow infrastructure from that image?
a
Correct: if all your flows need the same package dependencies, then passing the same image to the ECSRun of every flow should be fine 👍
v
Thank you!
@Anna Geller we still want to be able to install some libraries per flow.
How can I check whether EXTRA_PIP_PACKAGES works or not?
I can see that the env var is set, but I can't check the logs to see whether pip was triggered.
a
@Vadym Dytyniak you could use:
```python
from prefect.run_configs import ECSRun

flow.run_config = ECSRun(env={"EXTRA_PIP_PACKAGES": "scikit-learn pandas"})
```
and then check in your flow whether you can import them:
```python
import pandas
```
You could, e.g., test it by installing a different version of a package that's already in your base image, then logging that package's version in your flow to see whether it worked.
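Instead of only importing the package, the flow can log which version actually ended up installed, which makes the "different version than the base image" test easy to read in the flow logs. A minimal sketch using only the standard library (importlib.metadata, Python 3.8+); installed_version and is_importable are hypothetical helper names, and in a real flow you would call them inside a task and send the result to the task logger. Note that the name you pip install (scikit-learn) and the module you import (sklearn) can differ, so the sketch checks both.

```python
from importlib import metadata
from importlib.util import find_spec


def installed_version(distribution):
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def is_importable(module_name):
    """Return True if the top-level module can be found on sys.path."""
    return find_spec(module_name) is not None


# (pip distribution name, import name) pairs to check in the running container.
for dist, module in [("pandas", "pandas"), ("scikit-learn", "sklearn")]:
    print(f"{dist}: version={installed_version(dist)}, importable={is_importable(module)}")
```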
v
I tried it, no module found
a
OK, then it doesn't seem to work 😄 Let me open an issue for you.
v
I am using LocalExecutor and trying to import it in a task
the only thing is that I'm using a custom repo
let me try installing another dependency
a
OK, in that case it's possible that your package couldn't be downloaded. LMK. This should work with packages that pip can install from PyPI.
v
but the library still wasn't installed
a
thanks for sharing, I opened an issue here https://github.com/PrefectHQ/prefect/issues/5195
v
Thank you
k
Just making sure: are you defining an ENTRYPOINT for the ECS container? EXTRA_PIP_PACKAGES is handled in the image's ENTRYPOINT, so I think your own entrypoint would override it.
v
@kevin Thank you. Now I'm using the plain Prefect image and added EXTRA_PIP_PACKAGES. The flow UI just shows 'Submitted for execution...': the ECS task started and stopped, but I still see that message.
Removing the env arg from ECSRun brings me back to my previous error, but at least I get an error.
And one more issue: the task created in ECS has no logs, and its task definition is inactive.
k
ECS tasks don't send their logs anywhere by default. You need to add a log configuration to the task definition, with something like this:
```python
import yaml
from prefect.run_configs import ECSRun

task_definition = yaml.safe_load(
    """
    cpu: 1024
    memory: 2048
    containerDefinitions:
    - name: flow
      logConfiguration:
        logDriver: awslogs
        options:
          awslogs-group: /ecs/prefect-flow
          awslogs-region: us-east-1
          awslogs-stream-prefix: ecs
          awslogs-create-group: 'true'
    """
)
# Pass the task definition alongside your other ECSRun options.
flow.run_config = ECSRun(task_definition=task_definition)
```
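If you'd rather not depend on PyYAML, the same task definition can be built as a plain Python dict, since ECSRun's task_definition parameter takes a dict. The sketch below wraps it in a hypothetical helper (task_definition_with_logs is not a Prefect function); the log group, region, and stream prefix are the example values from above, and the container keeps the name "flow" from the YAML. Keep in mind that awslogs-create-group only works if the task's execution role is allowed to call logs:CreateLogGroup.

```python
import json


def task_definition_with_logs(log_group, region, stream_prefix="ecs", cpu=1024, memory=2048):
    """Build an ECS task definition dict that ships container logs to CloudWatch.

    Hypothetical helper mirroring the YAML example; pass the result to
    ECSRun(task_definition=...).
    """
    return {
        "cpu": cpu,
        "memory": memory,
        "containerDefinitions": [
            {
                # Same container name as in the YAML example.
                "name": "flow",
                "logConfiguration": {
                    "logDriver": "awslogs",
                    "options": {
                        "awslogs-group": log_group,
                        "awslogs-region": region,
                        "awslogs-stream-prefix": stream_prefix,
                        # Log driver options are string-to-string, hence "true".
                        "awslogs-create-group": "true",
                    },
                },
            }
        ],
    }


print(json.dumps(task_definition_with_logs("/ecs/prefect-flow", "us-east-1"), indent=2))
```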
v
thanks
k
Then, for the task that doesn't start, you will see logs with the errors.
Do you have custom modules, btw? Because if you do, I think you really need to build your own image.
What was your exact error? "ModuleNotFoundError"?
v
yes, it's a custom module, but I also provide pip_extra_index_url
k
Through the env argument of the RunConfig?
v
yes
bad idea?
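For reference, pip reads its command-line options from environment variables named PIP_ plus the option name in upper case, with dashes turned into underscores, so --extra-index-url corresponds to PIP_EXTRA_INDEX_URL. pip only looks for the upper-case PIP_ prefix, so a variable literally named pip_extra_index_url would not be picked up on Linux. A sketch of the resulting env dict; the helper names are hypothetical, and the index URL and my-lib==0.1.3 are placeholders. It only helps if the variables are present in the container's environment when the entrypoint runs pip.

```python
def pip_env_var_name(option):
    """Map a pip long option such as '--extra-index-url' to its env var name."""
    return "PIP_" + option.lstrip("-").replace("-", "_").upper()


def run_config_env(packages, extra_index_url):
    """Build an env dict for a RunConfig (hypothetical helper)."""
    return {
        "EXTRA_PIP_PACKAGES": packages,
        pip_env_var_name("--extra-index-url"): extra_index_url,
    }


# Placeholder package and private index URL.
env = run_config_env("my-lib==0.1.3", "https://pypi.example.com/simple/")
print(env)
```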
k
I think it won't work, because my understanding is that the order of operations is:
1. Spin up the container (Prefect base image)
2. Run the entrypoint with EXTRA_PIP_PACKAGES
3. Download the flow from storage
4. Get ready to deploy the flow (includes combining the env with the agent's env variables)
5. Run the flow in the container
So I think that at the point where EXTRA_PIP_PACKAGES gets installed, that env variable won't be set yet, because it's set on the flow run, not on the container.
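The reasoning above can be captured in a toy model: the entrypoint runs before the flow run starts and can only see variables already in the container's environment, so what matters is whether the agent puts the RunConfig env on the container itself or only applies it later to the flow run. This is a simplified illustration, not Prefect's agent code; entrypoint_pip_command and start_container are hypothetical names, and the entrypoint is assumed to run pip install with the word-split EXTRA_PIP_PACKAGES value when it is non-empty.

```python
def entrypoint_pip_command(container_env):
    """Toy model of an image entrypoint: return the pip command it would run, or None."""
    packages = container_env.get("EXTRA_PIP_PACKAGES", "").split()
    if not packages:
        return None
    return ["pip", "install", *packages]


def start_container(base_env, run_config_env, env_on_container):
    """Return the pip command executed at container start-up.

    env_on_container=True models an agent that sets the RunConfig env on
    the container (e.g. via ECS container overrides); False models the env
    being applied only after the entrypoint has already run.
    """
    container_env = dict(base_env)
    if env_on_container:
        container_env.update(run_config_env)
    return entrypoint_pip_command(container_env)


run_env = {"EXTRA_PIP_PACKAGES": "my-lib==0.1.3"}
print(start_container({}, run_env, env_on_container=True))   # ['pip', 'install', 'my-lib==0.1.3']
print(start_container({}, run_env, env_on_container=False))  # None
```

Which case applies depends on how the agent launches the task.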
v
it actually works
k
Ah really? If it works, all good then!
What is the latest issue with your Flow? Or are you all good now?
v
all good, thanks a lot
k
Thanks for letting me know it worked! Is your flow simple enough to share? I’m just curious to see it if you could remove sensitive info.
v
We actually have a custom DaskFlow class. Currently we're focused on LocalExecutor; later we'll test how ECSRun works with DaskExecutor.
k
Ah OK. With DaskExecutor, you have to provide an image to the cluster (if you're spinning one up). Just so you know, a lot of Dask cluster setups support EXTRA_PIP_PACKAGES too (I think we got it from them).
v
We're going to spin it up using the Dask image and install libraries with the PipInstall plugin.
k
Gotcha sounds good!