Thread
#prefect-community
    Vadym Dytyniak

    9 months ago
    Hello everyone. We want to run lightweight Prefect flows with LocalExecutor on ECS, each as a separate task. It successfully creates a new task and runs the flow, but the problem is that I have to install additional dependencies for the flow, and I can't find a way to run pip install before the flow runs. Is there any way to implement something like this?
    Anna Geller

    9 months ago
    @Vadym Dytyniak you would need to build a Docker image and push it to ECR (or some other container registry). We have some tutorials and examples that you could use as a guide/template:
    • https://medium.com/the-prefect-blog/the-simple-guide-to-productionizing-data-workflows-with-docker-31a5aae67c0a
    • This entire repo shows various deployment options, incl. a Dockerfile and build commands: https://github.com/anna-geller/packaging-prefect-flows/
    Vadym Dytyniak

    9 months ago
    We already use a Docker image to create new tasks in ECS, but we wanted to use just the Prefect image and install the needed deps after that.
    I found that you can specify the EXTRA_PIP_PACKAGES env var, but as far as I can see it does not work.
    Anna Geller

    9 months ago
    I’m not sure it works the same way across all agents; I have only seen it used with Docker storage and a Docker agent:
    flow.run_config = DockerRun(env={"EXTRA_PIP_PACKAGES": "my-extra-package1 my-extra-package2"})
    But even if it works with ECSRun and KubernetesRun, would you want every flow run to first install all dependencies before it starts? It adds a lot of unnecessary (expensive) latency to each run. If you instead have all your dependencies baked into an image, the only latency is pulling the image, not building it.
    So for a production use case I would avoid EXTRA_PIP_PACKAGES, but you can use it if you want to. With ECSRun it would be:
    from prefect.run_configs import ECSRun
    
    flow.run_config = ECSRun(env={"EXTRA_PIP_PACKAGES": "scikit-learn pandas"})
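A minimal sketch of building that env mapping from a list of requirements, assuming (as the examples in this thread suggest) that EXTRA_PIP_PACKAGES holds a space-separated list. The helper name `extra_pip_env` is hypothetical and not part of Prefect.

```python
from typing import Dict, Iterable


def extra_pip_env(requirements: Iterable[str]) -> Dict[str, str]:
    """Build an env mapping with EXTRA_PIP_PACKAGES from pip requirement strings.

    Hypothetical helper, not a Prefect API.
    """
    cleaned = [req.strip() for req in requirements if req.strip()]
    for req in cleaned:
        # Assumption: the variable is split on whitespace, so a requirement
        # such as "pandas >= 1.0" would be broken into separate arguments.
        if len(req.split()) != 1:
            raise ValueError(f"requirement must not contain whitespace: {req!r}")
    return {"EXTRA_PIP_PACKAGES": " ".join(cleaned)}


env = extra_pip_env(["scikit-learn", "pandas"])
print(env)
# The mapping can then be passed to the run config, e.g. ECSRun(env=env).
```

Keeping the requirements in a list makes it easier to pin versions per flow while still producing the single string that the run config expects.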
    Vadym Dytyniak

    9 months ago
    I am trying to use UniversalRun:
    self.run_config = UniversalRun(
        env={
            "EXTRA_PIP_PACKAGES": "my-lib==0.1.3"
        }
    )
    Anna Geller

    9 months ago
    do you deploy to ECS with UniversalRun?
    Vadym Dytyniak

    9 months ago
    yes
    it is confusing for me
    but it looks like UniversalRun knows how to work with the ECS agent
    and it automatically creates a new task to run the flow
    Anna Geller

    9 months ago
    typically, you would use ECSRun with an ECSAgent. I’d definitely recommend that. You have tons of options in terms of adding custom arguments incl. changing the image through the UI before the run: https://docs.prefect.io/api/latest/run_configs.html#ecsrun
    yes, sure, UniversalRun works with all agents because it matches flows with agents by labels and optionally adds env variables, but it doesn’t allow you to override any ECS-specific run-task arguments, such as the image to use
    Vadym Dytyniak

    9 months ago
    but in that case I have to specify a lot of options manually; as far as I can see, UniversalRun currently creates the same task to run the flow
    oooh, let me read the articles you provided to better understand all these concepts
    thanks
    Anna Geller

    9 months ago
    Regarding manually: usually, if your custom dependencies don’t change too often, you don’t have to rebuild the images too often. But if they do, you could build the image automatically as part of CI/CD.
    Vadym Dytyniak

    9 months ago
    So, is the best practice to build one Docker image with the deps for all the flows I have and create the infrastructure for the flows using that image?
    Anna Geller

    9 months ago
    correct, if all your flows need the same package dependencies, then passing the same image to the ECSRun of all flows should be fine 👍
    Vadym Dytyniak

    9 months ago
    Thank you!
    @Anna Geller we still want to be able to install some libraries on a per-flow basis
    How can I check whether EXTRA_PIP_PACKAGES works or not?
    because I see that the env var is there, but I can't check the logs to see if pip was triggered
    Anna Geller

    9 months ago
    @Vadym Dytyniak you could use:
    from prefect.run_configs import ECSRun
    
    flow.run_config = ECSRun(env={"EXTRA_PIP_PACKAGES": "scikit-learn pandas"})
    and then check in your flow if you can import those:
    import pandas
    you could test it by e.g. installing a different version of some package that you use in your base image and logging the version in your flow to see if that worked.
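One way to run the check suggested above, sketched with the standard library only: read EXTRA_PIP_PACKAGES inside the running container and report which of the requested distributions are actually installed, and in which version. The function name `installed_versions` is hypothetical, and the parsing assumes a space-separated list of requirement strings.

```python
import os
import re
from importlib import metadata
from typing import Dict, Optional


def installed_versions(spec: str) -> Dict[str, Optional[str]]:
    """Map each distribution named in a space-separated spec to its installed
    version, or to None if it is not installed. Hypothetical helper."""
    result: Dict[str, Optional[str]] = {}
    for requirement in spec.split():
        # Keep only the leading distribution name, dropping version
        # specifiers or extras such as "==0.1.3" or "[extra]".
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", requirement)
        if match is None:
            continue
        name = match.group(0)
        try:
            result[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            result[name] = None
    return result


# Inside a task, print or log the report so that it shows up in the run logs.
report = installed_versions(os.environ.get("EXTRA_PIP_PACKAGES", ""))
for name, version in report.items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

If a package is reported as not installed even though the variable is set in the container, the install step most likely never ran or failed before the flow started.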
    Vadym Dytyniak

    9 months ago
    I tried it, no module found
    Anna Geller

    9 months ago
    ok, so it doesn’t seem to work then 😄 let me open an issue for you
    Vadym Dytyniak

    9 months ago
    I am using LocalExecutor and trying to import it in a task
    the only thing is that I am using a custom repo
    let me try to install another dependency
    Anna Geller

    9 months ago
    ok, in that case it’s possible that your package couldn’t be downloaded. LMK. This should work with packages available in pip.
    Vadym Dytyniak

    9 months ago
    but no library installed
    Anna Geller

    9 months ago
    thanks for sharing, I opened an issue here https://github.com/PrefectHQ/prefect/issues/5195
    Vadym Dytyniak

    9 months ago
    Thank you
    Kevin Kho

    9 months ago
    Just making sure, are you defining an ENTRYPOINT for the ECS container? EXTRA_PIP_PACKAGES takes effect in the ENTRYPOINT here, so your own entrypoint would override it, I think.
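A rough Python sketch of the kind of step such an entrypoint performs, assuming it simply passes the whitespace-separated contents of EXTRA_PIP_PACKAGES to pip before starting the main command. This is not the actual entrypoint script of the Prefect image, and `pip_install_command` is a hypothetical name.

```python
import os
import shlex
from typing import List, Mapping, Optional


def pip_install_command(env: Mapping[str, str]) -> Optional[List[str]]:
    """Return the pip command an entrypoint would run for EXTRA_PIP_PACKAGES,
    or None if nothing is requested. Hypothetical helper."""
    packages = shlex.split(env.get("EXTRA_PIP_PACKAGES", ""))
    if not packages:
        return None
    return ["pip", "install", *packages]


command = pip_install_command(os.environ)
print(command)
# An entrypoint would run this command (for example with
# subprocess.run(command, check=True)) and only then hand over to the
# container's main command.
```

A container that defines its own ENTRYPOINT never reaches a step like this, which would explain why the variable can be set and still have no effect.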
    Vadym Dytyniak

    9 months ago
    @Kevin Kho Thank you. Now I am using the pure Prefect image with EXTRA_PIP_PACKAGES added. The flow UI just shows 'Submitted for execution...': the ECS task started and stopped, but I still see only this 'Submitted for execution' message.
    Removing the env arg from ECSRun returns me to my previous error, but at least I receive an error.
    One more issue: the task created in ECS doesn't have logs, and its task definition is inactive.
    Kevin Kho

    9 months ago
    You don’t get ECS logs by default for their tasks. You need to add it in the task definition with something like this:
    import yaml
    from prefect.run_configs import ECSRun

    # Task definition that sends the flow container's output to CloudWatch Logs
    task_definition = yaml.safe_load(
        """
        cpu: 1024
        memory: 2048
        containerDefinitions:
        - name: flow
          logConfiguration:
            logDriver: awslogs
            options:
              awslogs-group: /ecs/prefect-flow
              awslogs-region: us-east-1
              awslogs-stream-prefix: ecs
              awslogs-create-group: 'true'
        """
    )
    ECSRun(..., task_definition=task_definition)
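The same structure can also be built as a plain dictionary, which avoids the YAML dependency and makes the log group and region easy to parameterize. This is a sketch that reuses the keys and values from the snippet above; the function name `task_definition_with_logs` is hypothetical.

```python
from typing import Any, Dict


def task_definition_with_logs(
    log_group: str, region: str, stream_prefix: str = "ecs"
) -> Dict[str, Any]:
    """Build a task definition dict with the awslogs driver enabled for the
    container named "flow". Hypothetical helper mirroring the YAML above."""
    return {
        "cpu": 1024,
        "memory": 2048,
        "containerDefinitions": [
            {
                "name": "flow",
                "logConfiguration": {
                    "logDriver": "awslogs",
                    "options": {
                        "awslogs-group": log_group,
                        "awslogs-region": region,
                        "awslogs-stream-prefix": stream_prefix,
                        # ECS expects option values to be strings.
                        "awslogs-create-group": "true",
                    },
                },
            }
        ],
    }


task_definition = task_definition_with_logs("/ecs/prefect-flow", "us-east-1")
print(task_definition["containerDefinitions"][0]["logConfiguration"]["options"])
# The dict can then be passed as task_definition=... to the run config.
```

With awslogs-create-group enabled, the task execution role probably also needs permission to create log groups; otherwise the group has to exist beforehand.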
    Vadym Dytyniak

    9 months ago
    thanks
    Kevin Kho

    9 months ago
    And then for the task that does not start, you will see logs with the errors.
    Do you have custom modules, btw? Because if you do, I think you really need to make your own image.
    What was your exact error? “ModuleNotFound”?
    Vadym Dytyniak

    9 months ago
    yes, it is a custom module, but I also provide pip_extra_index_url
    Kevin Kho

    9 months ago
    Through the env argument of the RunConfig?
    Vadym Dytyniak

    9 months ago
    yes
    bad idea?
    Kevin Kho

    9 months ago
    I think it won’t work because my understanding is the order of operations is this:
    1. Spin up container (prefect base image)
    2. Run entrypoint with EXTRA_PIP_PACKAGES
    3. Download flow from Storage
    4. Get ready to deploy flow (includes combining the env with agent env variables)
    5. Run flow in container
    So I think for that installation of the EXTRA_PIP_PACKAGES, that env variable won’t be set because it’s not set on the container, it’s set on the Flow Run
    Vadym Dytyniak

    9 months ago
    it actually works
    Kevin Kho

    9 months ago
    Ah really? If it works, all good then!
    What is the latest issue with your Flow? Or are you all good now?
    Vadym Dytyniak

    9 months ago
    all good, thanks a lot
    Kevin Kho

    9 months ago
    Thanks for letting me know it worked! Is your flow simple enough to share? I’m just curious to see it if you could remove sensitive info.
    Vadym Dytyniak

    9 months ago
    We actually have a custom DaskFlow class, and currently we are focused on LocalExecutor; later we will test how this ECSRun works with DaskExecutor
    Kevin Kho

    9 months ago
    Ah ok. With DaskExecutor, you have to provide an image to the cluster (if you are spinning it up). Just making sure you know a lot of Dask clusters support the EXTRA_PIP_PACKAGES too (I think we got it from them)
    Vadym Dytyniak

    9 months ago
    We are going to spin it up using the Dask image and install libraries using the PipInstall plugin
    Kevin Kho

    9 months ago
    Gotcha sounds good!