@Marvin i'm working within a python monorepo ...
# marvin-in-the-wild
t
@Marvin i'm working within a python monorepo using poetry, i have internal packages which i'm installing as editable installs. To deploy, i deploy flows from source and install my dependencies in my docker image. But i need to run a `poetry install` to point to my internal packages just after my work pool worker pulls the code and just before it loads the code (i'm guessing with importlib). Where could i intercept and override that?
m
The documentation does not seem to provide a direct solution for intercepting the code-loading process in a worker in order to run `poetry install`. However, based on the information available, you may need to create a custom Docker image for your worker that contains the necessary dependencies and your internal packages. Your Dockerfile can include a step that runs `poetry install` with your configured dependencies. Here's a basic approach to structuring your Dockerfile:
1. Use a base Python image.
2. Copy your poetry.lock and pyproject.toml into the image.
3. Install Poetry.
4. Run `poetry install` to install dependencies.
5. Copy the rest of your source code into the image.
To address your specific need to point to internal packages, your `pyproject.toml` should be configured to recognize the location of these packages, which can be defined as path dependencies. Since the documentation did not contain specific instructions for your case, you might want to consider customizing the worker's Docker image as a workaround. Alternatively, further customization through subclassing the worker might be possible, but that would require a deep dive into the Prefect source code, which we could attempt by searching through GitHub issues. Would you like to proceed by searching through existing GitHub issues, or further discuss the Dockerfile customization?
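A minimal sketch of that Dockerfile, assuming Python 3.11 and Poetry installed via pip (the `--no-directory` flag, which defers path dependencies, comes up again later in the thread):
```dockerfile
FROM python:3.11-slim

WORKDIR /app

# install Poetry itself (pin to whatever version you actually use)
RUN pip install --no-cache-dir "poetry==1.8.*"

# copy only the manifests first so this layer stays cached until deps change
COPY pyproject.toml poetry.lock ./

# install external dependencies into the image's environment;
# --no-directory skips path (internal) dependencies for now
RUN poetry config virtualenvs.create false \
    && poetry install --no-root --no-directory --without dev

# copy the rest of the source code into the image
COPY . .
```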
t
@Marvin i don't want to bake this into my image, as then i have to rebuild the image every time. I deploy flows using `.from_source` so that people can easily deploy local changes to their flow on their branch, which can involve changes to those internal packages. If i do the poetry install for my internal packages in the docker image, then i'd have to rebuild and push a big docker image every time; i want to avoid this. I must find a way to intercept the prefect logic, where would that be?
> But i need to run a `poetry install` to point to my internal packages just after my work pool worker pulls the code and just before it loads the code (i'm guessing with importlib).
If you can point me to the relevant code where flows get pulled by the work pool workers, that would be helpful
m
None of the GitHub issues directly address your question about intercepting the code-loading process in a Prefect work pool worker. However, a deep dive into the Prefect source code would be necessary to identify the exact point where you could perform a `poetry install`. This would involve examining the code that deals with pulling and setting up flow runs. Prefect workers are responsible for polling work queues and executing flows, so the relevant code is likely in the worker logic or the code that handles the execution of flow runs. The source code for Prefect can be found in their GitHub repository (https://github.com/PrefectHQ/prefect). Specifically, you might look into the `prefect.engine`, `prefect.infrastructure`, and `prefect.flow_runs` modules, or wherever the work pool worker's logic for pulling code is located. Keep in mind that Prefect's architecture and internal code logic can be complex without proper documentation. If you consider this approach too intricate, or if it risks becoming a maintenance hassle, I would recommend discussing this need with the Prefect community or maintainers; perhaps they can provide guidance or consider it for a future feature. Would you like to continue exploring the Prefect codebase for a solution, or would you prefer to reach out to the Prefect community for support?
t
help me Obi-Wan, you're my only hope
👀 1
@Nate 🙏
d
Can’t you override the job variables to install dependencies at run time?
t
thanks Daniel, could you point me to the code you're talking about overriding specifically please?
d
@Marvin How do you override job variables?
m
To override job variables in a Prefect deployment, you have the option to use a `prefect.yaml` file or the `.deploy()` method during deployment creation. Job variables are infrastructure-related values that are configurable on a work pool, and they impact how your flow run executes on your infrastructure. The `env` variable, for instance, is configurable for all work pool types and allows you to specify a set of environment variables that will be set in the runtime environment of the flow run. Here's how to override job variables using a `prefect.yaml` file:
```yaml
deployments:
- name: demo-deployment
  entrypoint: demo_project/demo_flow.py:some_work
  work_pool:
    name: local
  job_variables:
    env:
      EXECUTION_ENVIRONMENT: staging
      MY_NOT_SO_SECRET_CONFIG: plumbus
  schedule: null
```
And here's how you might do it programmatically using the `.deploy()` method:
```python
from prefect import flow

flow.from_source(
    source="https://github.com/org/demo_project.git",  # illustrative source; a local path works too
    entrypoint="demo_project/demo_flow.py:some_work",
).deploy(
    name="demo-deployment",
    work_pool_name="local",
    job_variables={
        "env": {
            "EXECUTION_ENVIRONMENT": "staging",
            "MY_NOT_SO_SECRET_CONFIG": "plumbus",
        }
    },
    # ... other deployment settings ...
)
```
Note that while the example above uses environment variables, the specific job variables you can override will depend on the type of work pool you are using. You can find more details in the Prefect documentation.
t
i tried doing this at runtime, but it failed as it tried to import the code, and that doesn't work until i've done a `poetry install`. e.g. i tried this:
```python
# imports reconstructed for context; enable_loguru_support and
# enable_tqdm_logging are project-internal helpers
import asyncio
import logging
import subprocess
from functools import wraps
from typing import Any, Callable, TypeVar, cast

from prefect import flow

logger = logging.getLogger(__name__)  # stand-in for the project's logger

FuncT = TypeVar("FuncT", bound=Callable[..., Any])
AsyncFuncT = TypeVar("AsyncFuncT", bound=Callable[..., Any])

def is_deployed_flow() -> bool:
    """
    True if the flow is running on any deployment in Prefect where we pull the source code from github.
    """
    # return config("IS_RUNNING_IN_DOCKER", cast=bool, default=False)
    return True


def setup_unitary_libs() -> None:
    """
    We only install the dependencies in the prefect docker image; we need this to set up the unitary
    packages in unitarybot so that we run the code pulled from github at runtime for people to develop flows.
    """
    try:
        result = subprocess.run(
            ["poetry", "install", "--no-root", "--without", "dev"],
            capture_output=True,
            text=True,
        )
        if result.returncode == 0:
            logger.info("Unitary libs installed successfully")
        else:
            logger.error(
                f"Failed to install unitary libs:\n{result.stderr}:\n{result.stdout}"
            )
    except Exception as e:
        logger.error(f"Failed to install unitary libs: {e}")


def configure_flow():
    enable_loguru_support()
    enable_tqdm_logging()
    if is_deployed_flow():
        logger.info("Setting up unitary libs")
        setup_unitary_libs()


def unitary_flow(**kwargs: Any) -> Callable[[FuncT], FuncT]:
    def decorator(func: FuncT) -> FuncT:
        @wraps(func)
        def sync_wrapped(*args: Any, **kwargs: Any) -> Any:
            configure_flow()
            return func(*args, **kwargs)

        @wraps(func)
        async def async_wrapped(*args: Any, **kwargs: Any) -> Any:
            configure_flow()
            return await cast(AsyncFuncT, func)(*args, **kwargs)

        @flow(**kwargs)
        def wrapped(*args: Any, **kwargs: Any) -> Any:
            if asyncio.iscoroutinefunction(func):
                return async_wrapped(*args, **kwargs)
            else:
                return sync_wrapped(*args, **kwargs)

        return cast(FuncT, wrapped)

    return decorator
```
d
Makes sense. If I'm not mistaken, the Prefect docker image supports using the `EXTRA_PIP_PACKAGES` environment variable, so you can use the job variable override to configure that. But this will cause a delay in starting every flow, as it needs to install the dependencies every time it starts.
👍 1
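A sketch of what that override could look like, assuming the `from_source` deployment pattern used elsewhere in this thread (the repo URL, entrypoint, pool name, and package pins are illustrative):
```python
from prefect import flow
from prefect.runner.storage import GitRepository

# the prefecthq/prefect base images pip-install anything listed in
# EXTRA_PIP_PACKAGES at container startup, before the flow code is loaded
flow.from_source(
    source=GitRepository(
        url="https://github.com/org/repo.git",  # illustrative repo
        branch="feature_branch",
    ),
    entrypoint="flows/my_flow.py:my_flow",  # illustrative entrypoint
).deploy(
    name="my-deployment",
    work_pool_name="docker-pool",  # illustrative pool
    job_variables={"env": {"EXTRA_PIP_PACKAGES": "pandas==2.0.1 numpy==1.26.4"}},
)
```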
n
> i need to run a `poetry install` to point to my internal packages just after my work pool worker pulls the code and just before it loads the code (i'm guessing with importlib)
imo
• you should do this at deployment time by installing your package into your image via Dockerfile (installing things at runtime often causes weird problems)
• you can use a `pull` step to do whatever setup via `run_shell_script` (see the yaml sketch below)
• you might be able to pass `-e` before package names with `EXTRA_PIP_PACKAGES`
but again I would highly suggest installing static deps at deployment time
👍 1
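A minimal `prefect.yaml` pull-section sketch of the `run_shell_script` approach, assuming a hypothetical repo URL and the Poetry flags used elsewhere in this thread:
```yaml
pull:
  # clone the flow source first; the step id lets later steps reference its outputs
  - prefect.deployments.steps.git_clone:
      id: clone-step
      repository: https://github.com/org/repo.git  # hypothetical repo
      branch: feature_branch
  # then install the path dependencies inside the freshly cloned directory
  - prefect.deployments.steps.run_shell_script:
      script: poetry install --no-root --without dev
      directory: "{{ clone-step.directory }}"
      stream_output: true
```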
t
Here's a trimmed-down example of my `pyproject.toml`:
```toml
[tool.poetry]
name = "unitary"
version = "0.1.0"
description = ""
package-mode = false
authors = []

[tool.poetry.dependencies]
python = ">=3.10.6,<3.12"
pandas = "2.0.1"
numpy = "^1.26.4"

[tool.poetry.group.unitary-model-store.dependencies]
unitary-model-store = { path = "libs/model-store", develop = true}

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
```
I have this in my Dockerfile to install the static deps:
```dockerfile
RUN poetry install --no-directory
```
which will install pandas and numpy, but not `unitary-model-store`, in this example, in my docker image. But i deploy flows like:
```python
flow.from_source(
    source=GitRepository(..., branch="feature_branch"),
    entrypoint="...",
).deploy(
    ...
)
```
So that people can deploy flows and point to their feature branch instead of master to test flows they're developing on our work pools. If you make any changes to the `unitary-model-store` package on `feature_branch`, then that wouldn't get picked up on their branch if i had already installed our internal libs in the docker image alongside the external static deps, so i need to do the install of our internal packages at runtime so that it points to the local code changes on e.g. `feature_branch`.
Do you think it's not worth the complexity, and i should just take the hit that you'd have to rebuild the image to test any code changes to these libs on my work pool, and just install the internal libs at image build?
d
I also have the exact same setup with poetry, git, docker and k8s. I happen to use an internal package as well, which I include in the image. When I need to develop locally and change the package, it's easy, as Prefect is just python. When I deploy to the work pools I bake the package into the image, so the package doesn't change, but if the flow code changes it'll get pulled from the git repository.
t
Ah interesting, okay, fair enough, so you just accept that if you're developing some flow and you change an internal package the flow depends on as part of your development, you'd have to rebuild and push the image to your docker repo for that change to be picked up, right?
d
Exactly, the internal package doesn't change as frequently as the flow code itself, so I'm willing to accept creating a new image every once in a while over adding startup latency to every flow. Also good to mention that docker images are built in layers, so even when pushing a new image, only the top layers change, and it shouldn't be more than a few megabytes.
t
Yea makes sense 👍 i think i'll just adjust my Dockerfile to install the static external deps and the internal deps in two separate steps and do the same as you. I was hoping to abstract this gotcha away from my ML engineers when they develop flows, but i guess i can just give them a heads up about it and encourage them to mainly run flows locally when developing
👍 1
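For reference, a minimal sketch of the two-step Dockerfile layout described above, assuming the `pyproject.toml` from earlier in the thread (paths are illustrative):
```dockerfile
# step 1: external static deps only; --no-directory skips the path
# dependencies, so this layer stays cached until the lockfile changes
COPY pyproject.toml poetry.lock ./
RUN poetry install --no-root --no-directory --without dev

# step 2: copy the internal libs, then resolve the path dependencies;
# only this (small) layer is rebuilt when libs/ changes
COPY libs/ ./libs/
RUN poetry install --no-root --without dev
```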