@Marvin i'm working within a python monorepo ...
# marvin-in-the-wild
t
@Marvin i'm working within a python monorepo using poetry, i have internal packages which i'm installing as editable installs. To deploy, i deploy flows from source and install my dependencies in my docker image. But i need to run a `poetry install` to point to my internal packages just after my work pool worker pulls the code and just before it loads the code (i'm guessing with importlib). Where could i intercept and override that?
m
The documentation does not seem to provide a direct solution for intercepting the code-loading process in a worker in order to run `poetry install`. However, based on the information available, you may need to create a custom Docker image for your worker that contains the necessary dependencies and your internal packages. Your Dockerfile can include a step that runs `poetry install` with your configured dependencies. Here's a basic approach to structuring your Dockerfile:
1. Use a base Python image.
2. Copy your poetry.lock and pyproject.toml into the image.
3. Install Poetry.
4. Run `poetry install` to install dependencies.
5. Copy the rest of your source code into the image.
To address your specific need to point to internal packages, your `pyproject.toml` should be configured to recognize the location of these packages, which can be defined as path dependencies. Since the documentation did not contain specific instructions for your case, you might want to consider customizing the worker's Docker image as a workaround. Alternatively, further customization through subclassing the worker might be possible, but that would require a deep dive into the Prefect source code, which we could attempt by searching through GitHub issues. Would you like to proceed by searching through existing GitHub issues, or further discuss the Dockerfile customization?
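A minimal sketch of that Dockerfile, assuming Python 3.11 and Poetry installed via pip (the `--no-directory` flag, which defers path dependencies, comes up again later in the thread):
```dockerfile
FROM python:3.11-slim

WORKDIR /app

# install Poetry itself (pin to whatever version you actually use)
RUN pip install --no-cache-dir "poetry==1.8.*"

# copy only the manifests first so this layer stays cached until deps change
COPY pyproject.toml poetry.lock ./

# install external dependencies into the image's environment;
# --no-directory skips path (internal) dependencies for now
RUN poetry config virtualenvs.create false \
    && poetry install --no-root --no-directory --without dev

# copy the rest of the source code into the image
COPY . .
```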
t
@Marvin i don't want to bake this into my image, as then i have to rebuild the image every time. I deploy flows using `.from_source` so that people can easily deploy local changes to their flow on their branch, which can involve changes to those internal packages. If i do the poetry install for my internal packages in the docker image, then i'd have to rebuild and push a big docker image every time; i want to avoid this. I must find a way to intercept the prefect logic, where would that be?
> But i need to run a `poetry install` to point to my internal packages just after my work pool worker pulls the code and just before it loads the code (i'm guessing with importlib).
If you can point me to the relevant code where flows get pulled by the work pool workers, that would be helpful
m
None of the GitHub issues directly address your question about intercepting the code-loading process in a Prefect work pool worker. However, a deep dive into the Prefect source code would be necessary to identify the exact point where you could perform a `poetry install`. This would involve examining the code that deals with pulling and setting up flow runs. Prefect workers are responsible for polling work queues and executing flows, so the relevant code is likely in the worker logic or the code that handles the execution of flow runs. The source code for Prefect can be found in their GitHub repository (https://github.com/PrefectHQ/prefect). Specifically, you might look into the `prefect.engine`, `prefect.infrastructure`, and `prefect.flow_runs` modules, or wherever the work pool worker's logic for pulling code is located. Keep in mind that Prefect's architecture and internal code logic can be complex without proper documentation. If you consider this approach too intricate, or if it risks becoming a maintenance hassle, I would recommend discussing this need with the Prefect community or maintainers; perhaps they can provide guidance or consider it for a future feature. Would you like to continue exploring the Prefect codebase for a solution, or would you prefer to reach out to the Prefect community for support?
t
help me Obi-Wan, you're my only hope
👀 1
@Nate 🙏
d
Can’t you override the job variables to install dependencies at run time?
t
thanks Daniel, could you point me to the code you're talking about overriding specifically please?
d
@Marvin How do you override job variables?
m
To override job variables in a Prefect deployment, you have the option to use a `prefect.yaml` file or the `.deploy()` method during deployment creation. Job variables are infrastructure-related values that are configurable on a work pool, and they impact how your flow run executes on your infrastructure. The `env` variable, for instance, is configurable for all work pool types and allows you to specify a set of environment variables that will be set in the runtime environment of the flow run. Here's how to override job variables using a `prefect.yaml` file:
```yaml
deployments:
- name: demo-deployment
  entrypoint: demo_project/demo_flow.py:some_work
  work_pool:
    name: local
  job_variables:
    env:
      EXECUTION_ENVIRONMENT: staging
      MY_NOT_SO_SECRET_CONFIG: plumbus
  schedule: null
```
And here's how you might do it programmatically using the `.deploy()` method:
```python
from prefect import flow

flow.from_source(
    source="https://github.com/org/demo_project.git",  # illustrative source; a local path works too
    entrypoint="demo_project/demo_flow.py:some_work",
).deploy(
    name="demo-deployment",
    work_pool_name="local",
    job_variables={
        "env": {
            "EXECUTION_ENVIRONMENT": "staging",
            "MY_NOT_SO_SECRET_CONFIG": "plumbus",
        }
    },
    # ... other deployment settings ...
)
```
Note that while the example above uses environment variables, the specific job variables you can override will depend on the type of work pool you are using. You can find more details in the Prefect documentation.
t
i tried doing this at runtime, but it failed as it tried to import the code, and that doesn't work until i've done a `poetry install`. e.g. i tried this:
```python
# imports reconstructed for context; enable_loguru_support and
# enable_tqdm_logging are project-internal helpers
import asyncio
import logging
import subprocess
from functools import wraps
from typing import Any, Callable, TypeVar, cast

from prefect import flow

logger = logging.getLogger(__name__)  # stand-in for the project's logger

FuncT = TypeVar("FuncT", bound=Callable[..., Any])
AsyncFuncT = TypeVar("AsyncFuncT", bound=Callable[..., Any])

def is_deployed_flow() -> bool:
    """
    True if the flow is running on any deployment in Prefect where we pull the source code from github.
    """
    # return config("IS_RUNNING_IN_DOCKER", cast=bool, default=False)
    return True


def setup_unitary_libs() -> None:
    """
    We only install the dependencies in the prefect docker image; we need this to set up the unitary
    packages in unitarybot so that we run the code pulled from github at runtime for people to develop flows.
    """
    try:
        result = subprocess.run(
            ["poetry", "install", "--no-root", "--without", "dev"],
            capture_output=True,
            text=True,
        )
        if result.returncode == 0:
            logger.info("Unitary libs installed successfully")
        else:
            logger.error(
                f"Failed to install unitary libs:\n{result.stderr}:\n{result.stdout}"
            )
    except Exception as e:
        logger.error(f"Failed to install unitary libs: {e}")


def configure_flow():
    enable_loguru_support()
    enable_tqdm_logging()
    if is_deployed_flow():
        logger.info("Setting up unitary libs")
        setup_unitary_libs()


def unitary_flow(**kwargs: Any) -> Callable[[FuncT], FuncT]:
    def decorator(func: FuncT) -> FuncT:
        @wraps(func)
        def sync_wrapped(*args: Any, **kwargs: Any) -> Any:
            configure_flow()
            return func(*args, **kwargs)

        @wraps(func)
        async def async_wrapped(*args: Any, **kwargs: Any) -> Any:
            configure_flow()
            return await cast(AsyncFuncT, func)(*args, **kwargs)

        @flow(**kwargs)
        def wrapped(*args: Any, **kwargs: Any) -> Any:
            if asyncio.iscoroutinefunction(func):
                return async_wrapped(*args, **kwargs)
            else:
                return sync_wrapped(*args, **kwargs)

        return cast(FuncT, wrapped)

    return decorator
```
d
Makes sense. If I'm not mistaken, the Prefect docker image supports using the `EXTRA_PIP_PACKAGES` environment variable, so you can use the job variable override to configure that. But this will cause a delay in starting every flow, as it needs to install the dependencies every time it starts.
👍 1
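A sketch of what that override could look like, assuming the `from_source` deployment pattern used elsewhere in this thread (the repo URL, entrypoint, pool name, and package pins are illustrative):
```python
from prefect import flow
from prefect.runner.storage import GitRepository

# the prefecthq/prefect base images pip-install anything listed in
# EXTRA_PIP_PACKAGES at container startup, before the flow code is loaded
flow.from_source(
    source=GitRepository(
        url="https://github.com/org/repo.git",  # illustrative repo
        branch="feature_branch",
    ),
    entrypoint="flows/my_flow.py:my_flow",  # illustrative entrypoint
).deploy(
    name="my-deployment",
    work_pool_name="docker-pool",  # illustrative pool
    job_variables={"env": {"EXTRA_PIP_PACKAGES": "pandas==2.0.1 numpy==1.26.4"}},
)
```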
n
> i need to run a `poetry install` to point to my internal packages just after my work pool worker pulls the code and just before it loads the code (i'm guessing with importlib)
imo
• you should do this at deployment time by installing your package into your image via Dockerfile (installing things at runtime often causes weird problems)
• you can use a `pull` step to do whatever setup via `run_shell_script` (see the yaml sketch below)
• you might be able to pass `-e` before package names with `EXTRA_PIP_PACKAGES`
but again I would highly suggest installing static deps at deployment time
👍 1
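A minimal `prefect.yaml` pull-section sketch of the `run_shell_script` approach, assuming a hypothetical repo URL and the Poetry flags used elsewhere in this thread:
```yaml
pull:
  # clone the flow source first; the step id lets later steps reference its outputs
  - prefect.deployments.steps.git_clone:
      id: clone-step
      repository: https://github.com/org/repo.git  # hypothetical repo
      branch: feature_branch
  # then install the path dependencies inside the freshly cloned directory
  - prefect.deployments.steps.run_shell_script:
      script: poetry install --no-root --without dev
      directory: "{{ clone-step.directory }}"
      stream_output: true
```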
t
Here's a trimmed-down example of my `pyproject.toml`:
```toml
[tool.poetry]
name = "unitary"
version = "0.1.0"
description = ""
package-mode = false
authors = []

[tool.poetry.dependencies]
python = ">=3.10.6,<3.12"
pandas = "2.0.1"
numpy = "^1.26.4"

[tool.poetry.group.unitary-model-store.dependencies]
unitary-model-store = { path = "libs/model-store", develop = true}

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
```
I have this in my Dockerfile to install the static deps:
```dockerfile
RUN poetry install --no-directory
```
which will install pandas and numpy, but not `unitary-model-store`, in this example, in my docker image. But i deploy flows like:
```python
flow.from_source(
    source=GitRepository(..., branch="feature_branch"),
    entrypoint="...",
).deploy(
    ...
)
```
So that people can deploy flows and point to their feature branch instead of master to test flows they're developing on our work pools. If you make any changes to the `unitary-model-store` package on `feature_branch`, then that wouldn't get picked up on their branch if i had already installed our internal libs in the docker image alongside the external static deps, so i need to do the install of our internal packages at runtime so that it points to the local code changes on e.g. `feature_branch`.
Do you think it's not worth the complexity, and i should just take the hit that you'd have to rebuild the image to test any code changes to these libs on my work pool, and just install the internal libs at image build?
d
I also have the exact same setup with poetry, git, docker and k8s. I happen to use an internal package as well, which I include in the image. When I need to develop locally and change the package, it's easy, as Prefect is just python. When I deploy to the work pools I bake the package into the image, so the package doesn't change, but if the flow code changes it'll get pulled from the git repository.
t
Ah interesting, okay, fair enough, so you just accept that if you're developing some flow and you change an internal package the flow depends on as part of your development, you'd have to rebuild and push the image to your docker repo for that change to be picked up, right?
d
Exactly, the internal package doesn't change as frequently as the flow code itself, so I'm willing to accept creating a new image every once in a while over adding startup latency to every flow. Also good to mention that docker images are built in layers, so even when pushing a new image, only the top layers change, and it shouldn't be more than a few megabytes.
t
Yea makes sense 👍 i think i'll just adjust my Dockerfile to install the static external deps and the internal deps in two separate steps and do the same as you. I was hoping to abstract this gotcha away from my ML engineers when they develop flows, but i guess i can just give them a heads up about it and encourage them to mainly run flows locally when developing
👍 1
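For reference, a minimal sketch of the two-step Dockerfile layout described above, assuming the `pyproject.toml` from earlier in the thread (paths are illustrative):
```dockerfile
# step 1: external static deps only; --no-directory skips the path
# dependencies, so this layer stays cached until the lockfile changes
COPY pyproject.toml poetry.lock ./
RUN poetry install --no-root --no-directory --without dev

# step 2: copy the internal libs, then resolve the path dependencies;
# only this (small) layer is rebuilt when libs/ changes
COPY libs/ ./libs/
RUN poetry install --no-root --without dev
```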