Tom Matthews
04/26/2024, 12:23 PM
I need to run a poetry install to point to my internal packages just after my work pool worker pulls the code and just before it loads the code (i'm guessing with importlib). Where could i intercept and override that?
Marvin
04/26/2024, 12:24 PM
I could not find documentation describing a way to intercept the worker and run a poetry install at that point. However, based on the information available, you may need to create a custom Docker image for your worker that contains the necessary dependencies and your internal packages.
Your Dockerfile can include a step that runs poetry install with your configured dependencies. Here's a basic approach to structuring your Dockerfile (a sketch follows the steps below):
1. Use a base Python image.
2. Copy your poetry.lock and pyproject.toml into the image.
3. Install Poetry.
4. Run poetry install to install dependencies.
5. Copy the rest of your source code into the image.
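A minimal sketch of those steps (base image tag, paths, and the Poetry setup are assumptions, not from this thread):
dockerfile
# Sketch only: base image and paths are placeholders.
FROM python:3.10-slim

# Install Poetry and have it install into the system environment (common in containers).
RUN pip install --no-cache-dir poetry
ENV POETRY_VIRTUALENVS_CREATE=false

WORKDIR /app

# Copy dependency manifests first so this layer caches well.
COPY pyproject.toml poetry.lock ./

# Install dependencies (skip installing the project itself).
RUN poetry install --no-root

# Copy the rest of the source code into the image.
COPY . .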
To address your specific need to point to internal packages, your pyproject.toml should be configured to recognize the location of these packages, which can be defined as path dependencies.
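For example, a path dependency in pyproject.toml looks roughly like this (the package name and path are placeholders):
toml
[tool.poetry.dependencies]
my-internal-lib = { path = "libs/my-internal-lib", develop = true }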
Since the documentation did not contain specific instructions for your case, you might want to consider customizing the worker's Docker image as a workaround. Alternatively, further customization through subclassing the worker might be possible, but would require a deep dive into the Prefect source code, which we could attempt by searching through GitHub issues.
Would you like to proceed by searching through existing GitHub issues or further discussing the Dockerfile customization?
Tom Matthews
04/26/2024, 12:28 PM
i deploy flows with .from_source so that people can easily deploy local changes to their flow on their branch, which can involve changes to those internal packages.
If i do the poetry install for my internal packages in the docker image, then i'd have to rebuild and push a big docker image every time, which i want to avoid.
I must find a way to intercept the prefect logic, where would that be? But i need to run a poetry install to point to my internal packages just after my work pool worker pulls the code and just before it loads the code (i'm guessing with importlib). If you can point me to the relevant code where flows get pulled by the work pool workers, that would be helpful.
Marvin
04/26/2024, 12:29 PM
There is no documented hook for running a poetry install at that point, so you would need to look at Prefect's source code.
This would involve examining the code that deals with pulling and setting up flow runs. Prefect workers are responsible for polling work queues and executing flows, so the relevant code is likely in the worker logic or the code that handles the execution of flow runs.
The source code for Prefect can be found in their GitHub repository (https://github.com/PrefectHQ/prefect). Specifically, you might look into the prefect.engine, prefect.infrastructure, and prefect.flow_runs modules, or wherever the work pool worker's logic for pulling code is located.
Keep in mind that Prefect's architecture and internal code logic can be complex without proper documentation. If you consider this approach too intricate or if it risks becoming a maintenance hassle, I would recommend discussing this need with the Prefect community or maintainers—perhaps they can provide guidance or consider it for a future feature.
Would you like to continue exploring the Prefect codebase for a solution, or would you prefer to reach out to the Prefect community for support?
Tom Matthews
04/26/2024, 12:47 PM
Tom Matthews
04/26/2024, 12:47 PM
Daniel Adari
04/26/2024, 2:09 PM
Tom Matthews
04/26/2024, 2:10 PM
Daniel Adari
04/26/2024, 2:10 PM
Marvin
04/26/2024, 2:11 PM
You can override job variables via a prefect.yaml file or the .deploy() method during deployment creation. Job variables are infrastructure-related values that are configurable on a work pool, and they impact how your flow run executes on your infrastructure. The env variable, for instance, is configurable for all work pool types and allows you to specify a set of environment variables that will be set in the runtime environment of the flow run.
Here's how to override job variables using a prefect.yaml file:
yaml
deployments:
  - name: demo-deployment
    entrypoint: demo_project/demo_flow.py:some_work
    work_pool:
      name: local
      job_variables:
        env:
          EXECUTION_ENVIRONMENT: staging
          MY_NOT_SO_SECRET_CONFIG: plumbus
    schedule: null
And here's how you might do it programmatically using the .deploy() method:
python
from prefect import flow

# Sketch: the repository URL is a placeholder; load the flow from source, then deploy it.
flow.from_source(
    source="https://github.com/org/repo",
    entrypoint="demo_project/demo_flow.py:some_work",
).deploy(
    name="demo-deployment",
    work_pool_name="local",
    job_variables={
        "env": {
            "EXECUTION_ENVIRONMENT": "staging",
            "MY_NOT_SO_SECRET_CONFIG": "plumbus",
        }
    },
    # ... other deployment settings ...
)
Note that while the example above uses environment variables, the specific job variables you can override will depend on the type of work pool you are using.
You can find more details in the Prefect documentation here.
Tom Matthews
04/26/2024, 2:11 PM
i still need somewhere to hook in the poetry install at runtime though, e.g. i tried this:
import asyncio
import subprocess
from functools import wraps
from typing import Any, Callable, TypeVar, cast

from loguru import logger  # assumed, given enable_loguru_support below
from prefect import flow

FuncT = TypeVar("FuncT", bound=Callable[..., Any])
AsyncFuncT = TypeVar("AsyncFuncT", bound=Callable[..., Any])


def is_deployed_flow() -> bool:
    """
    True if the flow is running on any deployment in Prefect where we pull the source code from github.
    """
    # return config("IS_RUNNING_IN_DOCKER", cast=bool, default=False)
    return True


def setup_unitary_libs() -> None:
    """
    We only install the dependencies in the prefect docker image; we need this to set up the unitary
    packages in unitarybot so that we run the code pulled from github at runtime for people to develop flows.
    """
    try:
        result = subprocess.run(
            ["poetry", "install", "--no-root", "--without", "dev"],
            capture_output=True,
            text=True,
        )
        if result.returncode == 0:
            logger.info("Unitary libs installed successfully")
        else:
            logger.error(
                f"Failed to install unitary libs:\n{result.stderr}:\n{result.stdout}"
            )
    except Exception as e:
        logger.error(f"Failed to install unitary libs: {e}")


def configure_flow():
    enable_loguru_support()  # internal logging helpers, defined elsewhere
    enable_tqdm_logging()
    if is_deployed_flow():
        logger.info("Setting up unitary libs")
        setup_unitary_libs()


def unitary_flow(**kwargs: Any) -> Callable[[FuncT], FuncT]:
    def decorator(func: FuncT) -> FuncT:
        @wraps(func)
        def sync_wrapped(*args: Any, **kwargs: Any) -> Any:
            configure_flow()
            return func(*args, **kwargs)

        @wraps(func)
        async def async_wrapped(*args: Any, **kwargs: Any) -> Any:
            configure_flow()
            return await cast(AsyncFuncT, func)(*args, **kwargs)

        @flow(**kwargs)
        def wrapped(*args: Any, **kwargs: Any) -> Any:
            if asyncio.iscoroutinefunction(func):
                return async_wrapped(*args, **kwargs)
            else:
                return sync_wrapped(*args, **kwargs)

        return cast(FuncT, wrapped)

    return decorator
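For reference, a hypothetical usage of that decorator (the flow name is made up):
python
# Hypothetical usage of unitary_flow from above.
@unitary_flow(name="example-flow")
def my_flow(x: int) -> int:
    return x + 1

my_flow(1)  # runs configure_flow() before the body, inside a Prefect flow run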
Daniel Adari
04/26/2024, 2:19 PM
The Prefect image supports an EXTRA_PIP_PACKAGES environment variable, so you can use the job variable override to configure that.
But this will cause a delay in starting every flow, as it needs to install the dependencies every time it starts.
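For example, reusing the deployment from earlier, that override might look like this (a sketch; the package name is a placeholder):
yaml
deployments:
  - name: demo-deployment
    entrypoint: demo_project/demo_flow.py:some_work
    work_pool:
      name: local
      job_variables:
        env:
          # Installed by the prefecthq/prefect image at container start.
          EXTRA_PIP_PACKAGES: "my-internal-lib==0.1.0"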
Nate
04/26/2024, 2:22 PM
> i need to run a poetry install to point to my internal packages just after my work pool worker pulls the code and just before it loads the code (i'm guessing with importlib)
imo:
• you should do this at deployment time by installing your package into your image via Dockerfile (installing things at runtime causes weird problems often)
• you can use a pull step to do whatever setup via run_shell_script (sketched below)
• you might be able to pass -e before package names with EXTRA_PIP_PACKAGES
but again I would highly suggest installing static deps at deployment time
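A sketch of that pull-step idea, assuming a prefect.yaml-based deployment (the repository URL and script contents are placeholders; git_clone and run_shell_script are documented prefect.deployments.steps):
yaml
pull:
  - prefect.deployments.steps.git_clone:
      id: clone-step
      repository: https://github.com/org/repo.git
      branch: feature_branch
  - prefect.deployments.steps.run_shell_script:
      # Run the install inside the freshly cloned repo, before the flow is loaded.
      directory: "{{ clone-step.directory }}"
      script: poetry install --no-root --without dev
      stream_output: true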
Tom Matthews
04/26/2024, 2:33 PM
this is my pyproject.toml:
[tool.poetry]
name = "unitary"
version = "0.1.0"
description = ""
package-mode = false
authors = []
[tool.poetry.dependencies]
python = ">=3.10.6,<3.12"
pandas = "2.0.1"
numpy = "^1.26.4"
[tool.poetry.group.unitary-model-store.dependencies]
unitary-model-store = { path = "libs/model-store", develop = true}
[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
I have this in my Dockerfile to install the static deps:
RUN poetry install --no-directory
which will install pandas and numpy, but not unitary-model-store, in this example in my docker image.
But i deploy flows like:
flow.from_source(
    source=GitRepository(..., branch="feature_branch"),
    entrypoint="...",
).deploy(
    ...
)
So that people can deploy flows and point to their feature branch instead of master to test flows they're developing on our work pools.
If you make any changes to the unitary-model-store package on feature_branch, then that wouldn't get picked up on their branch if i already installed our internal libs (as well as the external static deps) in the docker image, so i need to do the install of our internal packages at runtime so that it points to the local code changes on e.g. feature_branch.
Do you think it's not worth the complexity, and i should just take the hit that you'd have to rebuild the image to test any code changes to these libs on my work pool, and just install the internal libs at image build?
Daniel Adari
04/26/2024, 2:38 PM
Tom Matthews
04/26/2024, 2:42 PM
Daniel Adari
04/26/2024, 2:47 PM
Tom Matthews
04/26/2024, 2:51 PM