Mathijs Carlu

07/15/2022, 9:00 AM
Hi, I'm wondering whether the following behaviour is actually intended, because it seems weird to me (Prefect 2.0b8, server). Assume I have a file with a flow and a deployment. When I execute
prefect deployment create file.py
the deployment gets created. Now, when I modify the flow a little (the flow name stays the same), change the deployment name, and then re-execute the above command, a new deployment is created. This deployment points at the same flow object (flow_id is the same for both). However, both deployments execute different code, 'different versions of the same flow' if you will, although this 'version number' is not saved anywhere (I think). This is all because the location of the flow code (flow_data) is saved with the deployment and not with the flow, which seems a little counterintuitive to me. If I see 2 flow runs in the UI that executed the same flow, I would expect them to have executed the same code.

Anna Geller

07/15/2022, 10:43 AM
First, I 100% understand your confusion, and I agree that a deployment should be able to act as just a pointer to your script rather than a packager of it. The Deployment object, though, can cover both use cases (pointing and packaging), depending on how you configure it. Here is a full example. Imagine the following flow stored in `flows/healthcheck.py`:
import prefect
from prefect import task, flow
from prefect import get_run_logger
from prefect_dataops.deployments import deploy_to_s3


@task
def say_hi():
    logger = get_run_logger()
    logger.info("Hello from the Health Check Flow! 👋")


@task
def log_platform_info():
    import platform
    import sys
    from prefect.orion.api.server import ORION_API_VERSION

    logger = get_run_logger()
    logger.info("Host's network name = %s", platform.node())
    logger.info("OS/Architecture = %s/%s", sys.platform, platform.machine())
    logger.info("Platform information (instance type) = %s 💻", platform.platform())
    logger.info("Python version = %s", platform.python_version())
    logger.info("Prefect version = %s 🚀", prefect.__version__)
    logger.info("Prefect API version = %s", ORION_API_VERSION)


@flow
def healthcheck():
    hi = say_hi()
    log_platform_info(wait_for=[hi])
and a separate deployment file pointing to that flow:
from prefect.deployments import FlowScript, Deployment


Deployment(
    name="manual",
    flow=FlowScript(path="flows/healthcheck.py", name="healthcheck"),
)
After creating the deployment and running it, the run fails at first because the prefect_dataops package doesn't exist (see the first image). I then comment out the line with the bad import and, without having to redeploy, the run works, because a deployment using FlowScript is just a pointer to the flow script and flow object (image 2).
and if any of this is confusing, you can use the command below to find out more:
prefect deployment inspect flowname/deployname

Mathijs Carlu

07/18/2022, 9:09 AM
Hi Anna, by 'pointing', I actually meant at the database level rather than the file level. I'm packaging my flows using the new Docker packager. Consider the following use case: I have an existing deployment of a certain flow. Right now, there is no way (I think) to create another deployment of that same flow without passing the flow code and creating a new Docker image. If the package manifest were stored in the flow table instead of the deployment table (which seems more logical to me), I could just say: "Create a new deployment that points to the flow with id xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx". Then both deployments would point to the exact same code, since they would use the exact same Docker image (because it can be picked up from the flow table in the database). Redeploying would still be unnecessary, as in your example above.
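As a rough illustration of the idea above, here is a toy in-memory sketch in which the package manifest lives on the flow row and a deployment stores only a flow_id. This is not Prefect's actual schema or API: `FlowRow`, `DeploymentRow`, `register_flow`, `create_deployment`, `resolve_manifest` and the image reference are all hypothetical names made up for the example.

```python
from dataclasses import dataclass
from typing import Dict
from uuid import UUID, uuid4


@dataclass(frozen=True)
class FlowRow:
    """Hypothetical flow table row that owns the package manifest."""
    id: UUID
    name: str
    package_manifest: str  # assumed to be something like a Docker image reference


@dataclass(frozen=True)
class DeploymentRow:
    """Hypothetical deployment table row that only points at a flow."""
    name: str
    flow_id: UUID


FLOWS: Dict[UUID, FlowRow] = {}  # stand-in for the flow table


def register_flow(name: str, package_manifest: str) -> FlowRow:
    flow = FlowRow(id=uuid4(), name=name, package_manifest=package_manifest)
    FLOWS[flow.id] = flow
    return flow


def create_deployment(name: str, flow_id: UUID) -> DeploymentRow:
    # No flow code is passed here; the deployment only references an existing flow.
    if flow_id not in FLOWS:
        raise KeyError(f"unknown flow id: {flow_id}")
    return DeploymentRow(name=name, flow_id=flow_id)


def resolve_manifest(deployment: DeploymentRow) -> str:
    # The code location is read from the flow row, never from the deployment.
    return FLOWS[deployment.flow_id].package_manifest


flow = register_flow("abc", "registry.example/abc:1")  # made-up image reference
first = create_deployment("first", flow.id)
second = create_deployment("second", flow.id)
print(resolve_manifest(first) == resolve_manifest(second))
```

In this model, two deployments that share a flow_id cannot resolve to different code, which is the property being asked for.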
> This deployment points at the same flow object (flow_id is the same for both). However, both deployments execute different code
Here, assume I create a deployment "DeploymentA" for a flow "abc". Then, one of my colleagues creates a deployment "DeploymentB" for a completely different flow, but he also calls it "abc". Both deployments would execute the flow code that was intended, i.e. DeploymentA executes my code and DeploymentB my colleague's code. However, flow runs from both deployments point at exactly the same flow_id in the database, and thus in the UI it seems like both execute the same code, which isn't actually the case. Does that make any sense?
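To make the collision concrete, here is a minimal toy model of the behaviour described above: flows are looked up by name only, while the code location (flow_data) is stored on the deployment. It is a sketch of the observed behaviour, not Prefect's real implementation; `ToyRegistry`, its fields, and the file names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict
from uuid import UUID, uuid4


@dataclass
class ToyRegistry:
    """Hypothetical stand-in for the flow and deployment tables."""
    flow_ids: Dict[str, UUID] = field(default_factory=dict)     # flow name -> flow_id
    deployments: Dict[str, dict] = field(default_factory=dict)  # deployment name -> row

    def create_deployment(self, name: str, flow_name: str, flow_data: str) -> dict:
        # Flows are matched by name only, so a reused name reuses the flow_id.
        flow_id = self.flow_ids.setdefault(flow_name, uuid4())
        # The code location lives on the deployment row, not on the flow row.
        row = {"flow_id": flow_id, "flow_data": flow_data}
        self.deployments[name] = row
        return row


registry = ToyRegistry()
a = registry.create_deployment("DeploymentA", "abc", "my_code.py")
b = registry.create_deployment("DeploymentB", "abc", "colleague_code.py")

print(a["flow_id"] == b["flow_id"])      # True: both rows share one flow_id
print(a["flow_data"] == b["flow_data"])  # False: they run different code
```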

Anna Geller

07/18/2022, 11:15 AM
You may try OrionPackager. This requires that the flow code is copied into the image, and the path to the flow in the image is stored in the DB:
Deployment(
    name="xxx",
    flow=hello,
    flow_runner=KubernetesFlowRunner(...),
    packager=OrionPackager(serializer=ImportSerializer()),
)
We'll have more recipes for handling such packaging use cases; you should see them on Discourse soon.

Mathijs Carlu

07/18/2022, 11:19 AM
Sure, thanks for your answers