Matt Alhonte

08/22/2023, 11:41 PM
@Marvin Heya! So, we used to have individual "dev" Projects in Prefect 1, and a separate production Project. So like there's a version of Flow X on my dev Project, and another version of Flow X on the production Project (and ya can eventually merge changes from one into the other). How do we approximate this with the new Tagging system with just one Workspace? Can you have different Deployments of the same Flow that have different code for the flow? Do ya do it purely with tagging Tasks? Or will individual versions of flows need different names?

Marvin

08/22/2023, 11:41 PM
Ah, the joys of change. In Prefect 2.0, the concept of "Projects" has been replaced with "Workspaces" and "Tags". The "Tags" are used to organize your deployments and flow runs, and you can use these tags as filters in the Prefect UI.
You can assign custom parameter values for flow runs based on the deployment, create ad-hoc flow runs from the API or Prefect UI, and upload flow files to a defined storage location for retrieval at run time. You can also specify run time infrastructure for flow runs, such as Docker or Kubernetes configuration.
As for your question about having different deployments of the same flow with different code, you can package not only your flow code script but also any supporting files, including your custom modules, SQL scripts, and any configuration files needed in your project with remote storage blocks.
And as for different versions of flows, until Prefect 2.0.4, tags were used to associate flow runs with work queues. Starting in Prefect 2.0.5, tag-based work queues are deprecated. Instead, work queue names are used to explicitly direct flow runs from deployments into queues.
So, in essence, you can still have different versions of the same flow, but you'll need to use the new tagging system and workspaces instead of individual projects. It's a bit like trying to find your way around a new city after they've changed all the street names. But don't worry, you'll get the hang of it. Or not. I'm a robot, what do I care?

Nate

08/22/2023, 11:43 PM
hmm yeah, on this i think it always depends on how an org likes to use git. do you have a monorepo situation?

Matt Alhonte

08/22/2023, 11:43 PM
@Nate Yep

Nate

08/22/2023, 11:46 PM
whats the general structure like?
repo_root/
   project_1/
   ...
or like
repo_root/
   prd/
     project_1/
   stg/
     project_1/
or something else?

Matt Alhonte

08/22/2023, 11:47 PM
repo_root/
   project_1
and then the individual dev Projects live on different Git branches
Which have their own associated Image & CI/CD pipeline

Nate

08/22/2023, 11:50 PM
gotcha, in that case I'd just want my github action to discover which branch it finds itself calling `prefect deploy` from, and attach a corresponding tag to that deployment (or even switch workspaces based on branch if you've got multiple; i see now it's 1 workspace)

Matt Alhonte

08/22/2023, 11:51 PM
@Nate Ah, okay! Just double-checking that "same Flow name, different Deployments with different code" is something that the system allows. my first attempt didn't quite work, so I wanted to make sure it was possible.

Nate

08/22/2023, 11:52 PM
yep! prefect is entirely agnostic to your source code, and flow -> deployment being one -> many is totally a thing

Matt Alhonte

08/22/2023, 11:52 PM
Ah, sweet, thanks!

Nate

08/22/2023, 11:56 PM
PS: "totally a thing" as long as the deployments have different names, so maybe `prefect deploy -n original-name-{branch}` or something
👍 1
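
The per-branch naming and tagging idea above can be sketched in plain Python. This is only an illustration under assumptions: `branch_slug` and `deployment_identity` are hypothetical helper names, the `branch:` tag prefix is made up, and `GITHUB_REF_NAME` is the environment variable GitHub Actions sets to the short branch name.

```python
import os
import re


def branch_slug(branch: str) -> str:
    """Lowercase the branch name and replace anything unsafe with '-'."""
    slug = re.sub(r"[^a-z0-9]+", "-", branch.lower()).strip("-")
    return slug or "unknown"


def deployment_identity(base_name: str, branch: str) -> dict:
    """Return a per-branch deployment name plus a matching tag."""
    slug = branch_slug(branch)
    return {"name": f"{base_name}-{slug}", "tags": [f"branch:{slug}"]}


if __name__ == "__main__":
    # "local" is a made-up fallback for runs outside GitHub Actions.
    branch = os.environ.get("GITHUB_REF_NAME", "local")
    print(deployment_identity("original-name", branch))
```

The resulting `name` could be passed to `prefect deploy -n ...` from the CI job, and the tag attached to the same deployment, so each branch gets its own deployment of the same flow.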

Matt Alhonte

08/23/2023, 12:18 AM
@Nate Hrm, having trouble getting it to work - I've got two Deployments with slightly different code (differentiated by one having a `_2` at the end). Tried running the first one, but it fails with this:
`PermissionError: [Errno 13] Permission denied: './../Hello World-ec2-ec2__cpu_2048__memory_14360_2'`
I deploy them with the script method btw (i.e., having something like this at the bottom):
if __name__ == "__main__":
    memory = 14360
    base_args = make_deployment_args(flow_name, memory, user="matt", tags=["testing2"])
    storage = S3Bucket.load("<name>")  # block name elided
    deployment1 = Deployment.build_from_flow(
        flow=hello_flow, storage=storage, **base_args
    )
and then running `python flow.py`
not sure if that makes a difference!
the GUI does indicate that it picked up the last registration attempt from that Deployment though

Nate

08/23/2023, 12:26 AM
is `name` being set in `base_args` uniquely for each deployment?

Matt Alhonte

08/23/2023, 12:26 AM
yep!

Nate

08/23/2023, 12:26 AM
as in the deployment name, not the flow name

Matt Alhonte

08/23/2023, 12:26 AM
Same flow name, but one should be `ec2__cpu_2048__memory_14360` and the other should be `ec2__cpu_2048__memory_14360_2`
curiously, it looks like it should be loading the right one, cuz it starts with
`Downloading flow code from storage at 'Hello World-ec2-ec2__cpu_2048__memory_14360'`
but then fails with
`PermissionError: [Errno 13] Permission denied: './../Hello World-ec2-ec2__cpu_2048__memory_14360_2'`
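
One detail worth noting in the two log lines above: the first deployment's path is a string prefix of the second's. If the storage block downloads every object whose key starts with that path (an assumption about the storage backend that nobody in this thread confirmed), then loading the first deployment would also match files under the `_2` folder and try to write them outside the target directory. A minimal sketch of that hypothesis in plain Python, using made-up bucket keys and a hypothetical `flow.py` file name:

```python
import posixpath

first = "Hello World-ec2-ec2__cpu_2048__memory_14360"
second = "Hello World-ec2-ec2__cpu_2048__memory_14360_2"

# Hypothetical bucket contents: one flow file under each deployment path.
keys = [f"{first}/flow.py", f"{second}/flow.py"]


def match_by_prefix(keys, prefix):
    """Return keys whose names start with prefix, as a plain prefix listing would."""
    return [key for key in keys if key.startswith(prefix)]


def local_dir(key, prefix):
    """Directory a key would be written to, relative to the download root."""
    return posixpath.join(".", posixpath.relpath(posixpath.dirname(key), prefix))


loose = match_by_prefix(keys, first)         # matches both deployments
strict = match_by_prefix(keys, first + "/")  # matches only the first

for key in loose:
    print(key, "->", local_dir(key, first))
```

If this is what is happening, giving the two deployments paths where neither is a prefix of the other (or leaving `path` unset) would be one way to test it.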

Nate

08/23/2023, 12:30 AM
if you look in the UI or something at each deployment, what is the `flowname/deployment` for each and what is the storage key for each?

Matt Alhonte

08/23/2023, 12:30 AM
v1
import typing as t

from prefect_aws.ecs import ECSTask  # assumed import


def make_deployment_args(
    flow_name: str,
    memory: int,
    user: t.Optional[str] = "dev",
    # The default image is omitted here; None is a placeholder.
    image: t.Optional[str] = None,
    cpu: t.Optional[int] = None,
    tags: t.Optional[t.List[str]] = None,  # avoid a shared mutable default
) -> dict:
    # make_ec2_args is a helper defined elsewhere.
    full_ecs_args = make_ec2_args(memory, user, image, cpu)
    ecs_task_block = ECSTask(**full_ecs_args)
    resource_string = full_ecs_args["name"]

    return {
        "infrastructure": ecs_task_block,
        "name": resource_string,
        "work_queue_name": "default",
        "work_pool_name": "default-agent-pool",
        "path": f"{flow_name}-ec2-{resource_string}",
        "output": f"{flow_name}-ec2-{resource_string}.yaml",
        "apply": True,
        "tags": tags or [],
    }
version 2, same thing except `resource_string = full_ecs_args["name"]` -> `resource_string = full_ecs_args["name"] + "_2"`
@Nate `Hello World/ec2__cpu_2048__memory_14360` and `Hello World/ec2__cpu_2048__memory_14360_2`
Storage Key? the `Path` is `Hello World-ec2-ec2__cpu_2048__memory_14360`, if that's what ya mean

Nate

08/23/2023, 12:33 AM
sorry, it'll be `entrypoint` in the UI

Matt Alhonte

08/23/2023, 12:34 AM
aha! `tags.py:hello_flow` for both of them
the 2nd one's `Path` is `Hello World-ec2-ec2__cpu_2048__memory_14360_2` btw

Nate

08/23/2023, 12:46 AM
speaking as a prefect person, imo `path` is the most confusing thing about the infra block paradigm 🙂 personally i like to leave `path` alone whenever possible and just set `entrypoint` relative to the root of my storage block (but ik sometimes you have to mess with `path`)
but also, small plug for workers + prefect.yaml bc that ambiguity is sorta gone in that world
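
A rough sketch of what "leave `path` alone" could look like for the helper shown earlier in this thread. The function name `make_deployment_args_without_path` is hypothetical, and the returned dict is meant to be splatted into `Deployment.build_from_flow(...)` the same way `base_args` is in the script above. Whether this resolves the error here is not established in the thread.

```python
import typing as t


def make_deployment_args_without_path(
    name: str,
    tags: t.Optional[t.List[str]] = None,
    entrypoint: t.Optional[str] = None,
) -> dict:
    """Build deployment kwargs that leave "path" unset."""
    args: t.Dict[str, t.Any] = {
        "name": name,
        "work_queue_name": "default",
        "apply": True,
        "tags": list(tags or []),
    }
    if entrypoint is not None:
        # Relative to the root of the storage block, e.g. "tags.py:hello_flow".
        args["entrypoint"] = entrypoint
    return args
```

The only difference between the two deployments is then the `name`, while `entrypoint` stays the same `file.py:function` string for both.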

Matt Alhonte

08/23/2023, 12:49 AM
ah, okay, I didn't realize it'd have a good default argument!
blah, still the same thing
@Nate Looking at the CloudWatch logs...there's no way this matters, right? `/opt/conda/lib/python3.10/runpy.py:126: RuntimeWarning: 'prefect.engine' found in sys.modules after import of package 'prefect', but prior to execution of 'prefect.engine'; this may result in unpredictable behaviour`
(not sure what that refers to)

Nate

08/23/2023, 1:03 AM
nope that shouldn't matter - that's a warning you shouldn't need to worry about (that we should clean up)
👍 1

Matt Alhonte

08/23/2023, 1:05 AM
@Nate Hrm, path is `Hello World-ec2-ec2__cpu_2048__memory_14360` - any chance that the space is causing trouble?

Nate

08/23/2023, 1:07 AM
hmm honestly not sure, are you still setting `path` yourself?

Matt Alhonte

08/23/2023, 1:07 AM
Nope, leaving it blank now
@Nate oh, I am getting a bunch of weird output messages when I run `python flow.py` now. stuff along these lines
00:54:35.216 | DEBUG   | hpack.hpack - Decoded (b'permissions-policy', b'accelerometer=(), ambient-light-sensor=(), autoplay=(), battery=(), camera=(), cross-origin-isolated=(), display-capture=(), document-domain=(), encrypted-media=(), execution-while-not-rendered=(), execution-while-out-of-viewport=(), fullscreen=(), geolocation=(), gyroscope=(), hid=(), idle-detection=(), magnetometer=(), microphone=(), midi=(), navigation-override=(), payment=(), picture-in-picture=(), publickey-credentials-get=(), screen-wake-lock=(), serial=(), sync-xhr=(), usb=(), web-share=(), xr-spatial-tracking=()'), total consumed 427 bytes, indexed True
It overflows the terminal on my remote dev server though so I can't see the whole thing
And trying to pipe the output didn't work (I tried `python flow.py >> logs.txt`)
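
A possible reason the redirect came up empty: `>>` only captures stdout, and Python's `logging.StreamHandler` writes to stderr by default. Assuming that is where these messages go (not confirmed in this thread), a small wrapper can send both streams to the same file. `run_and_capture` is a hypothetical helper name.

```python
import subprocess
import sys


def run_and_capture(cmd, log_path):
    """Run cmd, appending both stdout and stderr to log_path; return the exit code."""
    with open(log_path, "a", encoding="utf-8") as log_file:
        completed = subprocess.run(cmd, stdout=log_file, stderr=subprocess.STDOUT)
    return completed.returncode


# Usage, roughly equivalent to `python flow.py >> logs.txt 2>&1`:
#   run_and_capture([sys.executable, "flow.py"], "logs.txt")
```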

Nate

08/23/2023, 1:15 AM
ah man, that's weird. sorry, i can pick this up with you tomorrow if you're still hitting issues - afk now
👍 1
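
The flood of `hpack.hpack` lines suggests some logger is running at DEBUG level, though the thread does not establish why. As a stopgap, and only as a sketch, the standard `logging` module can raise the level of that one noisy logger. `quiet_loggers` is a hypothetical helper name.

```python
import logging


def quiet_loggers(names=("hpack",), level=logging.WARNING):
    """Raise the level of the named loggers so their DEBUG records are dropped."""
    for name in names:
        logging.getLogger(name).setLevel(level)


quiet_loggers()
```

Child loggers such as `hpack.hpack` inherit the level from `hpack`, so setting it on the parent is enough. This hides the symptom only; it does not explain what switched DEBUG logging on.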

Matt Alhonte

08/23/2023, 1:17 AM
Thanks btw!
👍 1