# prefect-getting-started
h
Hi, I have some problems understanding the concept of work pools, queues, and agents. After reading the documentation several times, I now interpret it as follows: a pool defines the infrastructure and enables the execution of flows on distributed infrastructure. Agents can be assigned to specific queues to respond to flow requests. However, the execution of flows also works just by starting the Prefect server, which contains an unhealthy default work queue in the default pool. Would this already be a minimal setup? I have two virtual machines and would like to be able to run flows on both. Do I need a new pool for this? I guess I would have to add the second server to a work queue via an agent? How could I create a new pool, since I don't get any options in the UI list (e.g. Docker)? Do I need to create an infrastructure block? Sorry for the stupid questions.
j
Hi Crunch. Not stupid questions. With a new setup, forget about agents and infrastructure blocks. They are older concepts that we plan to deprecate soon (with at least a six-month phase-out period).

With Prefect Cloud or a Prefect server instance running in a location accessible by both VMs, you can make a single work pool in the UI. Then start a worker on each VM that polls that work pool. Leave those workers running. Each worker polls the work pool it is tied to. Workers and work pools are typed by infrastructure (e.g. Docker), so if your infrastructure for your flow runs is Docker on one VM and a subprocess on the other, you would create two work pools. When a deployment is scheduled to run, the worker will see that and kick off the flow run in your infrastructure. If you're using a Docker work pool, the flow will run in a Docker container on the VM.

Work queues can be used for prioritizing or limiting concurrent work. You may not need to think about them at all. However, if you want one VM to get more of the work, you can create two work queues in the UI and specify a priority or concurrency limit. Then, when you start your worker, specify that work queue in addition to the work pool. The default queue is created automatically so that many users don't need to think about work queues if they don't want to. It shows as unhealthy if it hasn't been polled by a worker recently.

Does that all make sense?
Diagram that might be helpful:
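To make that concrete, here is a minimal sketch, assuming a Docker-type work pool named `my-docker-pool` created in the UI; the pool, deployment, and image names are all placeholders:

```python
from prefect import flow

@flow(log_prints=True)
def my_etl():
    print("Running on whichever VM's worker picked up this run")

if __name__ == "__main__":
    # Create a deployment that targets the Docker-type work pool.
    # On each VM, leave a worker running that polls the same pool:
    #   prefect worker start --pool my-docker-pool
    my_etl.deploy(
        name="my-etl",
        work_pool_name="my-docker-pool",    # placeholder: pool created in the UI
        image="my-registry/my-etl:latest",  # placeholder image name
    )
```

With a worker started against the same pool on each VM, either machine can pick up scheduled runs of this deployment.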
h
@Jeff Hale thanks for the detailed reply. Things are becoming much clearer now and I will test this setup. One last question: we mainly have flows that can be deployed or served while running in the background. We have one flow (a preparation flow) that needs to be associated with an API call and also needs to respond with a data result once it is finished. We have found that calling a deployment within the Flask API takes too long to respond once it is ready. Currently we only call the native flow function in Flask, but we have issues with crashes when they are called at the same time. What would be the best practice to achieve an API-bound flow with a result return? I am aware that I am mixing a bit of ETL with API, but we need the dashboard information from the prepare flow for the operating units. Is it e.g. possible to assign a flow to a worker without a deployment? Thank you very much!
j
The short answer is that you need a deployment to associate a flow with a worker. It sounds like you could probably use an automation that runs a deployment when an event is triggered (maybe through a webhook). It might be best to book a rubber duck session with a Prefect engineer to talk through the specifics of your use case.
👀 1
FYI, I just made a video of an automation that runs a deployment when an event is triggered through a webhook that should be on YouTube in a few days.
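For the event-triggered pattern, one way to fire a matching event directly from Python (for example, from the Flask handler) is the events client. A minimal sketch; the event and resource names here are made up, and the automation that runs the deployment would be configured separately to match them:

```python
from prefect.events import emit_event

# Emit a custom event; an automation with a matching custom-event trigger
# can then run the preparation-flow deployment.
emit_event(
    event="prepare-flow.requested",                   # made-up event name
    resource={"prefect.resource.id": "api.prepare"},  # made-up resource id
)
```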
h
Great, thanks for the info. I will collect some questions and come back to the Prefect experts.
🙌 1
j

The video is up on YouTube.

👀 1
i
Hi @Jeff Hale, I have a question regarding the worker pool type `subprocess`. As the agent will be deprecated soon, I need to migrate to workers. However, I am facing challenges because the new deployment approach requires a Docker image, and my current setup (VM and agent running locally) doesn't allow for Docker use for specific reasons. I tried to use the local `subprocess` type, but `flow.deploy` requires a Docker image for deploying flows. I'm a bit confused why a `subprocess` worker can be created but can't be used with the new deployment approach. Can you clarify what I am missing here, please? Additionally, I would greatly appreciate any guidance or recommendations on how to navigate this challenge. Thanks.
j
Hi Iryna! Using `flow.deploy` you can specify flow code storage on a git-based cloud option such as GitHub, GitLab, or Bitbucket, or a cloud-based storage provider such as AWS S3, GCS, or Azure Blob Storage. In my quick check, it doesn't look like a process block can be set in `flow.from_source`, but I'm doing some more digging.
Using `flow.serve` might be the best option for your use case. You can see the docs here.
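For reference, a minimal `flow.serve` sketch with placeholder names: `serve` registers the deployment and executes its runs in the same long-lived process, so no work pool, worker, or Docker image is involved.

```python
from prefect import flow

@flow
def my_flow():
    ...

if __name__ == "__main__":
    # Registers a deployment and polls for its scheduled runs in this process.
    my_flow.serve(
        name="my-served-deployment",  # placeholder deployment name
        cron="0 * * * *",             # placeholder hourly schedule
    )
```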
i
Hi @Jeff Hale, thanks for the answer. My flows are scheduled and deployed from GitHub Actions, but execution happens on a VM, so serve probably does not suit the need. Is there another way to execute/deploy flows with the new deployment approach? Thanks
j
Ah, if you’re using GitHub Actions, you might want to store your flow code on GitHub. Would that work?
i
Yes, I currently store flow code in git and use GitHub storage as a source. How can this help with my requirement? Can you explain, please? Thanks a lot
j
Cool. You can pass `flow.from_source` the GitHub repository URL and any credentials if it’s a private repo. See examples here.
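A minimal sketch of that pattern, assuming a Docker-type work pool; the repo URL, entrypoint, and other names are placeholders:

```python
from prefect import flow

if __name__ == "__main__":
    flow.from_source(
        source="https://github.com/org/repo.git",  # placeholder repo URL
        entrypoint="flows/my_flow.py:my_flow",     # placeholder "path:function"
    ).deploy(
        name="my-deployment",
        work_pool_name="my-docker-pool",           # placeholder Docker-type pool
        image="my-registry/my-image:latest",       # placeholder image name
    )
```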
i
I tried this approach and got the error `ValueError: Work pool 'default-work-pool' does not support custom Docker images. Please use a work pool with an 'image' variable in its base job template.` The `default-work-pool` is process type. My code:
```python
from prefect import flow
from prefect.client.schemas.schedules import CronSchedule
from prefect.filesystems import GitHub

# Snippet from a larger deployment helper; cron, timezone, flow_entrypoint,
# depl_name, work_pool_name, params, and tags come from the surrounding scope.
cron = None if cron is None or cron == "" else CronSchedule(cron=cron, timezone=timezone)
storage = GitHub.load("my-repo")

flow.from_source(
    source=storage,
    entrypoint=flow_entrypoint
).deploy(
    name=depl_name,
    work_pool_name=work_pool_name,
    parameters=params,
    tags=tags,
    schedule=cron,
    build=False
)
```
What am I doing wrong? Thanks
j
I would put the GitHub URL in directly, like the example in the doc, instead of loading a GitHub storage block.
i
I changed the code but got the same error `ValueError: Work pool 'default-work-pool' does not support custom Docker images. Please use a work pool with an 'image' variable in its base job template.` Code:
```python
from prefect import flow
from prefect.blocks.system import Secret
from prefect.runner.storage import GitRepository

# Snippet from a larger deployment helper; flow_entrypoint, depl_name,
# work_pool_name, params, tags, and cron come from the surrounding scope.
flow.from_source(
    source=GitRepository(
        url="https://github.com/xx/my-repo.git",
        branch="dev",
        credentials={
            "access_token": Secret.load("github-personal-access-token").get()
        }
    ),
    entrypoint=flow_entrypoint
).deploy(
    name=depl_name,
    work_pool_name=work_pool_name,
    parameters=params,
    tags=tags,
    schedule=cron,
    build=False
)
```
j
I can reproduce. Apologies, I was not aware that it wouldn’t work with a subprocess work pool. I created a feature request issue if you want to add to it or follow it.
👀 1
i
Thanks Jeff!