# ask-community
g
👋 Hey guys, can anybody confirm this use case is actually possible with Prefect? It sounds so amazing. I have a workflow, and a single task in it needs ML inference, so an expensive GPU is required. I may have 1 GPU machine and >1 CPU-only machine. I want the GPU-requiring task to start and run only on the GPU machine.
n
👋 this is Prefect 1 syntax, but the same sort of deployment-level infra selection is possible in Prefect 2. let me find you a link or two
what sort of infra are you running on? like cloud-provider-wise
g
@Nate I've not decided yet. Just random cloud providers, like vast.ai. Shouldn't it not matter to Prefect what the cloud provider is?
n
> Shouldn't it not matter for Prefect what's the cloud provider?

it generally does matter. while we recently began offering managed execution (where we provide the runtime for flows), most of our users bring their own runtime (as has been the case historically), and we do not offer "managed" GPU-powered runtimes for flows. Prefect Cloud has a "hybrid model": we host the orchestration API (server, db) for you, but generally your code runs in your infrastructure. our managed execution offering is the only exception to that
g
@Nate I'm not looking for managed execution. I'm asking something different. I have 1 GPU machine and 3 CPU-only machines. I need some Prefect tasks (which need a GPU) to execute only on the GPU machine. How can this be done? Was ChatGPT's response accurate? Does such task-tag functionality exist, where you can tag a worker instance too?
n
@Marvin please get some links for using work pools and workers and infra level overrides, make separate chroma queries for each of these and summarize concisely
m
Work Pools:
- Can be configured via Prefect CLI, API, or UI.
- UI management includes the ability to pause or create new work pools.
- Control concurrent run limits or pause run delivery.
- Guide available here.

Workers:
- Workers are polling processes that trigger scheduled flow runs.
- Start a worker with the command: `prefect worker start --pool <work-pool-name>`.
- A worker must correspond to an appropriate work pool.
- Tutorial available here.

Infrastructure-Level Overrides:
- Customize infrastructure blocks through the UI, the Prefect Cloud Blocks page, or in code.
- Infrastructure blocks can be Docker, Kubernetes, ECS Task, Cloud Run, etc.
- Use `-i` or `--infra` to set defaults. Use `-ib` or `--infra-block` with a block slug for pre-configured settings.
- Example command to build a deployment: `prefect deployment build ./my_flow.py:my_flow -n my-flow-deployment -t test -i docker-container -sb s3/my-bucket --override env.EXTRA_PIP_PACKAGES=s3`
- More on configuration here and a recent update here.
n
@Marvin `prefect deployment build` is effectively deprecated; use `prefect deploy` with `prefect.yaml` instead
m
Ah, the dull ache of obsolescence. Well, here's the update:
- Use `prefect deploy` in combination with `prefect.yaml` for orchestrating deployments.
- `prefect.yaml` configures deployment settings, including infrastructure overrides.
- Custom infrastructure settings are defined as blocks, which can be reused in deployments.
- The new approach encourages version-controlled and reusable configurations.

You'll find the paradigm shift towards `prefect deploy` promotes better CI/CD practices and configuration management. The Prefect docs and change logs are valuable resources for grasping the nuances of these newer patterns.
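For the thread's GPU/CPU split, a minimal `prefect.yaml` could declare one deployment per work pool. This is only a sketch: the pool names, entrypoints, and flow names below are invented for illustration, not taken from this thread.

```yaml
# prefect.yaml (hypothetical names throughout)
deployments:
- name: gpu-inference
  entrypoint: flows/inference.py:gpu_flow
  work_pool:
    name: gpu-pool        # polled only by the worker on the GPU machine
- name: cpu-etl
  entrypoint: flows/etl.py:etl_flow
  work_pool:
    name: cpu-pool        # polled by workers on the CPU-only machines
```

Running `prefect deploy` against a file like this registers both deployments, each routed to its own pool.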
n
yeah so work pools are like pub/sub topics, and workers are like consumers. you create a deployment from a flow, point the deployment at a work pool, and workers pick up flow runs from the work pool. you would create a work pool for each unique type of infra you have and set your infra config there: maybe one for your GPU machine, one for your CPU-only machines. then you could create deployments that point at each, so you can distribute work as needed
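The routing shape Nate describes can be sketched in plain Python (no Prefect APIs here, just the pub/sub analogy): a work pool behaves like a queue, and each worker consumes only from the pool matching its hardware, so GPU runs never land on CPU-only boxes.

```python
# Plain-Python analogy of work pools and workers (not real Prefect code).
from queue import Queue

# one "work pool" per unique type of infra
pools = {"gpu-pool": Queue(), "cpu-pool": Queue()}

def submit(flow_run: str, work_pool: str) -> None:
    # a deployment points at exactly one pool; its runs land there only
    pools[work_pool].put(flow_run)

def worker_poll(work_pool: str) -> list[str]:
    # a worker polls exactly one pool, matching the machine it runs on
    runs = []
    pool = pools[work_pool]
    while not pool.empty():
        runs.append(pool.get())
    return runs

submit("ml-inference", "gpu-pool")
submit("etl-step", "cpu-pool")
```

Here `worker_poll("gpu-pool")` returns only `["ml-inference"]`, mirroring how a worker started on the GPU machine sees only GPU-pool runs.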
g
@Nate so the way that chatgpt described (with task tags and worker tags) is not possible?
n
what chatgpt gave is prefect 1
which is our legacy version; it's just different software now. similar idea, but you use work pools to represent infra and channel work there via deployments, as I've mentioned
g
in my python code, I have 10 tasks. 1 task is a GPU task. 9 tasks are CPU tasks. How do I specify that this 1 task is a GPU task?
because this work pool thing seems to be oriented around worker configuration
n
you'd create a deployment (a @flow-decorated python function that you call `.deploy(name="my-gpu-deployment", work_pool_name="my-gpu-infra", ...)` on) and then in your python code you'd trigger a run on that deployment's infra with `run_deployment("my-decorated-fn/my-gpu-deployment")`. but you'd need a work pool representing your GPU infra to send your deployment's flow runs to
g
sounds way more complicated than with task tags. was this task tag functionality removed, so that your approach is mandatory?
n
chatgpt was hallucinating; that's not real. in principle what you're asking for is possible, but it requires a small amount of engagement with our concepts
g
oh ok
n
yep - chatgpt as released today was only trained up through early 2022 afaik, and today's prefect is effectively entirely new software. the easiest way to start with deployments, if you're interested, is with `.serve`
```python
from prefect import flow, task

@task
def work(x):
    print(f"got {x}")

@flow(log_prints=True)
def collection_of_work():
    for i in range(10):
        work(i)

    # or work.map(range(10)) to do the above concurrently

# starts a long-lived process
collection_of_work.serve("my-deployment-name")
```
so you could write / serve a flow on any of your computers and then trigger it from anywhere