# ask-community
Noam:
Hey all, I have a use case where I need to run some pretty CPU-intensive tasks. Is it possible to run tasks on different machines? Currently I'm using k8s to run the flow.
Nate:
hi @Noam , yes - you can use `run_deployment` for something like this to set up a parent-child deployment pattern, where the parent kicks off the child and each has independent infra, but the parent still tracks the child's runs
Noam:
Cool. So if I need to pass the task some Python object, I'll need to serialize it and pass it as a deployment parameter?
Nate:
yep, flow parameters need to be JSON-serializable. if your child flow has a signature like `foo(x: int)`, you'd pass `dict(x=42)` to `run_deployment`'s `parameters` kwarg. but if your flow accepts a pydantic model, say `class Model(BaseModel): x: int` with `@flow def foo(model: Model)`, then you can pass `dict(model=dict(x=42))` and the value will be coerced when it reaches the child flow
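A minimal runnable sketch of what Nate describes, assuming the child flow `foo` has already been deployed under the hypothetical name `foo/child`:

```python
from prefect import flow
from prefect.deployments import run_deployment
from pydantic import BaseModel


class Model(BaseModel):
    x: int


@flow
def foo(model: Model):
    # runs on the child deployment's own infrastructure; the JSON dict
    # passed as "model" is coerced back into Model before this is called
    print(model.x)


@flow
def parent():
    # creates a child flow run that the parent tracks, while the child
    # executes on whatever infra its deployment is set up to use
    run_deployment(
        name="foo/child",                   # hypothetical deployment name
        parameters=dict(model=dict(x=42)),  # must be JSON-serializable
    )
```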
Srul Pinkas:
just to make sure @Nate - while it was planned only to be an inner task of a flow, and assuming this trick will be used a lot (hundreds of child runs), this will also create a new `run` object for each (spamming the dashboard / adding load to the Prefect DB as an independent run), right?
Nate:
@Srul Pinkas using `run_deployment` will create a new flow run from that deployment, on whatever infrastructure that deployment is set up to use. does that help?
Srul Pinkas:
yes, thanks. i was just wondering what the downsides are. i think that if there are many "tasks" that i shift to full deployment runs (due to resources), it will clutter my UI dashboard with all those inner calculations (rather than just full-project flows). On Airflow you can run parts of a single DAG on different machines, so heavier calculations can be executed on separate resources without cluttering the runs list or using a huge machine for the entire flow.
Nate:
in the UI they will appear as any other subflow would with respect to the parent flow run, so nothing should be messed up there. but yes, it will be executed on a separate machine (which was the original ask in this thread)
👍 2
Devin McCabe:
You might want to look into https://flyte.org if you're already comfortable with k8s. Prefect (even up to v3.0 as far as I can tell) isn't a good solution when you require heterogeneous task resources.
Nate:
thanks for chiming in @Devin McCabe - I'd be curious to hear more of your thoughts on this
> Prefect (even up to v3.0 as far as I can tell) isn't a good solution when you require heterogeneous task resources
can you explain more of what you mean? e.g. both `serve` and task workers are built for static infrastructure setups
Devin McCabe:
You can always call out to external services inside a task, but this creates another layer of execution and collection. A first-class solution in Prefect would look more like being able to specify CPU/GPU/memory/disk/Docker image/etc. as part of the `@task` decorator, like in this Flyte example. I'm not aware of a Prefect executor that supports such a method, though. For example, you could imagine writing a simple Prefect flow that has a few tasks that do simple data munging, none of which would require much memory or special compute, but another task that fits a model needs a larger, GPU-accelerated instance.
👍 1
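For context, the per-task resources Devin describes in Flyte look roughly like this (a sketch based on flytekit's public API; the resource sizes are illustrative):

```python
from flytekit import Resources, task, workflow


@task  # plain task: light data munging, no special resources needed
def munge(n: int) -> int:
    return n * 2


@task(requests=Resources(cpu="4", mem="16Gi", gpu="1"))
def fit_model(n: int) -> int:
    # this task alone gets scheduled onto a larger, GPU-backed pod
    return n


@workflow
def pipeline(n: int) -> int:
    return fit_model(n=munge(n=n))
```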
Srul Pinkas:
I agree - i was expecting something like `result = some_task.submit(input=something, memory=xx, cpu=yy, external_resources=True)` where the resources are not part of the current flow-run's resources. The scenario i have requires running 500 sub-trainings with different inputs, and then merging the results into a single ensemble. I think Metaflow also supports this in its task ("step") decorator, if i remember correctly. Doing this without subflows would require a huge machine (perhaps even too big) for no reason. Using subflows like you mentioned @Nate can be a good solution, but it does have some overhead (defining it as a separate flow and managing its deployment; spamming the UI a bit with flow runs even though they are all tasks of a parent and have no meaning of their own; input-output JSON parsing for each subflow...).
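A hedged sketch of the subflow workaround for this fan-out, assuming the sub-training flow has been deployed under the hypothetical name `train-one/gpu` and that each child persists its result somewhere the parent can read:

```python
from prefect import flow
from prefect.deployments import run_deployment


@flow
def ensemble(configs: list[dict]):
    # fan out: each sub-training becomes a child flow run on the child
    # deployment's (bigger) infrastructure
    child_runs = [
        run_deployment(
            name="train-one/gpu",         # hypothetical deployment name
            parameters=dict(config=cfg),  # must be JSON-serializable
            timeout=0,                    # return the FlowRun immediately instead of blocking
        )
        for cfg in configs
    ]
    # merging the 500 results still happens out of band, e.g. by reading
    # artifacts each child wrote to shared storage
    return [run.id for run in child_runs]
```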
Devin McCabe:
Yes, Metaflow has that feature, too. Other tools built by the genomics community (Nextflow, Cromwell) also have it out of necessity.
Srul Pinkas:
(Metaflow didn't have subflow support at the time, so i'm not complaining! 🙂 just describing the need)
👍 1
Nate:
thanks for the feedback!
> result = some_task.submit(input=something, memory=xx, cpu=yy, external_resources=True)
this sounds like a `TaskRunner` to me (we currently have dask and ray); it sounds like you might like to see something like a `ModalTaskRunner` that doesn't require you to engage with work pools as a means of configuring infra, but allows passing infra config at runtime (which is also possible with subflows / `run_deployment` via `job_variables`)
> A first-class solution in Prefect would look more like being able to specify CPU/GPU/memory/disk/Docker image/etc. as part of the `@task` decorator
this is similar to how the `Flow` object worked in prefect 1, and i can definitely see the value in doing something like that for tasks. if anyone wants to codify a specific DX ask in an issue, please feel free!
👍 1
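For reference, the `job_variables` escape hatch Nate mentions might look like this (a sketch; the available keys depend entirely on your work pool's base job template, so these are illustrative, not defaults):

```python
from prefect.deployments import run_deployment

# per-run infra override: "heavy-task/k8s" is a hypothetical deployment on
# a Kubernetes work pool whose base job template exposes these variables
flow_run = run_deployment(
    name="heavy-task/k8s",
    parameters=dict(x=42),
    job_variables={"cpu_request": "8", "memory_request": "32Gi"},
)
```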
Devin McCabe:
I never saw an issue that quite described what I'm thinking of, so I just wrote up this one: https://github.com/PrefectHQ/prefect/issues/15246
thank you 1