# ask-community
Noam:
Hey all, I have a use case where I need to run some pretty CPU-intensive tasks. Is it possible to run tasks on different machines? Currently I'm using k8s to run the flow.
Nate:
hi @Noam , yes - you can use `run_deployment` for something like this to set up a parent-child deployment pattern, where the parent kicks off the child and each has independent infra, but the parent still tracks the child's runs
Noam:
Cool. So if I need to pass the task some Python object, I'll need to serialize it and pass it as a deployment parameter?
Nate:
yep, flow parameters need to be JSON-serializable. if your child flow has a signature like `foo(x: int)`, you'd pass `dict(x=42)` to `run_deployment`'s `parameters` kwarg. but if your flow accepts a pydantic model, say `class Model(BaseModel): x: int` with `@flow def foo(model: Model)`, then you can pass `dict(model=dict(x=42))` and the value will be coerced when it reaches the child flow
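A minimal runnable sketch of what Nate describes, assuming the child flow `foo` has already been deployed under the hypothetical name `foo/child`:

```python
from prefect import flow
from prefect.deployments import run_deployment
from pydantic import BaseModel


class Model(BaseModel):
    x: int


@flow
def foo(model: Model):
    # runs on the child deployment's own infrastructure; the JSON dict
    # passed as "model" is coerced back into Model before this is called
    print(model.x)


@flow
def parent():
    # creates a child flow run that the parent tracks, while the child
    # executes on whatever infra its deployment is set up to use
    run_deployment(
        name="foo/child",                   # hypothetical deployment name
        parameters=dict(model=dict(x=42)),  # must be JSON-serializable
    )
```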
Srul Pinkas:
just to make sure @Nate - while it was planned only to be an inner task of a flow, and assuming this trick will be used a lot (hundreds of child runs), this will also create a new `run` object for each (spamming the dashboard / adding load to the Prefect DB as an independent run), right?
Nate:
@Srul Pinkas using `run_deployment` will create a new flow run from that deployment, on whatever infrastructure that deployment is set up to use. does that help?
Srul Pinkas:
yes, thanks. i was just wondering what the downsides are. i think that if there are many "tasks" that i shift to full deployment runs (due to resources), it will clutter my UI dashboard with all those inner calculations (rather than just full-project flows). On Airflow you can run parts of a single DAG on different machines, so heavier calculations can be executed on separate resources without cluttering the runs list or using a huge machine for the entire flow.
Nate:
in the UI they will appear as any other subflow would with respect to the parent flow run, so nothing should be messed up there. but yes, it will be executed on a separate machine (which was the original ask in this thread)
👍 2
Devin McCabe:
You might want to look into https://flyte.org if you're already comfortable with k8s. Prefect (even up to v3.0 as far as I can tell) isn't a good solution when you require heterogeneous task resources.
Nate:
thanks for chiming in @Devin McCabe - I'd be curious to hear more of your thoughts on this
> Prefect (even up to v3.0 as far as I can tell) isn't a good solution when you require heterogeneous task resources
can you explain more of what you mean? e.g. both `serve` and task workers are built for static infrastructure setups
Devin McCabe:
You can always call out to external services inside a task, but this creates another layer of execution and collection. A first-class solution in Prefect would look more like being able to specify CPU/GPU/memory/disk/Docker image/etc. as part of the `@task` decorator, like in this Flyte example. I'm not aware of a Prefect executor that supports such a method, though. For example, you could imagine writing a simple Prefect flow that has a few tasks that do simple data munging, none of which would require much memory or special compute, but another task that fits a model needs a larger, GPU-accelerated instance.
👍 1
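For context, the per-task resources Devin describes in Flyte look roughly like this (a sketch based on flytekit's public API; the resource sizes are illustrative):

```python
from flytekit import Resources, task, workflow


@task  # plain task: light data munging, no special resources needed
def munge(n: int) -> int:
    return n * 2


@task(requests=Resources(cpu="4", mem="16Gi", gpu="1"))
def fit_model(n: int) -> int:
    # this task alone gets scheduled onto a larger, GPU-backed pod
    return n


@workflow
def pipeline(n: int) -> int:
    return fit_model(n=munge(n=n))
```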
Srul Pinkas:
I agree - i was expecting something like `result = some_task.submit(input=something, memory=xx, cpu=yy, external_resources=True)` where the resources are not part of the current flow-run's resources. The scenario i have requires running 500 sub-trainings with different inputs, and then merging the results into a single ensemble. I think Metaflow also supports this in its task ("step") decorator, if i remember correctly. Doing this without subflows would require a huge machine (perhaps even too big) for no reason. Using subflows like you mentioned @Nate can be a good solution, but it does have some overhead (defining it as a separate flow and managing its deployment; spamming the UI a bit with flow runs even though they are all tasks of a parent and have no meaning of their own; input-output JSON parsing for each subflow...).
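A hedged sketch of the subflow workaround for this fan-out, assuming the sub-training flow has been deployed under the hypothetical name `train-one/gpu` and that each child persists its result somewhere the parent can read:

```python
from prefect import flow
from prefect.deployments import run_deployment


@flow
def ensemble(configs: list[dict]):
    # fan out: each sub-training becomes a child flow run on the child
    # deployment's (bigger) infrastructure
    child_runs = [
        run_deployment(
            name="train-one/gpu",         # hypothetical deployment name
            parameters=dict(config=cfg),  # must be JSON-serializable
            timeout=0,                    # return the FlowRun immediately instead of blocking
        )
        for cfg in configs
    ]
    # merging the 500 results still happens out of band, e.g. by reading
    # artifacts each child wrote to shared storage
    return [run.id for run in child_runs]
```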
Devin McCabe:
Yes, Metaflow has that feature, too. Other tools built by the genomics community (Nextflow, Cromwell) also have it out of necessity.
Srul Pinkas:
(Metaflow didn't have subflow support at the time, so i'm not complaining! 🙂 just describing the need)
👍 1
Nate:
thanks for the feedback!
> result = some_task.submit(input=something, memory=xx, cpu=yy, external_resources=True)
this sounds like a `TaskRunner` to me (we currently have dask and ray); it sounds like you might like to see something like a `ModalTaskRunner` that doesn't require you to engage with work pools as a means of configuring infra, but allows passing infra config at runtime (which is also possible with subflows / `run_deployment` via `job_variables`)
> A first-class solution in Prefect would look more like being able to specify CPU/GPU/memory/disk/Docker image/etc. as part of the `@task` decorator
this is similar to how the `Flow` object worked in prefect 1, and i can definitely see the value in doing something like that for tasks. if anyone wants to codify a specific DX ask in an issue, please feel free!
👍 1
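For reference, the `job_variables` escape hatch Nate mentions might look like this (a sketch; the available keys depend entirely on your work pool's base job template, so these are illustrative, not defaults):

```python
from prefect.deployments import run_deployment

# per-run infra override: "heavy-task/k8s" is a hypothetical deployment on
# a Kubernetes work pool whose base job template exposes these variables
flow_run = run_deployment(
    name="heavy-task/k8s",
    parameters=dict(x=42),
    job_variables={"cpu_request": "8", "memory_request": "32Gi"},
)
```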
Devin McCabe:
I never saw an issue that quite described what I'm thinking of, so I just wrote up this one: https://github.com/PrefectHQ/prefect/issues/15246
thank you 1