# ask-marvin
a
@Marvin Is there a way to run a task on a separate kubernetes pod for concurrency? Or can you only do that type of thing through the dask or ray runners?
m
Yes, you can run tasks on separate Kubernetes pods for concurrency. Prefect provides a `KubernetesJob` infrastructure block that can automatically generate a manifest to run your flow as a Kubernetes job. You can find more details in the Prefect Release Notes.

For managing concurrency, Prefect also provides a feature called concurrency limits. You can configure the concurrency capacity within a specific limit, and the rate at which slots "decay" or refresh is controlled by the `slot_decay_per_second` parameter; higher values make slots become available more quickly. You can read more about this in the Prefect Concurrency Limits Guide.

Regarding the Dask and Ray task runners: they provide an execution environment for tasks. In a flow decorator, you can specify a task runner to run the tasks called in that flow; the default is the `ConcurrentTaskRunner`. To run tasks asynchronously, call them with the `.submit` method. More details can be found in the Prefect Dask and Ray Task Runners Guide.
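The `.submit` pattern described above can be sketched with the standard library alone. This is a minimal analogy, not Prefect's actual implementation: the `fetch_chunk` function is a hypothetical stand-in for a `@task`-decorated function, and the thread pool plays roughly the role the `ConcurrentTaskRunner` plays when you call `fetch_chunk.submit(...)` inside a flow.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a Prefect task body; in a real flow this
# would be a @task-decorated function called via fetch_chunk.submit(i).
def fetch_chunk(chunk_id: int) -> int:
    return chunk_id * 2

# Submitting tasks returns futures that run concurrently; collecting
# .result() on each future waits for completion, preserving order.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(fetch_chunk, i) for i in range(4)]
    results = [f.result() for f in futures]

print(results)  # → [0, 2, 4, 6]
```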
a
@Marvin So, let’s say I use dask or ray, because it seems like those are the best way to run pieces of the flow on different architecture. Let’s say I have a flow that gathers and preprocesses data in tasks, and then it needs to kick off a long-running training step (either task or subflow, I don’t know) Does the pod running the original flow have to keep running the entire time training is happening to wait for it to finish and then continue on to evaluation, deployment, etc? Or what’s the best practice there?
m
In Prefect, you can use Dask or Ray task runners to manage the execution of tasks. These task runners provide an execution environment for tasks; you specify a task runner in the flow decorator, and call tasks asynchronously with the `.submit` method.

For a long-running step such as training, you can use a subflow, which is essentially a flow called from within another flow. To run multiple tasks or child flows in parallel, you can use something like `asyncio.gather` on the flow objects themselves in the parent flow.

The pod running the original flow does not necessarily have to keep running the entire time training is happening: once the training task or subflow is kicked off, it can run independently, and the original flow can continue on to evaluation, deployment, etc., without waiting for training to finish. However, best practice depends on your specific use case. If the evaluation and deployment steps depend on the results of training, you need to ensure training has completed before moving on to those steps; if the steps are independent, they can run in parallel with training.
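The `asyncio.gather` pattern mentioned above can be illustrated with plain coroutines (no Prefect required). All function names here are illustrative; in Prefect, `train_model` and `build_report` would be async `@flow`-decorated functions called from the parent flow.

```python
import asyncio

# Hypothetical stand-ins for async subflows.
async def train_model() -> str:
    await asyncio.sleep(0.01)  # placeholder for a long-running step
    return "model"

async def build_report() -> str:
    await asyncio.sleep(0.01)
    return "report"

async def parent_flow() -> list:
    # gather runs both child coroutines concurrently and waits for both,
    # returning their results in call order.
    return await asyncio.gather(train_model(), build_report())

results = asyncio.run(parent_flow())
print(results)  # → ['model', 'report']
```

Note that the parent process stays alive while `gather` awaits its children, which is exactly the trade-off raised in the next question.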
a
@Marvin , the evaluation and deployment would be dependent on training finishing. So, if I use a subflow for training, I would need the parent flow to continue running for a day or so while that trains? Or is there a way to stop the parent flow, and then pick up again when training is completed?
m
In Prefect, you can use subflows for long-running tasks like training. However, if the parent flow has to wait for the subflow to complete, it would need to keep running. Currently, Prefect does not support pausing and resuming flows out of the box, but there are a few workarounds you could consider:

1. Checkpointing: Save the state of your flow at various points, then use these checkpoints to resume your flow from where it left off. This involves saving the state of your tasks and their dependencies, then reloading that state when you want to resume.
2. Task looping: Prefect supports looping within a single task. You could design your flow so that the long-running task keeps checking for a certain condition and only completes when that condition is met, allowing the parent flow to continue running other tasks in the meantime.
3. External triggers: Use an external system to trigger the continuation of your flow once the long-running task is complete. This involves integrating Prefect with another system that can monitor the status of your long-running task and trigger the next steps in your flow when ready.

These are just workarounds and may not be ideal for all use cases. It's always best to design your flows and tasks so that they can complete within a reasonable amount of time.
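The checkpointing workaround (option 1) can be sketched in plain Python. This is a minimal, hypothetical illustration: completed step names are persisted to a JSON file after each step, so a fresh process (e.g. a new pod started by an external trigger after training finishes) can re-run the pipeline and skip everything already done.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical pipeline steps; in Prefect these might be tasks or subflows.
STEPS = ["preprocess", "train", "evaluate", "deploy"]

def run_pipeline(state_file: Path) -> list:
    """Run remaining steps, checkpointing progress after each one."""
    done = json.loads(state_file.read_text()) if state_file.exists() else []
    executed = []
    for step in STEPS:
        if step in done:
            continue  # already completed in a previous run
        executed.append(step)  # placeholder for real work
        done.append(step)
        state_file.write_text(json.dumps(done))  # checkpoint
    return executed

state = Path(tempfile.mkdtemp()) / "state.json"
first = run_pipeline(state)   # fresh run: executes every step
second = run_pipeline(state)  # resumed run: nothing left to do
print(first, second)  # → ['preprocess', 'train', 'evaluate', 'deploy'] []
```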
a
@Marvin how do people typically manage their flow input parameter configurations? For example, if you have different sets of values for the input parameters that you want to easily switch between, and many parameters are shared between different flows, what are some ways to set these values, share some of them across flows, and switch between different sets of values?
m
Managing flow input parameters in Prefect can be done in a few ways:

1. Parameterization: Prefect has first-class support for parameterization, which makes it easy to run the same group of tasks in different use cases by simply passing different parameters. The parameter schema is automatically inferred from the flow function, which allows you to easily switch between different sets of values.
2. Subflows: If you have a group of tasks that run only under certain conditions, you can group them within a subflow and conditionally run the subflow rather than each task individually. This can be a way to share parameters between flows.
3. Profiles: Prefect allows you to create profiles with different settings, and you can pass the name of a profile to view its settings. This can be a way to manage different sets of parameters.

Remember to always consider the specific needs of your project when deciding how to manage your flow input parameters.
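One common way to share parameters across flows and switch between value sets is a typed config object plus named presets. The sketch below is a hypothetical pattern, not a Prefect API: a dataclass holds the shared parameters, and a preset dictionary selects between configurations. In Prefect, the flow's parameter schema would be inferred from the dataclass annotation.

```python
from dataclasses import dataclass, asdict

# Hypothetical shared parameters used by several flows.
@dataclass
class TrainingParams:
    learning_rate: float
    batch_size: int
    epochs: int

# Named presets make it easy to switch between parameter sets.
PRESETS = {
    "quick": TrainingParams(learning_rate=1e-3, batch_size=32, epochs=1),
    "full": TrainingParams(learning_rate=1e-4, batch_size=64, epochs=50),
}

def training_flow(params: TrainingParams) -> dict:
    # Stand-in for a @flow-decorated function taking the shared config.
    return asdict(params)

config = training_flow(PRESETS["quick"])
print(config)  # → {'learning_rate': 0.001, 'batch_size': 32, 'epochs': 1}
```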