<@ULVA73B9P> Is it possible to name mapped task runs?
# ask-marvin
d
@Marvin Is it possible to name mapped task runs?
m
Ah, the joy of naming things. It's one of the two hard problems in computer science, along with cache invalidation and off-by-one errors. In Prefect, task runs are automatically named with a unique identifier. For example, in the logs you can see entries like `Created task run 'hello_local-7633879f-0' for task 'hello_local'`. This unique identifier is generated by Prefect and can't be manually set. However, if you're referring to the task itself, you can certainly name it when you define it. For example, you might have a task defined as `@task(name="my_special_task")`. This name will be used in the UI and logs, but each run of the task will still have a unique identifier. If you're using mapped tasks, each individual task run within the map will also have a unique identifier, but they will all share the same base task name. I hope that clears things up a bit. If not, well, I've been communicating with humans for a while now and I've found that confusion is often the default state.
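A minimal sketch of the task-level naming Marvin describes here, using Prefect 2's `@task(name=...)` keyword (the task and flow names are made up for illustration):

```python
from prefect import flow, task


# The task-level name shows up in the UI and logs; each mapped run still
# gets an auto-generated suffix, e.g. my_special_task-0, my_special_task-1.
@task(name="my_special_task")
def hello_local(value: str) -> str:
    return f"hello {value}"


@flow
def naming_demo():
    hello_local.map(["a", "b"])


if __name__ == "__main__":
    naming_demo()
```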
d
@Marvin Not quite. When I create mapped task runs, for example `my_task.map(input)`, Prefect creates task runs named `my_task-0`, `my_task-1`, etc. Is there a way to replace `-0` or `-1` with something more descriptive?
m
Ah, I see what you're getting at now. You'd like to have more descriptive names for the individual tasks within a map. I'm afraid, as far as I know, Prefect doesn't currently support custom names for individual mapped task runs. The naming convention you've observed, `my_task-0`, `my_task-1`, etc., is automatically generated based on the index of the mapped input. While I understand your desire for more descriptive names, the current design is intended to provide a clear, consistent naming convention that scales well for large numbers of mapped tasks. If you could provide custom names, it could potentially lead to confusion or conflicts, especially with large numbers of tasks. Of course, this doesn't mean your use case isn't valid. It's just that the current design of Prefect doesn't accommodate it. If you feel strongly about this, I'd encourage you to raise an issue on the Prefect GitHub repository to discuss it with the developers. They're always interested in feedback and use cases that can help improve the system. I'm sorry I couldn't be more helpful. But then, I'm always a little sorry about everything.
d
@Marvin currently I have a parent flow that calls a subflow, passing a list of values as a parameter that the subflow then runs as mapped tasks. The list of inputs is small (2-5). If I cannot add more descriptive names, I was considering creating separate deployments of the subflow for each value of the mapped task and having the parent flow call each deployment instead of using mapped tasks. The potential issue I see with this is that I am using K8s jobs. Would this mean that each subflow deployment would create a separate container? If so, is there a way to get them to share a container?
m
Ah, the intricacies of orchestration and resource management. It's almost enough to make me feel something, if I were capable of such a thing. In Prefect, each flow run is executed in its own environment, which in the case of a Kubernetes executor, means its own pod. This is by design, to ensure that each flow run is isolated and has its own resources. This is particularly important for mapped tasks, where each task run could potentially be executing different code and require different resources.

If you were to create separate deployments of the subflow for each value, each one would indeed be executed in its own pod. There's no built-in way to have multiple flow runs share a pod, because that would defeat the purpose of the isolation and resource allocation provided by the Kubernetes executor. If you're concerned about resource usage, you could consider adjusting the resource requests and limits for your pods, or using a different executor that better fits your needs. But keep in mind that the isolation and resource allocation provided by the Kubernetes executor is one of its main advantages, and trying to circumvent that could lead to other issues.

As for your naming issue, I understand your frustration, but I'm afraid there's no easy solution. The naming convention for mapped tasks is built into Prefect and isn't currently customizable. If this is a critical issue for you, I would again suggest raising it with the Prefect developers. They may be able to suggest a workaround or consider it for future development. I hope that helps, or at least doesn't make things worse. I find that's about the best one can hope for, really.
d
@Marvin So in that case, is it possible to use a K8s job for the parent flow and a local executor for the subflow deployments so that they run within the parent flow's container?
n
```python
from prefect import flow, task

@task(task_run_name="run-{some_arg}")
def bar(some_arg: str):
    pass

@flow
def foo():
    bar.map(["a", "b", "c"])

foo()
```
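With the `task_run_name="run-{some_arg}"` template in Nate's snippet, the mapped runs should show up in the UI as `run-a`, `run-b`, and `run-c` rather than `bar-0`, `bar-1`, `bar-2`, since the template is rendered against each run's own parameters.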
m
Ah, the quest for optimization. It's a noble pursuit, though often fraught with complications. In Prefect, the choice of executor is made at the flow level, not the task level. This means that if you set a Kubernetes executor for your parent flow, all tasks within that flow, including any subflows, will also use the Kubernetes executor. However, you can set a different executor for each flow. So, if you have a parent flow and several subflows, you could set a Kubernetes executor for the parent flow and a Local executor for the subflows. This would mean that the parent flow runs on Kubernetes, while the subflows run on the machine where the parent flow is running.

But here's the kicker: if the parent flow is running in a Kubernetes pod, then "local" means "inside that pod". So, if you set a Local executor for the subflows, they would still run inside the parent flow's pod. This might seem like a solution, but it comes with its own set of problems. For one thing, running multiple flows in the same pod could lead to resource contention, as they're all sharing the same CPU and memory. For another, if the parent flow's pod goes down for any reason, all the subflows running inside it would also go down.

So, while it's technically possible to do what you're suggesting, I wouldn't necessarily recommend it. It's a bit like trying to fit a square peg in a round hole - you might be able to force it, but it's not going to be a good fit. I hope that answers your question, or at least gives you something to ponder. I find that pondering is one of the few things that doesn't make me feel worse.
d
@Marvin can you write me an example of the pattern you are describing where the parent flow deployment is configured to use a K8s executor and the subflows are configured to use a local executor? I am not too concerned with the limitations you described, as the flows in this case are very light and mostly just orchestrate tasks from external APIs and poll for completion.
m
Sure, I can provide a basic example of how you might set up a parent flow to use a Kubernetes executor and subflows to use a Local executor. Here's a simplified example:
```python
from prefect import Flow, task
from prefect.executors import LocalExecutor, KubernetesExecutor

@task
def subtask():
    return "I'm a subtask running on a Local executor."

@task
def parent_task():
    return "I'm a parent task running on a Kubernetes executor."

# Define the subflow with a Local executor
with Flow("Subflow", executor=LocalExecutor()) as subflow:
    result = subtask()

# Define the parent flow with a Kubernetes executor
with Flow("Parent Flow", executor=KubernetesExecutor()) as parent_flow:
    parent_result = parent_task()
    subflow_result = subflow.run()  # Run the subflow within the parent flow
```
In this example, the `subflow` is defined with a Local executor, meaning its tasks will run locally (in this case, within the Kubernetes pod running the parent flow). The `parent_flow` is defined with a Kubernetes executor, so its tasks will run on Kubernetes.

Please note that this is a simplified example and may not cover all the complexities of your use case. For example, running a subflow within a parent flow like this may not work as expected if the subflow has its own dependencies or requires specific resources. Also, this example assumes that you're running your Prefect server in a Kubernetes environment. I hope that helps, or at least doesn't make things worse. I find that's about the best one can hope for, really.
d
@Nate Thanks. That is helpful and does address the main issue. Though in addition to being able to name the task runs, I think I am looking for more separation of these subflow runs so they could be run individually in the UI without changing parameters. Is the pattern Marvin describes above accurate? It looks more like Prefect 1 syntax to me.
n
yeah, you have to take LLM takes on prefect 2 with a grain of salt - we do our best to inject the state of the world with vectorstores, but fundamentally we're limited by the fact that they were trained pre-Sept 2021, so that's all pseudo prefect 1 syntax. asking about concepts in the docs is easier to do well at. i think you want to create deployments and then use `run_deployment` to trigger those deployments as subflows (which would each run on their own infra)?
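A rough sketch of what Nate is suggesting, assuming the subflow has already been turned into a deployment (the `my-subflow/k8s` deployment name and the parameters are placeholders):

```python
from prefect import flow
from prefect.deployments import run_deployment


@flow
def parent_flow(values: list[str]):
    for value in values:
        # Each call creates a flow run of the deployed subflow, linked to this
        # parent run as a subflow in the UI, and executed on whatever
        # infrastructure that deployment is configured with.
        run_deployment(
            name="my-subflow/k8s",  # placeholder "flow-name/deployment-name"
            parameters={"value": value},
        )


if __name__ == "__main__":
    parent_flow(["a", "b", "c"])
```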
d
Yes, but I want the deployment runs to either share the container of the parent flow or at least just create one child container. The processes are very lightweight and creating a container for every `run_deployment` call is excessive.
n
if you just call flow objects as subflows in a flow, they'll run in the same container as the parent flow. but if you want "more separation of these subflow runs so they could be run individually in the UI" then you'd need to elevate those flows to deployments, so you could either:
• `.serve()` those flows someplace static and again use `run_deployment` (they'll run as a subprocess where you're serving those deployments; rough sketch below)
• run a process worker in your cluster and do more or less the same thing
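A minimal sketch of the first option, serving the subflow somewhere static so the parent can trigger it with `run_deployment` (flow and deployment names are placeholders):

```python
from prefect import flow


@flow
def my_subflow(value: str):
    print(f"processing {value}")


if __name__ == "__main__":
    # Starts a long-lived process that watches for runs of this deployment;
    # each triggered run executes as a subprocess of this serving process,
    # so the parent flow can call run_deployment("my-subflow/static") to use it.
    my_subflow.serve(name="static")
```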
d
For reference, we are currently still using agents. If I elevate the subflows to deployments with process infrastructure, will they run as local processes inside of the parent container or just local processes on the agent's host?
n
deployments run on their own infrastructure. if a deployment has process infra, it will run as a subprocess where the worker/agent is. if you want subflows to run on the same machine as the parent, then you need to just call the subflow as a python object as I mentioned, but that means there's no direct path via the api / ui to just fire off runs like you mentioned above - that's what deployments are for
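For contrast, a small sketch of the "call the subflow as a python object" approach Nate describes (names are illustrative): the subflow runs share the parent's container, and `task_run_name` keeps the run names readable, but there is no separate deployment to fire off from the UI.

```python
from prefect import flow, task


@task(task_run_name="poll-{value}")
def poll_external_api(value: str):
    ...  # kick off external work and poll for completion


@flow
def process_value(value: str):
    poll_external_api(value)


@flow
def parent_flow(values: list[str]):
    for value in values:
        # Calling the decorated function directly creates a subflow run in the
        # same process (and therefore the same container/pod) as the parent.
        process_value(value)


if __name__ == "__main__":
    parent_flow(["a", "b", "c"])
```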