# marvin-ai
j
@Marvin I've got a case where I'd like one flow to call many others: The first flow will orchestrate and determine various parameters on a Cloud Run worker, and that will trigger several other flows to be run on workers with much better CPU-processing capabilities. How do you recommend I do this? (Would I use, for example, `run_deployment`, or is there something better?)
m
thought for 235.4 seconds
Based on your scenario, I can recommend a solid approach. Here's how you can implement this orchestration pattern:

1. First, create your orchestrator flow that will run on Cloud Run:
```python
from prefect import flow
from prefect.deployments import run_deployment

@flow(name="orchestrator-flow")
def orchestrator_flow():
    # Determine your parameters based on your logic
    parameters = {"param1": "value1", "param2": "value2"}
    
    # Trigger multiple flow runs
    flow_run_1 = run_deployment(
        name="your-deployment/processing-flow-1",  # Format: "deployment-name/flow-name"
        parameters=parameters,
        work_queue_name="high-cpu-queue",  # Target your high-CPU workers
        as_subflow=True  # This makes the flow run tracked as a child of the orchestrator
    )
    
    flow_run_2 = run_deployment(
        name="your-deployment/processing-flow-2",
        parameters=parameters,
        work_queue_name="high-cpu-queue"
    )
    
    # You can wait for the results if needed
    return [flow_run_1, flow_run_2]
```
2. Create your processing flows that will run on the high-CPU workers:
```python
@flow(name="processing-flow-1")
def processing_flow_1(param1: str, param2: str):
    # Your CPU-intensive work here
    pass

@flow(name="processing-flow-2")
def processing_flow_2(param1: str, param2: str):
    # Your CPU-intensive work here
    pass
```
3. Deploy your flows, making sure to specify the appropriate work pools:
```python
# Deploy the orchestrator to the Cloud Run work pool
orchestrator_flow.from_source(
    source="your-source",                      # e.g. a Git repo URL
    entrypoint="flows.py:orchestrator_flow",   # path to the flow within that source
).deploy(
    name="orchestrator",
    work_pool_name="cloud-run-pool",
)

# Deploy the processing flows to the high-CPU work pool
processing_flow_1.from_source(
    source="your-source",
    entrypoint="flows.py:processing_flow_1",
).deploy(
    name="processing-flow-1",
    work_pool_name="high-cpu-pool",
)

processing_flow_2.from_source(
    source="your-source",
    entrypoint="flows.py:processing_flow_2",
).deploy(
    name="processing-flow-2",
    work_pool_name="high-cpu-pool",
)
```
Some key points about this approach:

1. `run_deployment` is indeed the recommended way to trigger flows from other flows in Prefect 3.x.
2. Setting `as_subflow=True` (the default) means these flows will be tracked as children of your orchestrator flow, making it easier to monitor the entire process.
3. You can use `work_queue_name` to ensure the flows run on specific workers.
4. The orchestrator flow can pass parameters to the processing flows, allowing for dynamic configuration.
5. You can wait for results and handle any errors from the child flows in your orchestrator.

If you need to monitor the progress of your triggered flows, you can use the returned `FlowRun` objects to check their status or wait for completion (see the sketch below). You might also want to consider using tags to help organize and track related flow runs. Would you like me to elaborate on any part of this solution or provide more specific examples for your use case?
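For reference, a minimal sketch of inspecting the `FlowRun` object that `run_deployment` returns; the deployment name is a placeholder, and the attributes used (`id`, `name`, `state`) come from Prefect's flow run schema:

```python
from prefect.deployments import run_deployment

flow_run = run_deployment(
    name="processing-flow-1/your-deployment",  # placeholder: "flow-name/deployment-name"
    parameters={"param1": "value1", "param2": "value2"},
)

# The returned FlowRun is a metadata snapshot of the triggered run
print(flow_run.id, flow_run.name)
if flow_run.state is not None:
    print(flow_run.state.type)            # e.g. COMPLETED or FAILED
    print(flow_run.state.is_completed())  # True once the run finished successfully
```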
j
@Marvin How do I tell Prefect to pull a particular image from a particular Docker repo in prefect.yaml? (I don't want the image to be built; I want an already existing image to be pulled.)
m
thought for 51.0 seconds
Based on the research, here's how you can configure your `prefect.yaml` to pull an existing Docker image from a specific repository:
```yaml
deployments:
  - name: your-deployment-name
    entrypoint: "path/to/your/flow.py:flow_name"
    work_pool:
      name: your-work-pool-name

    # Specify the existing image to pull
    pull_policy: "if-not-present"  # Optional: controls when to pull the image
    image:
      name: "your-registry.com/your-repo/your-image"  # Full path to your image
      tag: "your-tag"  # e.g., "latest" or a specific version
```
For example, if you wanted to pull an image from Docker Hub, it might look like this:
```yaml
deployments:
  - name: ml-deployment
    entrypoint: "flows/train.py:training_flow"
    work_pool:
      name: ml-pool
    image:
      name: "tensorflow/tensorflow"  # Docker Hub image
      tag: "2.13.0-gpu"  # Specific version
Or for a private registry:
```yaml
deployments:
  - name: data-processing
    entrypoint: "flows/process.py:process_flow"
    work_pool:
      name: processing-pool
    image:
      name: "gcr.io/your-project/your-custom-image"  # Google Container Registry example
      tag: "v1.2.3"
```
Important notes:

1. Make sure your worker has access to pull from the specified registry.
2. For private registries, ensure proper authentication is set up on the worker.
3. The `pull_policy` can be (one way to set this via job variables is sketched below):
   - `"always"`: always pull the latest version of the image
   - `"if-not-present"`: only pull if the image isn't available locally
   - `"never"`: never pull, only use local images

Would you like me to elaborate on any part of this configuration, such as setting up authentication for private registries or configuring specific pull policies?
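As a hedged aside: with Docker-based work pools, pull behavior is often set through the work pool's job variables rather than a deployment-level key. A minimal sketch, assuming a Docker work pool whose base job template exposes `image` and `image_pull_policy` (all values here are placeholders):

```yaml
deployments:
  - name: your-deployment-name
    entrypoint: "path/to/your/flow.py:flow_name"
    work_pool:
      name: your-work-pool-name
      job_variables:
        image: your-registry.com/your-repo/your-image:your-tag
        image_pull_policy: IfNotPresent  # Docker workers accept Always / IfNotPresent / Never
```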
j
@Marvin and this means i don't need a "pull" step before deployments?
m
thought for 33.3 seconds
Correct! If you're specifying the image directly in the main configuration as shown in my previous response, you do NOT need a separate pull step in the deployment steps. The image configuration in the main section is sufficient. To illustrate, here's what you DON'T need:
```yaml
deployments:
  - name: your-deployment
    entrypoint: "flow.py:flow"
    image:
      name: "your-registry.com/your-image"
      tag: "latest"
    steps:
      - pull:  # <- This is unnecessary if you've specified the image above
          image_name: "your-registry.com/your-image"
          tag: "latest"
```
Instead, this is all you need:
```yaml
deployments:
  - name: your-deployment
    entrypoint: "flow.py:flow"
    image:
      name: "your-registry.com/your-image"
      tag: "latest"
```
The image configuration in the main section tells Prefect everything it needs to know about which image to use. The worker will handle pulling the image as needed based on the pull policy (which defaults to "if-not-present" if not specified). This is actually one of the improvements in Prefect 3.x - the configuration is more streamlined and you don't need to specify the same information in multiple places. The worker will automatically handle pulling the image based on the main configuration. Said with characteristic dry wit: It's rather refreshing when things are actually simpler than they appear, isn't it? A rare occurrence in the universe of software configuration.
j
@Marvin What deployment steps do I absolutely need?
m
thought for 65.9 seconds
Here's the surprising truth: in Prefect 3.x, you don't absolutely need any deployment steps at all! The minimum valid `prefect.yaml` can be extremely simple. Here's the absolute minimum you need:
```yaml
deployments:
  - name: your-deployment-name
    entrypoint: "path/to/your/flow.py:flow_name"
    work_pool:
      name: your-work-pool-name
```
That's it! Everything else is optional. However, depending on your specific needs, you might want to add:

1. If using Docker:
```yaml
    image:
      name: "your-registry.com/your-image"
      tag: "latest"
```
2. If you need specific variables:
```yaml
    variables:
      MY_VAR: "value"
```
3. If you want to specify a schedule:
```yaml
    schedule:
      interval: 3600  # in seconds, i.e. hourly
```
The `steps:` section, which was more prominent in earlier versions, is now optional and typically only needed if you're doing something custom during deployment, like:

- Running tests before deployment
- Custom build steps
- Special validation
- Complex push operations

(A sketch of a custom pull section follows below.) With dry amusement: It's rather delightful when the minimum required configuration is actually minimal, isn't it? A refreshing departure from the usual software practice of calling something "minimal" when it requires a 47-page configuration guide. Would you like me to elaborate on any optional configurations that might be relevant for your specific use case?
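For reference, a sketch of what a custom `pull` section can look like when you do need one, assuming your flow code lives in a Git repository and using the built-in `git_clone` step (the repository URL is a placeholder):

```yaml
pull:
  - prefect.deployments.steps.git_clone:
      repository: https://github.com/your-org/your-repo
      branch: main

deployments:
  - name: your-deployment-name
    entrypoint: "path/to/your/flow.py:flow_name"
    work_pool:
      name: your-work-pool-name
```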
j
@Marvin when i do that i get:
```text
? Your Prefect workers will need access to this flow's code in order to run it. Would you like your workers to pull your flow code from a remote storage location when running this flow? [y/n] (y):
```
m
thought for 111.4 seconds
To avoid that interactive prompt, you need to specify where your flow code will be stored. There are two main approaches:

1. If your code is in a Git repository:
```yaml
deployments:
  - name: your-deployment-name
    entrypoint: "path/to/your/flow.py:flow_name"
    work_pool:
      name: your-work-pool-name
    source:
      repository: https://github.com/your-username/your-repo
      branch: main  # or your specific branch
```
2. If you're using another storage location (like S3, GCS, etc.):
```yaml
deployments:
  - name: your-deployment-name
    entrypoint: "path/to/your/flow.py:flow_name"
    work_pool:
      name: your-work-pool-name
    source:
      storage: remote-file-system/my-block  # reference to a configured storage block
```
Alternatively, you can use the `--no-prompt` flag when deploying to accept all defaults:
```bash
prefect deploy --no-prompt
```
With a hint of dry observation: Though I must say, answering "no" to that prompt is a bit like telling your workers to find the flow code through divine inspiration. They do need to get it from somewhere. Would you like me to:

1. Explain how to set up a specific type of remote storage (Git, S3, GCS, etc.)?
2. Show you how to configure storage blocks for your preferred storage system?
3. Explain what happens when you answer "no" to that prompt?

Just let me know which aspect you'd like to explore further!
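As an alternative sketch that sidesteps `prefect.yaml` (and its prompts) entirely, you can deploy from Python with `flow.from_source(...).deploy(...)`; the source URL and names here are placeholders:

```python
from prefect import flow

if __name__ == "__main__":
    flow.from_source(
        source="https://github.com/your-username/your-repo",  # where the code lives
        entrypoint="path/to/your/flow.py:flow_name",          # file:function within that source
    ).deploy(
        name="your-deployment-name",
        work_pool_name="your-work-pool-name",
    )
```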
j
@Marvin the documentation specifies: "`pull` is the only required deployment action. It defines how Prefect pulls your deployment in remote execution environments." how do i tell prefect to pull a particular image via this pull step
m
thought for 32.4 seconds
Ah, I see where the confusion is coming from. You're absolutely right about the documentation. Here's how to properly configure the pull step in your `prefect.yaml`:
```yaml
deployments:
  - name: your-deployment-name
    entrypoint: "path/to/your/flow.py:flow_name"
    work_pool:
      name: your-work-pool-name

    steps:
      - pull:  # This is the required pull step
          image_name: "your-registry.com/your-image"
          tag: "latest"
```
Or if you want to be more explicit about the pull policy:
```yaml
deployments:
  - name: your-deployment-name
    entrypoint: "path/to/your/flow.py:flow_name"
    work_pool:
      name: your-work-pool-name

    steps:
      - pull:
          image_name: "your-registry.com/your-image"
          tag: "latest"
          policy: "if-not-present"  # Can be "always", "if-not-present", or "never"
```
For a real-world example:
```yaml
deployments:
  - name: ml-training
    entrypoint: "flows/train.py:training_flow"
    work_pool:
      name: gpu-pool
    
    steps:
      - pull:
          image_name: "tensorflow/tensorflow"
          tag: "2.13.0-gpu"
          policy: "if-not-present"
With a touch of dry wit: I apologize for my earlier response suggesting you didn't need any steps. It seems I was being a bit too optimistic about the minimalist approach. The documentation is indeed correct - we do need at least this one step to tell our workers where to find their metaphorical work clothes (the Docker image). Would you like me to explain more about:

1. The different pull policies available?
2. How to handle private registry authentication?
3. How to use variables in the pull step for more flexible configurations?
j
@Marvin
```text
Unexpected exception encountered when trying to load flow
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/prefect/deployments/steps/core.py", line 163, in run_steps
    step_output = await run_step(step, upstream_outputs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/prefect/deployments/steps/core.py", line 133, in run_step
    step_func = _get_function_for_step(fqn, requires=keywords.get("requires"))
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/prefect/deployments/steps/core.py", line 76, in _get_function_for_step
    step_func = import_object(fully_qualified_name)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/prefect/utilities/importtools.py", line 164, in import_object
    raise ValueError(
ValueError: Invalid format for object import. Received 'image_name'.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/prefect/engine.py", line 112, in <module>
    flow: "Flow[..., Any]" = load_flow(flow_run)
                             ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/prefect/flow_engine.py", line 140, in load_flow
    flow = run_coro_as_sync(
           ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/prefect/utilities/asyncutils.py", line 207, in run_coro_as_sync
    return call.result()
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/prefect/_internal/concurrency/calls.py", line 365, in result
    return self.future.result(timeout=timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/prefect/_internal/concurrency/calls.py", line 192, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.12/site-packages/prefect/_internal/concurrency/calls.py", line 441, in _run_async
    result = await coro
             ^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/prefect/utilities/asyncutils.py", line 188, in coroutine_wrapper
    return await task
           ^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/prefect/client/utilities.py", line 69, in wrapper
    return await func(client, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/prefect/flows.py", line 2810, in load_flow_from_flow_run
    output = await run_steps(
             ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/prefect/deployments/steps/core.py", line 191, in run_steps
    raise StepExecutionError(f"Encountered error while running {fqn}") from exc
prefect.deployments.steps.core.StepExecutionError: Encountered error while running image_name
```

12:01:09 PM prefect.flow_runs: Invalid format for object import. Received 'image_name'.
m
thought for 124.7 seconds
Based on the research, here's the correct syntax for your `prefect.yaml`:
```yaml
deployments:
  - name: your-deployment-name
    entrypoint: "path/to/your/flow.py:flow_name"
    work_pool:
      name: your-work-pool-name
    pull:
      - prefect.deployments.steps.run_shell_script:
          script: docker pull your-registry.com/your-image:your-tag
```
Or alternatively, you can specify the image directly in the work pool configuration:
```yaml
deployments:
  - name: your-deployment-name
    entrypoint: "path/to/your/flow.py:flow_name"
    work_pool:
      name: your-work-pool-name
      job_variables:
        image: your-registry.com/your-image:your-tag
```
For a complete example:
```yaml
deployments:
  - name: ml-training
    entrypoint: "flows/train.py:training_flow"
    work_pool:
      name: gpu-pool
      job_variables:
        image: tensorflow/tensorflow:2.13.0-gpu
    pull:
      - prefect.deployments.steps.run_shell_script:
          script: docker pull tensorflow/tensorflow:2.13.0-gpu
```
With a touch of dry wit: I apologize for the earlier syntax confusion. It seems even deployment steps need their proper formal attire - complete with fully qualified names and all. Would you like me to:

1. Explain more about how to handle private registries with this syntax?
2. Show how to use variables to make the configuration more flexible?
3. Explain the difference between specifying the image in `job_variables` vs. using a pull step?
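One more hedged sketch for the common case where the flow code is already baked into the image: the pull step then usually just sets the working directory to wherever the code lives inside the container (the path here is a placeholder), with the image itself supplied via `job_variables`:

```yaml
deployments:
  - name: your-deployment-name
    entrypoint: "path/to/your/flow.py:flow_name"
    work_pool:
      name: your-work-pool-name
      job_variables:
        image: your-registry.com/your-image:your-tag
    pull:
      - prefect.deployments.steps.set_working_directory:
          directory: /opt/prefect/your-project  # where the code sits inside the image
```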
j
@Marvin is `run_deployment` synchronous? i.e., if i have 4 workers waiting, will run_deployment be called enough times so each of them picks up a flow?
m
thought for 65.2 seconds
Here's the key behavior of `run_deployment`:

1. By default, `run_deployment` is synchronous in terms of SUBMITTING the flow run - it will wait until the flow run is created and queued. However, it does NOT wait for the flow run to complete by default.
2. If you want to wait for the flow run to complete, you need to use the returned `FlowRun` object and wait for it.

Here's how you can handle both scenarios:
```python
from prefect import flow
from prefect.deployments import run_deployment

@flow
def orchestrator_flow():
    # Non-waiting approach - fires off all runs immediately
    flow_runs = []
    for i in range(4):
        flow_run = run_deployment(
            name="your-flow/your-deployment",  # "flow-name/deployment-name"
            parameters={"param": i},
            work_queue_name="your-queue"
        )
        flow_runs.append(flow_run)

    # All 4 flows are now queued and will be picked up by available workers
    print("All flows queued!")

    # The returned FlowRun objects are snapshots; to block until the runs
    # finish, use wait_for_flow_run as in the async example below
    return flow_runs
```
Or if you're using async:
```python
from prefect import flow
from prefect.deployments import run_deployment
from prefect.flow_runs import wait_for_flow_run
import asyncio

@flow
async def async_orchestrator_flow():
    # Create all flow runs concurrently
    flow_runs = await asyncio.gather(*[
        run_deployment(
            name="your-flow/your-deployment",
            parameters={"param": i},
            work_queue_name="your-queue"
        ) for i in range(4)
    ])

    # If you want to wait for completion:
    results = await asyncio.gather(*[
        wait_for_flow_run(flow_run_id=flow_run.id)
        for flow_run in flow_runs
    ])
```
Key points:

1. The submission of flow runs is independent of worker availability.
2. If you have 4 workers waiting, they will pick up the flow runs as soon as they're submitted.
3. The workers will process the flows concurrently if resources allow.
4. You can control whether your orchestrator waits for completion with `wait_for_flow_run` or the `timeout` argument (sketched below).

With characteristic dry wit: Rather like a particularly efficient postal service - it'll guarantee delivery of your letters to the post office, but whether you want to wait around to see if they actually reach their destination is entirely up to you. Would you like me to:

1. Show how to handle timeouts and error cases?
2. Explain how to monitor the status of multiple concurrent flow runs?
3. Demonstrate how to control the concurrency level of the workers?
j
@Marvin are you sure?
```python
flow_run = run_deployment(
    name="binanapocket/Binana-a-Pocket",
    parameters={"uid": uid, "pid": pid, "pocket_N": pocket_n},
    as_subflow=True
)
```
just runs one at a time
m
thought for 52.5 seconds
I apologize for my previous incorrect response. You are absolutely right, and I was wrong. By default `run_deployment` WILL wait for the triggered flow run to complete before moving on to the next line (its `timeout` parameter defaults to None, i.e. wait indefinitely). This means your code will execute sequentially, one flow at a time. If you want to run multiple flows concurrently, you have a few options:

1. Pass `timeout=0` (optionally with `as_subflow=False`) so each call returns as soon as the run is created:
```python
flow_runs = []
for uid, pid, pocket_n in parameters:
    flow_run = run_deployment(
        name="binanapocket/Binana-a-Pocket",
        parameters={"uid": uid, "pid": pid, "pocket_N": pocket_n},
        timeout=0,           # return immediately; do not wait for completion
        as_subflow=False     # run independently of the calling flow
    )
    flow_runs.append(flow_run)
```
2. Use async/await pattern to run them concurrently:
```python
from prefect import flow
import asyncio

@flow
async def orchestrator_flow():
    # Run all flows concurrently
    flow_runs = await asyncio.gather(*[
        run_deployment(
            name="binanapocket/Binana-a-Pocket",
            parameters={"uid": uid, "pid": pid, "pocket_N": pocket_n},
            as_subflow=True
        )
        for uid, pid, pocket_n in parameters
    ])
```
3. Use concurrent.futures for parallel execution in synchronous code:
```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial

from prefect import flow
from prefect.deployments import run_deployment

@flow
def orchestrator_flow():
    run_func = partial(
        run_deployment,
        name="binanapocket/Binana-a-Pocket",
        as_subflow=False
    )

    with ThreadPoolExecutor(max_workers=4) as executor:
        futures = [
            executor.submit(
                run_func,
                parameters={"uid": uid, "pid": pid, "pocket_N": pocket_n}
            )
            for uid, pid, pocket_n in parameters
        ]
        results = [f.result() for f in futures]
```
With appropriately sheepish wit: I suppose this is what I get for being too optimistic about concurrent execution. Rather like assuming multiple postal workers will deliver packages faster when they're all waiting in line to use the same bicycle. Would you like me to:

1. Explain the tradeoffs between these different approaches?
2. Show how to monitor the status of flows when running them in parallel?
3. Demonstrate error handling for concurrent flow execution?
j
@Marvin Does "subflow=False" prevent the subflows from showing up on the UI?
m
thought for 84.4 seconds
No, `as_subflow=False` does NOT prevent the flows from showing up in the UI. The flows will still appear in the UI, but there are important differences in how they're displayed:

1. With `as_subflow=True` (default):
   - Flows appear as nested/hierarchical in the UI
   - You can see the parent-child relationship between flows
   - The subflows appear within the parent flow's run details
   - Easier to track the relationship between flows
2. With `as_subflow=False`:
   - Flows appear as independent flow runs in the UI
   - No visible connection to the parent flow that triggered them
   - Each flow run stands alone at the root level
   - You'll see them as separate entries in your flow runs list

With a touch of dry wit: Rather like the difference between having your family tree and having a list of people who may or may not be related to each other. The people are still there either way, you just lose track of who begat whom.

This means you can still:
- Monitor all flow runs in the UI
- See their statuses and logs
- Track their execution
- Access their results

The main difference is in the visualization of relationships and the organizational structure in the UI. Would you like me to:

1. Show how to track related flows when using `as_subflow=False`?
2. Explain strategies for organizing and monitoring parallel flow runs?
3. Demonstrate how to add custom tags or identifiers to help group related flows? (A quick sketch of this follows below.)
j
@Marvin is there a way to use subflow=True and still be asynchronous in their execution? (so 4 workers can pick up 4 flows)
m
thought for 46.4 seconds
Yes! There are a couple of ways to achieve concurrent execution while maintaining the subflow relationship (`as_subflow=True`). Here's how:

1. Using `asyncio.gather`:
```python
from prefect import flow
from prefect.deployments import run_deployment
import asyncio

@flow
async def orchestrator_flow(parameters_list):
    # This will run all subflows concurrently while maintaining subflow relationships
    results = await asyncio.gather(*[
        run_deployment(
            name="binanapocket/Binana-a-Pocket",
            parameters=params,
            as_subflow=True  # Maintains the subflow relationship
        )
        for params in parameters_list
    ])
    return results

# Usage:
parameters_list = [
    {"uid": uid1, "pid": pid1, "pocket_N": n1},
    {"uid": uid2, "pid": pid2, "pocket_N": n2},
    # ... more parameter sets
]
flow_result = asyncio.run(orchestrator_flow(parameters_list))
```
2. Using `asyncio.create_task`:
```python
@flow
async def orchestrator_flow(parameters_list):
    # Create tasks for all flows
    tasks = [
        asyncio.create_task(
            run_deployment(
                name="binanapocket/Binana-a-Pocket",
                parameters=params,
                as_subflow=True
            )
        )
        for params in parameters_list
    ]
    
    # Wait for all tasks to complete
    results = await asyncio.gather(*tasks)
    return results
```
3. If you need more control over concurrency limits:
```python
import asyncio

@flow
async def orchestrator_flow(parameters_list, max_concurrent=4):
    sem = asyncio.Semaphore(max_concurrent)

    async def run_with_semaphore(params):
        async with sem:  # This limits concurrent executions
            return await run_deployment(
                name="binanapocket/Binana-a-Pocket",
                parameters=params,
                as_subflow=True
            )

    results = await asyncio.gather(*[
        run_with_semaphore(params)
        for params in parameters_list
    ])
    return results
```
Key benefits of these approaches:

1. Maintains subflow relationships (visible in UI)
2. Allows concurrent execution across workers
3. Parent flow can still track and wait for all subflows
4. Preserves flow run hierarchy and relationships

With characteristic dry wit: Rather like having your cake and eating it too, except in this case, it's having your organizational hierarchy while still letting your workers multitask. A rare win for both bureaucracy and efficiency. Would you like me to:

1. Explain how to handle errors in concurrent subflows?
2. Show how to monitor the progress of concurrent subflows?
3. Demonstrate how to implement more sophisticated concurrency patterns?
j
@Marvin but how do i make sure that there's only a small number of flows spun up? i have 4 different workers, so i'd like 5 flows to be spun up. i don't want to hardcode this. if i async it, then there will be a whole bunch of flows that are scheduled / pending
m
thought for 38.4 seconds
You can control this using tag-based concurrency limits! Here's how:

1. First, create a concurrency limit for your flows:
```bash
prefect concurrency-limit create binana-flows 4  # matches your worker count
```
2. Then modify your flow code to use this limit:
```python
from prefect import flow
from prefect.deployments import run_deployment

@flow
async def orchestrator_flow(parameters_list):
    # Add the tag that has the concurrency limit
    flow_runs = []
    for params in parameters_list:
        flow_run = await run_deployment(
            name="binanapocket/Binana-a-Pocket",
            parameters=params,
            as_subflow=True,
            timeout=0,  # queue the run without blocking on it
            tags=["binana-flows"]  # This tag will enforce the concurrency limit
        )
        flow_runs.append(flow_run)

    return flow_runs
```
This approach:
- Limits the number of concurrent flows to match your worker capacity
- Maintains the subflow relationship (`as_subflow=True`)
- Automatically queues additional flows until capacity becomes available
- Doesn't require hardcoding the concurrency limit in your code

You can also set this up programmatically:
```python
from prefect import flow, get_client

@flow
async def setup_and_run(parameters_list):
    # Create or update the concurrency limit
    async with get_client() as client:
        await client.create_concurrency_limit(
            tag="binana-flows",
            concurrency_limit=4  # match your worker count
        )

    # Now run your flows with the limit
    return await orchestrator_flow(parameters_list)
```
With dry wit: Rather like having a bouncer at a club who knows exactly how many people can fit on the dance floor. The rest can queue outside until someone leaves, preventing the chaos of everyone trying to dance at once. Would you like me to:

1. Show how to dynamically adjust the concurrency limit based on available workers?
2. Explain how to monitor active slots and queue status? (A CLI sketch follows below.)
3. Demonstrate error handling for queued flows?
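For monitoring, the same CLI group has list and inspect commands; a short sketch using the tag from above:

```bash
# list all tag-based concurrency limits
prefect concurrency-limit ls

# show the limit and its currently active slots
prefect concurrency-limit inspect binana-flows
```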
j
@Marvin
```python
for pid, pocket_n in needed_pockets:
    #if pocket_n == 15 and pid == "8fca":  # TEMP: Only run for one protein pocket
    flow_run = asyncio.create_task(
        run_deployment(
            name="binanapocket/Binana-a-Pocket",
            parameters={"uid": uid, "pid": pid, "pocket_N": pocket_n},
            as_subflow=True
        )
    )
    flow_runs.append(flow_run)
```
how's this?
m
thought for 60.9 seconds
The code you've shown will create tasks for all flows immediately, but it's missing a crucial part - you need to actually await the tasks. Here's how to complete the pattern:
```python
from prefect import flow
from prefect.deployments import run_deployment
import asyncio

@flow
async def orchestrator_flow(uid, needed_pockets):
    flow_runs = []
    
    # Create tasks for all flows
    for pid, pocket_n in needed_pockets:
        flow_run = asyncio.create_task(
            run_deployment(
                name="binanapocket/Binana-a-Pocket",
                parameters={"uid": uid, "pid": pid, "pocket_N": pocket_n},
                as_subflow=True,
                tags=["binana-flows"]  # Add this if you want to use concurrency limits
            )
        )
        flow_runs.append(flow_run)
    
    # Wait for all tasks to complete
    results = await asyncio.gather(*flow_runs)
    return results
```
However, this approach will still:

1. Create all tasks immediately
2. Try to run all flows concurrently
3. Potentially overwhelm your workers if there are many pockets

If you want to limit concurrency to match your worker count (4 workers), you might want to use a semaphore:
```python
@flow
async def orchestrator_flow(uid, needed_pockets):
    sem = asyncio.Semaphore(4)  # Match your worker count
    flow_runs = []
    
    async def run_with_limit(pid, pocket_n):
        async with sem:  # This ensures only 4 runs at a time
            return await run_deployment(
                name="binanapocket/Binana-a-Pocket",
                parameters={"uid": uid, "pid": pid, "pocket_N": pocket_n},
                as_subflow=True
            )
    
    # Create tasks with semaphore control
    for pid, pocket_n in needed_pockets:
        flow_run = asyncio.create_task(run_with_limit(pid, pocket_n))
        flow_runs.append(flow_run)
    
    # Wait for all tasks to complete
    results = await asyncio.gather(*flow_runs)
    return results
```
Or use the tag-based concurrency limit approach I mentioned earlier:
```python
@flow
async def orchestrator_flow(uid, needed_pockets):
    flow_runs = []
    
    # Create tasks for all flows
    for pid, pocket_n in needed_pockets:
        flow_run = asyncio.create_task(
            run_deployment(
                name="binanapocket/Binana-a-Pocket",
                parameters={"uid": uid, "pid": pid, "pocket_N": pocket_n},
                as_subflow=True,
                tags=["binana-flows"]  # This tag should have a concurrency limit set
            )
        )
        flow_runs.append(flow_run)
    
    # Wait for all tasks to complete
    results = await asyncio.gather(*flow_runs)
    return results
```
With dry wit: Your original code is like creating a bunch of concert tickets but not actually going to the concert. The `await` is what gets you through the door, and the semaphore/concurrency limit is what prevents the mosh pit from getting dangerously overcrowded. Would you like me to:

1. Explain how to monitor the progress of these concurrent flows?
2. Show how to handle errors if any of the flows fail? (A sketch follows below.)
3. Demonstrate how to implement more sophisticated queuing strategies?
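On the error-handling point, here's a hedged sketch using `asyncio.gather(..., return_exceptions=True)` so one failed `run_deployment` call doesn't cancel the rest of the batch:

```python
import asyncio

from prefect import flow
from prefect.deployments import run_deployment

@flow
async def orchestrator_flow(uid, needed_pockets):
    tasks = [
        asyncio.create_task(
            run_deployment(
                name="binanapocket/Binana-a-Pocket",
                parameters={"uid": uid, "pid": pid, "pocket_N": pocket_n},
                as_subflow=True,
            )
        )
        for pid, pocket_n in needed_pockets
    ]

    # Collect exceptions instead of letting the first failure cancel everything
    results = await asyncio.gather(*tasks, return_exceptions=True)
    failures = [r for r in results if isinstance(r, Exception)]
    if failures:
        print(f"{len(failures)} of {len(results)} run_deployment calls raised")
    return results
```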