# ask-community
m
@Marvin all my flows are late, enter the Pending state, and then crash with "Flow run infrastructure exited with non-zero status code: Essential container in task exited (Error Code: 1)". I'm using Prefect Cloud's managed work pool and execution infrastructure
m
thought for 439.4 seconds
Thanks for the detail: that error string "Essential container in task exited (Error Code: 1)" is coming from AWS ECS/Fargate. Since you're using Prefect Cloud's Managed Work Pool/Managed Execution, here's how to quickly isolate the root cause and get you unblocked.

First, a couple of clarifying questions so I can tailor the fix:
- Are you using the default Prefect image or a custom image in the managed work pool?
- If custom, is the image public and does it include git and your Python dependencies?
- Is your deployment using `flow.from_source(...)` (cloning a repo at runtime) or shipping code inside a container image?
- Does your flow access AWS resources (S3, RDS, etc.) that would require credentials in env vars? If so, how are those set?

Immediate diagnostics

1) Check Provider Logs in the UI
- Open a failed flow run in the Prefect UI
- Go to the Infrastructure/Provider logs panel
- Scroll to the earliest error line; that will usually tell you exactly why the container exited 1 (e.g., ModuleNotFoundError, git: not found, permission denied, etc.)
- If present, click the "View in AWS" link to open the CloudWatch log stream
- Docs: ECS on Prefect (3.x)

2) Common causes for exit code 1 on managed ECS (and fixes)
- Missing dependencies in the image
  - Symptom: ImportError/ModuleNotFoundError in provider logs
  - Fix: Use a custom image that installs your requirements, or ensure your deployment installs them at runtime
- Missing git in the image when using from_source
  - Symptom: "git: not found" or clone failures in provider logs
  - Fix: Use a base image with git installed, e.g. extend the Prefect image and add git. Example Dockerfile:
```dockerfile
FROM prefecthq/prefect:3-python3.11
RUN apt-get update && apt-get install -y git && rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN pip install -r requirements.txt
```
- Private or unreachable image
  - Managed execution can pull public images. If your image is private or not reachable, the task may fail before your app starts
  - Fix: Make the image public or switch to the default Prefect image while testing
- Missing runtime environment variables
  - Symptom: Your app exits when trying to read missing creds/config
  - Fix: Add env vars on the Deployment (they propagate to the task container)
- Insufficient memory/CPU
  - Symptom: Sometimes you'll see OOM-like messages in provider logs (true OOM often exits 137; generic Killed/exceptions may exit 1)
  - Fix: In your Managed Work Pool, increase CPU/Memory defaults; also ensure the Deployment doesn't override to a lower value
  - Blog: More memory, more problems
- Incorrect command/entrypoint override
  - Symptom: "python: can't open file …" or usage messages
  - Fix: Revert to the default managed command, or verify your override runs your flow correctly

3) Why runs show Late then Pending then Crash
- Managed compute scales from zero; cold starts can cause a Late state before infra is ready
- Pending indicates we're launching the ECS task; if the container exits immediately with code 1, the run then moves to Crashed
- Once we fix the container exit cause, the Late/Pending sequence will still appear sometimes with cold starts, but the run will proceed to Running

4) Verify your Managed Work Pool settings (see the deployment sketch after this message)
- Image: default Prefect image vs custom (confirm it's public and has git/deps if using from_source)
- CPU/Memory: bump to something safe (e.g., 1 vCPU / 2–4 GB) while debugging
- Environment variables: add anything your code needs (AWS creds, API URLs, etc.)

5) If your flow uses AWS resources
- Managed compute can't assume your IAM roles; you must provide credentials via env vars or another method your code supports (e.g., access keys in env)
- Check the provider logs for AccessDenied or NoCredentials errors

6) Minimal repro test
- Create a tiny flow, deploy from source with the default Prefect image, and run it in the same work pool
```python
from prefect import flow, get_run_logger

@flow
def smoke_test():
    logger = get_run_logger()
    logger.info("Hello from managed ECS!")

if __name__ == "__main__":
    smoke_test()
```
- If this succeeds, infra is healthy and the issue is likely your image/dependencies/env
- If this also fails, share the provider-log snippet and we'll dig deeper

Helpful references
- Prefect + AWS ECS guide (3.x)
- Work pools overview blog
- Memory/CPU tuning blog

If you can share:
- A provider-log snippet from one failed run (first error)
- Your work pool's image, CPU/Memory, and whether you're using from_source
- Whether your repo is private and whether the image includes git
…I can pinpoint the fix quickly.
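A minimal sketch of points 4 and 5 above, assuming a Python SDK deployment: environment variables (e.g., AWS credentials) are passed to the managed pool's task container via `job_variables`. The repo URL, entrypoint, pool name, and env var names below are placeholders rather than details from this thread; check the work pool's base job template for the exact variables it accepts (CPU/memory can usually be adjusted there too).

```python
import os

from prefect import flow

if __name__ == "__main__":
    # Clone the flow code from a (public or credentialed) repo at run time,
    # then register a deployment against the managed work pool.
    flow.from_source(
        source="https://github.com/my-org/my-repo",  # placeholder repo URL
        entrypoint="flows/etl.py:etl",               # placeholder path:function entrypoint
    ).deploy(
        name="managed-etl",
        work_pool_name="my-managed-pool",            # placeholder pool name
        job_variables={
            # Env vars set here propagate to the task container; values are read
            # from the machine running this deploy script, never hard-coded.
            "env": {
                "AWS_ACCESS_KEY_ID": os.environ["AWS_ACCESS_KEY_ID"],
                "AWS_SECRET_ACCESS_KEY": os.environ["AWS_SECRET_ACCESS_KEY"],
            },
        },
    )
```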
m
I could use some human help if anyone can shed some light here 🙇 thanks in advance
j
Hey, what's your account id?
m
0d373dc9-4403-4748-abdb-a57870ce39cb
thanks Jake!
It's probably something stupid, but since I cannot see anything from the logs it's difficult to guess
I tried to mimic the behaviour by running a Docker container with the same Prefect image, then doing my pip install, which targets a private GitHub repo using a PAT, but I see no errors or warnings
j
I think I see the issue. We're actually looking to deprecate the `pip_packages` field because install errors aren't surfaced into your Prefect logs.
```
ERROR: Invalid requirement: '@'
```
It looks like you may have some spaces in a pip package you're specifying from git? I'd definitely recommend using pull steps instead; you should have full logging visibility there
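A hedged guess at the mechanism behind that error (package, org, and token below are placeholders): a git requirement written in PEP 508 form contains spaces around the `@`, so any field that splits entries on whitespace turns one requirement into three, and pip then rejects the bare `@` token.

```python
# Illustrative only: "mypkg", "my-org", and TOKEN are placeholders.
requirement = "mypkg @ git+https://TOKEN@github.com/my-org/mypkg.git"

# Valid as a single PEP 508 requirement, but if the field splits entries on whitespace...
entries = requirement.split()
print(entries)  # ['mypkg', '@', 'git+https://TOKEN@github.com/my-org/mypkg.git']

# ...the bare "@" gets handed to pip as its own requirement and fails with:
#   ERROR: Invalid requirement: '@'
```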
m
I had tried both with the space and without any spaces
I just hit another run, perhaps now the error is different
My deployment logic is written with the Python SDK; do pull steps apply here too?
let me see if the error is different
m
Ah after changing that I now see an error
hadn't got this far earlier!
j
If I'm understanding right you're pip installing your code from github (as opposed to pulling the code down directly?)
You'll likely need to change your entrypoint from script -> module syntax
m
yes, I used to do this in another (paid) account, but running a custom image that already had the repo cloned, though
j
The pull step route I linked above would:
• pull your code down from GitHub
• install your dependencies
• run your file as a script
But if you're installing your code as a package, you'll need to change your deployment's entrypoint to a module: https://docs.prefect.io/v3/api-ref/python/prefect-types-entrypoint#entrypoint
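For the module-entrypoint route, here is a rough sketch with the Python SDK, assuming the code is pip-installed into the runtime image rather than pulled at run time. The package, flow, and pool names are placeholders, and the `EntrypointType` import path follows the API ref linked above but may differ between Prefect versions.

```python
from prefect.types.entrypoint import EntrypointType  # path per the linked API ref; may vary by version

# Import the flow object from the pip-installed package (placeholder names).
from dataflows.flows.etl import etl_flow

if __name__ == "__main__":
    etl_flow.deploy(
        name="etl-managed",
        work_pool_name="my-managed-pool",  # placeholder pool name
        # MODULE_PATH records the entrypoint as "dataflows.flows.etl:etl_flow",
        # so at run time Prefect imports the module instead of opening a file path.
        # This assumes the package is installed in the runtime environment
        # (custom image or the pool's pip install step).
        entrypoint_type=EntrypointType.MODULE_PATH,
    )
```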
m
Mmmm, I believe none of this existed when I first set up this logic, so I'd need to revisit again... I currently have a GitHub Action that checks for changed flow/deployment files and deploys them using a Python script:
```python
import os
import logging
import argparse
from dataflows.deployments._constants import (
    FLOW_DEPLOYMENT_DICT,
    DEPLOYMENT_FILE_FUNC_DICT,
)


def get_file_name_file_extension(file_path: str) -> tuple:
    """Function to extract the file name and extension from a file URI
    Args:
        file_uri (str): URL pointing to the file
    Returns:
        tuple: file name and extension
    """
    file_name_with_ext = os.path.basename(file_path)
    file_name, file_ext = os.path.splitext(file_name_with_ext)
    return (file_name, file_ext)


def get_deploy_function(file_name: str) -> callable:
    """Function to get the deployment function for a flow

    Args:
        file_name (str): flow file name without extension

    Returns:
        callable: deployment function
    """
    deploy_function = FLOW_DEPLOYMENT_DICT.get(file_name)

    if not deploy_function:
        deploy_function = DEPLOYMENT_FILE_FUNC_DICT.get(file_name)
    return deploy_function


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter
    )
    parser.add_argument(
        "-f",
        "--file_path",
        help="File path",
        required=True,
    )

    args = parser.parse_args()
    logging.basicConfig(level=logging.INFO)  # add a handler so INFO messages actually print
    logger = logging.getLogger()

    file_name, _ = get_file_name_file_extension(args.file_path)

    deploy_function = get_deploy_function(file_name)

    if deploy_function:
        <http://logger.info|logger.info>(f"Deploying {file_name} using {deploy_function.__name__}")
        deploy_function()
        <http://logger.info|logger.info>(f"{file_name} has been deployed!")
    else:
        <http://logger.info|logger.info>(
            f"No deployment function found for {file_name},"
            "skipping..."
        )
```
this is old, but it works fine in our startup account, so I thought I'd use the same strategy for a personal project of mine - this is why I'd stick to the Python SDK for deploying flows instead of the YAML approach
if it's just a matter of changing the entrypoint format and that's straightforward, I'd stick to that - not sure how/where this is specified though
ah indeed, using `entrypoint_type=EntrypointType.MODULE_PATH` got me to the next step; now it's having a hard time finding a custom block that I had created
```
Finished in state Failed('Flow run encountered an exception. ValueError: Unable to find block document named gofit-credentials for block type secret')
```
j
nice! Not sure how/where you're loading the block, but the error suggests it's expecting a Secret block with that slug, while you have it saved as a custom block type
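For reference, a small hedged sketch of aligning the block types (the block name comes from the error above; the value is a placeholder): if the flow loads the document through the built-in `Secret` class, the document has to be saved under that same block type, otherwise the lookup fails exactly as shown.

```python
from prefect.blocks.system import Secret

# One-time setup: save the credential under the built-in "secret" block type
# so the name and type match what the flow asks for (value is a placeholder).
Secret(value="replace-me").save("gofit-credentials", overwrite=True)

# In the flow, the same name now resolves because the block types agree.
token = Secret.load("gofit-credentials").get()
```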
m
After switching the custom blocks for standard ones (not ideal, but I can live with that) it worked just fine; the key was to change the entrypoint as you pointed out, Jake, thanks a lot!!
🙌 1