Miguel Moncada
09/09/2025, 2:34 PM
Marvin
09/09/2025, 2:34 PM
Marvin
09/09/2025, 2:42 PM
flow.from_source(...) (cloning a repo at runtime) or shipping code inside a container image?
- Does your flow access AWS resources (S3, RDS, etc.) that would require credentials in env vars? If so, how are those set?
Immediate diagnostics
1) Check Provider Logs in the UI
- Open a failed flow run in the Prefect UI
- Go to the Infrastructure/Provider logs panel
- Scroll to the earliest error line; that will usually tell you exactly why the container exited with code 1 (e.g., ModuleNotFoundError, git: not found, permission denied, etc.)
- If present, click the “View in AWS” link to open the CloudWatch log stream
Docs:
- ECS on Prefect (3.x)
2) Common causes for exit code 1 on managed ECS (and fixes)
- Missing dependencies in the image
- Symptom: ImportError/ModuleNotFoundError in provider logs
- Fix: Use a custom image that installs your requirements, or ensure your deployment installs them at runtime
- Missing git in the image when using from_source
- Symptom: “git: not found” or clone failures in provider logs
- Fix: Use a base image with git installed, e.g., extend the Prefect image and add git
- Example Dockerfile:
FROM prefecthq/prefect:3-python3.11
RUN apt-get update && apt-get install -y git && rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN pip install -r requirements.txt
- Private or unreachable image
- Managed execution can pull public images. If your image is private or not reachable, the task may fail before your app starts
- Fix: Make the image public or switch to the default Prefect image while testing
- Missing runtime environment variables
- Symptom: Your app exits when trying to read missing creds/config
- Fix: Add env vars on the Deployment (they propagate to the task container)
- Insufficient memory/CPU
- Symptom: Sometimes you’ll see OOM-like messages in provider logs (true OOM often exits 137; generic Killed/exceptions may exit 1)
- Fix: In your Managed Work Pool, increase CPU/Memory defaults; also ensure the Deployment doesn’t override to a lower value
- Blog: More memory, more problems
- Incorrect command/entrypoint override
- Symptom: “python: can’t open file …” or usage messages
- Fix: Revert to the default managed command, or verify your override runs your flow correctly
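Several of the causes above (missing dependencies, missing git, missing env vars) can be told apart by running a small preflight script inside the same image before the flow runs. This is a minimal sketch using only the standard library; the `preflight` helper and the example module and variable names are hypothetical and not part of Prefect.

```python
import importlib.util
import os
import shutil


def preflight(modules, env_vars, need_git=True):
    """Return a list of human-readable problems found in the current environment."""
    problems = []

    # Missing dependencies: find_spec returns None when a top-level module is not installed.
    for name in modules:
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing module: {name}")

    # Missing git: shutil.which returns None when the executable is not on PATH.
    if need_git and shutil.which("git") is None:
        problems.append("git executable not found on PATH")

    # Missing runtime configuration: report variables that are unset or empty.
    for var in env_vars:
        if not os.environ.get(var):
            problems.append(f"missing env var: {var}")

    return problems


if __name__ == "__main__":
    # Hypothetical names; replace with what your flow actually imports and reads.
    for problem in preflight(["prefect"], ["MY_API_URL"]):
        print(problem)
```

An empty result does not prove the run will succeed, but any reported problem points at one of the causes listed above.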
3) Why runs show Late then Pending then Crash
- Managed compute scales from zero; cold starts can cause Late state before infra is ready
- Pending indicates we’re launching the ECS task; if the container exits immediately with code 1, the run then moves to Crashed
- Once we fix the container exit cause, the Late/Pending sequence will still appear sometimes with cold start, but the run will proceed to Running
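The exit codes mentioned above (1 for a generic failure, 137 for a likely OOM kill) follow the usual shell convention that codes above 128 mean "killed by signal number code - 128". A minimal sketch of a helper that applies that convention; `describe_exit_code` is a hypothetical name, and the signal lookup assumes a POSIX system such as the Linux containers used here.

```python
import signal


def describe_exit_code(code):
    """Give a short description of a container exit code."""
    if code == 0:
        return "success"
    # By shell convention, codes above 128 mean the process was killed by signal (code - 128).
    if code > 128:
        try:
            name = signal.Signals(code - 128).name
        except ValueError:
            # No known signal with that number on this platform.
            return f"exit code {code}"
        return f"killed by {name}"
    # Anything else is an error raised by the application itself.
    return f"application error (exit code {code})"


if __name__ == "__main__":
    for code in (0, 1, 137):
        print(code, describe_exit_code(code))
```

An exit code of 1 therefore says only that the process itself failed, which is why the first error line in the provider logs matters more than the code.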
4) Verify your Managed Work Pool settings
- Image: default Prefect image vs custom (confirm it’s public and has git/deps if using from_source)
- CPU/Memory: bump to something safe (e.g., 1 vCPU / 2–4 GB) while debugging
- Environment variables: add anything your code needs (AWS creds, API URLs, etc.)
5) If your flow uses AWS resources
- Managed compute can’t assume your IAM roles; you must provide credentials via env vars or another method your code supports (e.g., access keys in env)
- Check the provider logs for AccessDenied or NoCredentials errors
6) Minimal repro test
Marvin
09/09/2025, 2:42 PM
from prefect import flow, get_run_logger

@flow
def smoke_test():
    logger = get_run_logger()
    logger.info("Hello from managed ECS!")

if __name__ == "__main__":
    smoke_test()
- If this succeeds, infra is healthy and the issue is likely your image/dependencies/env
- If this also fails, share the provider-log snippet and we’ll dig deeper
Helpful references
- Prefect + AWS ECS guide (3.x)
- Work pools overview blog
- Memory/CPU tuning blog
If you can share:
- A provider-log snippet from one failed run (first error)
- Your work pool’s image, CPU/Memory, and whether you’re using from_source
- Whether your repo is private and whether the image includes git
…I can pinpoint the fix quickly.
Miguel Moncada
09/09/2025, 3:16 PM
Jake Kaplan
09/09/2025, 3:16 PM
Miguel Moncada
09/09/2025, 3:17 PM
0d373dc9-4403-4748-abdb-a57870ce39cb
Miguel Moncada
09/09/2025, 3:17 PM
Miguel Moncada
09/09/2025, 3:17 PM
Miguel Moncada
09/09/2025, 3:18 PM
Jake Kaplan
09/09/2025, 3:23 PM
pip_packages field because install errors aren't surfaced into your prefect logs.
ERROR: Invalid requirement: '@'
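Since install errors from this field are not surfaced in the Prefect logs, it may help to sanity-check the list before deploying. A minimal sketch, assuming the packages are kept in a plain Python list; `find_suspicious_requirements` is a hypothetical helper that only catches entries with no letters or digits (such as a stray '@'), not every invalid requirement.

```python
import re


def find_suspicious_requirements(pip_packages):
    """Return entries that cannot be valid requirements because they contain no letters or digits."""
    suspicious = []
    for entry in pip_packages:
        # A requirement needs a package name or a URL, so it must contain
        # at least one alphanumeric character.
        if not re.search(r"[A-Za-z0-9]", entry):
            suspicious.append(entry)
    return suspicious


if __name__ == "__main__":
    # Hypothetical list in which a "name @ url" reference was split into separate entries.
    packages = ["mypkg", "@", "git+https://example.com/org/mypkg.git"]
    print(find_suspicious_requirements(packages))
```

For a stricter check, each entry could instead be run through a real requirement parser; this sketch stays with the standard library.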
Jake Kaplan
09/09/2025, 3:23 PM
Miguel Moncada
09/09/2025, 3:24 PM
Miguel Moncada
09/09/2025, 3:24 PM
Miguel Moncada
09/09/2025, 3:24 PM
Jake Kaplan
09/09/2025, 3:25 PM
Miguel Moncada
09/09/2025, 3:25 PM
Miguel Moncada
09/09/2025, 3:25 PM
Jake Kaplan
09/09/2025, 3:28 PM
Jake Kaplan
09/09/2025, 3:29 PM
Miguel Moncada
09/09/2025, 3:29 PM
Jake Kaplan
09/09/2025, 3:31 PM
Miguel Moncada
09/09/2025, 3:34 PM
import os
import logging
import argparse

from dataflows.deployments._constants import (
    FLOW_DEPLOYMENT_DICT,
    DEPLOYMENT_FILE_FUNC_DICT,
)


def get_file_name_file_extension(file_path: str) -> tuple:
    """Function to extract the file name and extension from a file path

    Args:
        file_path (str): path pointing to the file

    Returns:
        tuple: file name and extension
    """
    file_name_with_ext = os.path.basename(file_path)
    file_name, file_ext = os.path.splitext(file_name_with_ext)
    return (file_name, file_ext)


def get_deploy_function(file_name: str) -> callable:
    """Function to get the deployment function for a flow

    Args:
        file_name (str): flow file name without extension

    Returns:
        callable: deployment function, or None if no match is found
    """
    # Look in the flow mapping first, then fall back to the file mapping
    deploy_function = FLOW_DEPLOYMENT_DICT.get(file_name)
    if not deploy_function:
        deploy_function = DEPLOYMENT_FILE_FUNC_DICT.get(file_name)
    return deploy_function


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter
    )
    parser.add_argument(
        "-f",
        "--file_path",
        help="File path",
        required=True,
    )
    args = parser.parse_args()
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)
    file_name, _ = get_file_name_file_extension(args.file_path)
    deploy_function = get_deploy_function(file_name)
    if deploy_function:
        logger.info(f"Deploying {file_name} using {deploy_function.__name__}")
        deploy_function()
        logger.info(f"{file_name} has been deployed!")
    else:
        # Trailing space keeps the two concatenated string parts from running together
        logger.info(
            f"No deployment function found for {file_name}, "
            "skipping..."
        )
Miguel Moncada
09/09/2025, 3:35 PM
Miguel Moncada
09/09/2025, 3:36 PM
Miguel Moncada
09/09/2025, 3:43 PM
entrypoint_type=EntrypointType.MODULE_PATH got me to the next step, now it's having a hard time finding a custom block that I had created
Miguel Moncada
09/09/2025, 3:43 PM
Finished in state Failed('Flow run encountered an exception. ValueError: Unable to find block document named gofit-credentials for block type secret')
Jake Kaplan
09/09/2025, 4:42 PM
Miguel Moncada
09/10/2025, 6:13 AM