# ask-community
b
Hello community! All of my deployment executions are running very late and I'm not sure why...
I am running self-hosted Prefect with Docker Compose, with a worker service listening for deployment executions from a GCP Cloud Run work pool.
My deployments can be late by upwards of 4 hours in some cases.
My worker service is started with this command in my docker compose file:
```
entrypoint:
  - /opt/prefect/entrypoint.sh
  - prefect
  - worker
  - start
  - '--pool=my-work-pool'
  - '--with-healthcheck'
  - '--name=${DEFAULT_WORKER_NAME}'
  - '--limit=${DEFAULT_POOL_LIMIT}'
```
n
hi @Ben this is often something to do with how you're running the worker, or concurrency limits. not sure there's enough info to know for sure from this, but
• why are you using `entrypoint.sh` here? what's in there?
• you may want to ask @Marvin
m
thought for 400.3 seconds
Thanks for the details. Four hours "late" usually points to a worker/pool configuration issue, a backlog caused by concurrency limits, or the worker not actually polling the pool consistently. A few targeted checks to run first:

Verify the pool type, status, and limits:
- A paused pool or a concurrency limit set too low (or to zero) will cause long backlogs.
- Run:
```
prefect work-pool inspect my-work-pool
prefect work-queue ls
```
- Confirm that the type is `cloud-run`, the pool is not paused, and neither the work pool nor any work queue has an unexpectedly low or zero concurrency limit.
- Docs: Work pools, Global concurrency limits, Concurrency limits API

Check the worker flags and logs:
- Your command looks generally fine; the worker will infer the correct type from the pool:
```
prefect worker start --pool my-work-pool --name ${DEFAULT_WORKER_NAME} --limit ${DEFAULT_POOL_LIMIT} --with-healthcheck
```
- A very low `--limit` combined with long-running flows will create a backlog that shows up as "late." What is `${DEFAULT_POOL_LIMIT}`, and how long do typical runs take?
- Temporarily increase logging to debug level to see poll/submission behavior:
```
PREFECT_LOGGING_LEVEL=DEBUG prefect worker start --pool my-work-pool ...
```
- Docs: Workers

Validate the work-queue targeting:
- If your deployments specify a work queue but the worker is not polling it (or vice versa), nothing will get picked up until another worker does.
- If you are using queues, start the worker with the matching `--work-queue`, or remove the queue scoping.

Confirm Cloud Run credentials and quotas:
- The worker (running in Docker Compose) must have permission to create and run Cloud Run jobs in the target project/region. Permission or quota errors will appear in the worker logs and can cause repeated backoffs.
- If you are near the quota for concurrent jobs per region, submissions may be throttled and runs will appear late.
- Guide: GCP worker guide

Healthcheck usage in Docker Compose:
- `--with-healthcheck` starts the Prefect health endpoint; for Docker to act on it, add a Compose healthcheck that hits the endpoint and expose the port. Example:
```
healthcheck:
  test: ["CMD-SHELL", "wget -qO- http://localhost:8080/health || exit 1"]
  interval: 10s
  timeout: 5s
  retries: 3
ports:
  - "8080:8080"
```
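Putting those pieces together, a minimal Compose service for the worker might look like the sketch below. The service name, image tag, limit value, and server URL are assumptions for illustration, not details from this thread; adjust them to your setup:

```
services:
  worker:
    image: prefecthq/prefect:3-latest
    command: prefect worker start --pool my-work-pool --with-healthcheck --limit 4
    environment:
      PREFECT_API_URL: http://prefect-server:4200/api
    ports:
      - "8080:8080"
    healthcheck:
      test: ["CMD-SHELL", "wget -qO- http://localhost:8080/health || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3
```

Using `command:` directly like this sidesteps the custom entrypoint question entirely, since the image's default entrypoint will exec the worker process.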
- Docs: Worker healthchecks

Double-check time sync and API config:
- Significant clock skew between your Prefect server, database, and worker host can make runs show as "late."
- Ensure the worker's `PREFECT_API_URL` points at your self-hosted API and that it is reachable from the worker container.

About your entrypoint.sh:
- What's inside `/opt/prefect/entrypoint.sh`? If it doesn't `exec "$@"` as the last line, the Prefect process may not be PID 1, signal handling can break, and restarts/healthchecks may not behave, causing downtime gaps where runs go late.
- A safe minimal entrypoint:
```
#!/usr/bin/env sh
# -e: exit on error; -u: error on unset variables (pipefail is bash-only, so omitted under sh)
set -eu
# hand off to the passed command so prefect runs as PID 1 and receives signals
exec "$@"
```
Helpful next details to share so we can pinpoint the cause:
- Prefect version (exact) and confirmation you're on Prefect 3.x
- Output of:
```
prefect work-pool inspect my-work-pool
prefect work-queue ls
```
- Your worker logs around the time a run should start (any backoff, permission, or quota messages)
- The value of `${DEFAULT_POOL_LIMIT}` and typical flow run duration
- Whether deployments target a specific work queue
- Contents of `/opt/prefect/entrypoint.sh`

If everything above looks good and you still see hours of lateness, the next most common culprits are a low worker `--limit` relative to the backlog, queue-scoping mismatches, or Cloud Run submission failures/quotas. Happy to triage the logs with you.
b
Hey thanks @Nate, while we wait for Marvin... I am using a Docker Compose template from Coolify, and this entrypoint.sh seems to be available within the official prefect 3 docker image. I did notice that `DEFAULT_POOL_LIMIT` is set to `1` - does that affect the concurrency?
n
yes

> this entrypoint.sh seems to be available within the official prefect 3 docker image

im just curious why you're using it here to run a worker

`DEFAULT_POOL_LIMIT` is not a prefect setting; `--limit` is a worker-specific (client-side) limit on the concurrent flow runs to start (it's not a limit on the work pool). so it would appear Coolify, or whoever made the template, is confused about the `--limit` flag

besides the client-side `--limit` flag, you can set (server-side) concurrency on the work pool, a queue within that pool, or the deployment itself
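To make the two knobs concrete, here is what setting each looks like. This is a sketch: the pool name and numbers are illustrative, and the subcommand names follow the Prefect 3 CLI (verify against `prefect work-pool --help` on your version):

```
# client-side: this one worker process submits at most 4 flow runs at a time
prefect worker start --pool my-work-pool --limit 4

# server-side: cap concurrent runs across the whole pool, however many workers poll it
prefect work-pool set-concurrency-limit my-work-pool 10
```

The client-side flag only throttles a single worker process; the server-side limit is enforced by the API for every worker on the pool.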
b
My work pool had concurrency set to `unlimited`, but this worker start command is using `--limit=1`.
If I understand correctly, this client-side limit is causing a bottleneck if I have long running jobs?
n
could be!
yea, `--limit 1` means that the `prefect worker start` process will only be able to submit one scheduled flow run for execution at a time
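A quick back-of-the-envelope shows how a client-side limit of 1 turns into hours of lateness. The run count and duration below are made up for illustration, not taken from the thread:

```shell
# With --limit=L, queued runs start in batches of L, so the k-th queued run
# begins roughly (k / L) * run_duration after the first one starts.
LIMIT=1        # the worker's --limit
RUN_MINUTES=30 # assumed duration of one flow run
K=8            # position of a run in the backlog
echo "$(( (K / LIMIT) * RUN_MINUTES )) minutes late"  # prints "240 minutes late" (~4 hours)
```

Raising the limit (or adding workers) shrinks the batch delay proportionally, which matches the fix that worked below.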
b
Interesting, well I have increased it... let's see what happens
Hey @Nate - based on a couple of tests, it seems adjusting the `--limit` flag has done the trick. Thanks for the help!
n
catjam