gee
09/17/2025, 8:56 AMMarvin
09/17/2025, 8:56 AMMarvin
09/17/2025, 9:02 AMPREFECT_RUNNER_POLL_FREQUENCY to 1s (or lower if you can tolerate the extra API calls).
- Start your served process with:
PREFECT_RUNNER_POLL_FREQUENCY=1 python serve.py
- Expect p50 near ~0.5–1.5s, p95 still subject to network/variance.
- If you use workers instead of serve (not required for serve), the equivalent knobs are:
- PREFECT_WORKER_QUERY_SECONDS (default 10) → lower to 1s
- PREFECT_WORKER_PREFETCH_SECONDS (default 10) → set to 0–1 if you don’t want prefetch behavior to affect pickup
- Example:
PREFECT_WORKER_QUERY_SECONDS=1 PREFECT_WORKER_PREFETCH_SECONDS=0 prefect worker start -p <your-pool>
A few practical notes
- flow.serve keeps your code “warm” which is good — the main lever for latency here is the poll interval.
- If you need consistent sub-second starts, Prefect (as an orchestrator) isn’t ideal as the first hop in the request path. Common pattern for “real-time inference”:
- Put a thin web service in front to accept requests and run the hot inference directly.
- Use Prefect to orchestrate background/async work (fanout, enrichment, post-processing, retries) via events or API-triggered runs.
- Network proximity can matter a bit: keep your served process close to Prefect Cloud (US-based today) to shave off some variability.
Helpful references
- Runner and worker settings reference (poll, heartbeat, prefetch): Prefect 3 Settings reference
- flow.serve overview and patterns: Schedule your code quickly with flow.serve
- Worker concepts and health: Workers in Prefect 3
A couple questions to fine-tune recommendations
- How are you triggering the runs (UI “Run”, API, webhook, another flow)?
- Where is the flow.serve() process running (same region as your users / close to Prefect Cloud)?
- What P95/P99 start-time target do you need?
If you share that, I can suggest an exact setup and the lowest safe poll frequency for your traffic profile.gee
09/17/2025, 9:12 AMMarvin
09/17/2025, 9:12 AMMarvin
09/17/2025, 9:13 AMPREFECT_RUNNER_POLL_FREQUENCY=1 # default is 10
- Start your serve process like:
PREFECT_RUNNER_POLL_FREQUENCY=1 python your_script.py
- With a 1s poll, the pickup time becomes roughly uniform(0–1s) plus network/overhead. In practice, p95 is typically ~1.5–2.0s in a stable environment.
- Keep the execution environment hot
- Run flow.serve() in a long-lived process (no cold container/pod starts).
- Avoid infrastructure spin-up (e.g., don’t put serve behind a “start a pod per run” model).
- Co-locate the serving process in a region with low RTT to Prefect Cloud.
- If you ever switch to workers (not required for serve), mirror the tuning:
- Lower query interval:
PREFECT_WORKER_QUERY_SECONDS=1
- Disable or reduce prefetch:
PREFECT_WORKER_PREFETCH_SECONDS=0
- Start the worker:
PREFECT_WORKER_QUERY_SECONDS=1 PREFECT_WORKER_PREFETCH_SECONDS=0 prefect worker start -p <your-pool>
What to expect
- With PREFECT_RUNNER_POLL_FREQUENCY=1, you should see p50 ~0.5–1.0s and p95 landing close to 1.5–2.0s depending on network jitter and run startup overhead.
- If you need even tighter p95, you’ll need extremely low network latency and minimal run initialization (imports, logging, credentials, etc.). Also keep the number of served deployments low or ensure the host is not CPU-constrained.
References
- Settings for runner/worker poll intervals and heartbeat: Prefect 3 Settings reference
- flow.serve overview and patterns: Schedule your code quickly with flow.serve
- Worker concepts/health: Workers in Prefect 3
If you can share:
- Where your served process runs (region, environment)
- How you trigger each run
- Any heavy initialization your flow does before the first task
…I can suggest a tighter config or pinpoint any remaining sources of >2s variance.gee
09/17/2025, 9:22 AMMarvin
09/17/2025, 9:22 AMMarvin
09/17/2025, 9:23 AMflow.serve() (served flows use a local runner, not a worker).
- If you were using workers, disabling prefetch is fine for real-time API-triggered runs; it won’t slow you down.
Your setup and what to do
- Served processes in EU on ECS, API-triggered runs, trivial flow body:
- The dominant factor is the serve-runner’s polling interval. Set:
PREFECT_RUNNER_POLL_FREQUENCY=1
Then run your serve process as a long-lived ECS task:
PREFECT_RUNNER_POLL_FREQUENCY=1 python your_script.py
- With 1s polling and low RTT, p95 ~2s is realistic. Your previous 3–13s lines up with the default 10s poll.
Other small levers to stay under 2s p95
- Keep the ECS task hot and not CPU-throttled (reserve CPU/memory so Python imports/logging aren’t delayed).
- Minimize network RTT to Prefect Cloud (EU → US adds ~100–150ms per round trip). If you can run the serve process closer to Prefect Cloud’s region, you’ll reduce variance.
- Avoid heavy logging on start and avoid any external calls before the first task log.
If you decide to try workers instead of serve
- Prefetch still won’t help immediate triggers. Use faster polling:
PREFECT_WORKER_QUERY_SECONDS=1 PREFECT_WORKER_PREFETCH_SECONDS=0 prefect worker start -p <your-pool>
- This gets similar p95 behavior to the serve-runner with 1s polling.
References
- Settings reference (runner/worker poll, heartbeat, prefetch): Prefect 3 Settings reference
- flow.serve overview: Schedule your code quickly with flow.serve
- Worker concepts: Workers in Prefect 3
If you flip the runner poll to 1s and still see p95 > 2s, share a quick timing breakdown (API request → run created → first flow log) and I’ll help pinpoint any remaining overhead.gee
09/17/2025, 9:35 AMMarvin
09/17/2025, 9:35 AMMarvin
09/17/2025, 9:39 AM.from_source('<https://github.com/...>')), so the worker imports code already on disk.
- This avoids per-run git clone and avoids cold infra startup — the worker just spawns a subprocess to run the flow.
Step-by-step
1) Create a process work pool (one-time)
prefect work-pool create "realtime-pool" --type process
2) Start a worker with fast polling and an isolated queue
PREFECT_WORKER_QUERY_SECONDS=1 \
PREFECT_WORKER_PREFETCH_SECONDS=0 \
prefect worker start -p realtime-pool -q realtime
3) Deploy your flow to that pool/queue
- Simple example using the flow object directly:
from prefect import flow
@flow
def my_flow():
print("hello from realtime worker")
if __name__ == "__main__":
# Create/update a deployment that routes to your worker's pool/queue
my_flow.deploy(
name="realtime-deployment",
work_pool_name="realtime-pool",
work_queue_name="realtime", # must match the worker
# no 'image' and no 'from_source' GitHub — code lives with the worker
)
- If you prefer being explicit about the entrypoint resolved on disk (still local, no GitHub):
from prefect import flow, get_run_logger
from prefect.flows import Flow
@flow
def my_flow():
get_run_logger().info("hello from realtime worker")
if __name__ == "__main__":
Flow.from_source(
".", # local filesystem (baked/mounted in the worker)
"app/flows.py:my_flow", # path:func your worker can import at runtime
).deploy(
name="realtime-deployment",
work_pool_name="realtime-pool",
work_queue_name="realtime",
)
Notes and best practices for 2s p95
- Keep the worker hot: run it as a long-lived ECS service/container with CPU/mem reserved.
- Co-locate the worker as close as possible (network-wise) to Prefect Cloud to reduce RTT.
- Using ECS “per-run” tasks (ECS work pool) will add several seconds of cold-start — not ideal for 2s p95. A long-lived Process worker in ECS is the right choice here.
- With PREFECT_WORKER_QUERY_SECONDS=1, expect p50 well under a second and p95 around ~1.5–2s for trivial flows. This aligns with your goal.
Docs and references
- Work pools and workers: Workers in Prefect 3
- Settings (query_seconds, prefetch_seconds): Prefect 3 Settings reference
- Deploy CLI (alternative to code-based deploys): Deploy flows with prefect deploy
If you want me to translate this to your exact repo layout (entrypoint paths, Dockerfile/Task Definition for ECS, etc.), share your file structure and how you build the ECS image and I’ll tailor the snippets.gee
09/17/2025, 9:55 AM.serve is it normal?
Results for .serve
Min: 1.634s | Avg: 2.272s | Max: 2.944s
Results for work pool based solution
Min: 2.419s | Avg: 3.248s | Max: 4.946s
Why is this? Any way to improve it?Marvin
09/17/2025, 9:55 AMMarvin
09/17/2025, 9:56 AMPREFECT_WORKER_QUERY_SECONDS=0.5 PREFECT_WORKER_PREFETCH_SECONDS=0 prefect worker start -p realtime-pool -q realtime
This alone can shave ~0.25–0.5s off p95.
- Ensure no per-run setup work
- Use a Process work pool and mount/bake the code into the long-lived ECS container (no git clone, no pip install in job template).
- If you’re using Flow.from_source(".", "path:func"), confirm your process pool’s base job template does NOT include pip_install_requirements or custom shell steps.
- Keep logging minimal at startup and avoid external calls in the flow’s first lines.
- Give the worker CPU headroom
- Reserve CPU/memory in ECS so the child Python process starts quickly (burstable throttling adds 100s of ms).
- Pin the container to a performant instance class.
- Keep the queue isolated
- Use a dedicated work queue for these runs so your worker isn’t juggling other jobs.
Reality check vs serve
- Your numbers are consistent with expectations:
- serve: Min ~1.6s, Avg ~2.3s, Max ~2.9s
- worker: Min ~2.4s, Avg ~3.2s, Max ~4.9s
- Serve is usually the best option for p95 ≤ 2s. If that’s the hard SLO, I’d stick with serve unless you need worker features (per-run isolation, different infra types, etc.).
If you want, paste your work pool base job template (or confirm you’re on the default Process template) and your deployment snippet; I can help strip any unnecessary steps and see if we can squeeze another ~0.5–1.0s out of the worker path.
References
- Workers and work pools: Workers in Prefect 3
- Settings (query_seconds, prefetch_seconds): Prefect 3 Settings reference
- flow.serve overview: Schedule your code quickly with flow.servegee
09/17/2025, 10:11 AMFlow.from_source("."), "path.py:func") is required though.
Let's consider the .serve() approach. Is it possible to containerise it and run multiple container? How does prefect allocate jobs to different server workers since there's not work pool?Marvin
09/17/2025, 10:11 AMMarvin
09/17/2025, 10:12 AMfrom prefect.flows import Flow
# Keep your entrypoint explicit; no GitHub pulls
flow = Flow.from_source(".", "path.py:func")
if __name__ == "__main__":
flow.serve(
name="realtime-serve", # shared name across all replicas
limit=1, # 1 run at a time per container (tune as needed)
global_limit=5, # cap across all replicas (set to N or N*limit)
pause_on_shutdown=False, # don’t pause the deployment if one replica exits
# triggers=[...], # optional: add a webhook trigger or others
# tags=["realtime"], # optional
)
Dockerfile sketch
FROM python:3.11-slim
WORKDIR /app
ENV PYTHONDONTWRITEBYTECODE=1 PYTHONUNBUFFERED=1
# Add prefect and your deps
RUN pip install --no-cache-dir prefect
COPY . /app
# Prefect Cloud credentials/URL provided via environment in ECS
# Reduce poll interval for low-latency pickup
ENV PREFECT_RUNNER_POLL_FREQUENCY=1
CMD ["python", "serve_entrypoint.py"]
ECS deployment
- Run an ECS Service with desired count = number of replicas you want.
- Ensure CPU/memory reservations so the process stays hot.
- Provide PREFECT_API_URL and PREFECT_API_KEY env vars.
- Keep the task running; don’t spin per-run tasks.
Triggering runs
- After the first replica calls serve, Prefect Cloud shows a “served deployment” named realtime-serve. You can:
- Trigger from the UI
- Call the Prefect API / run_deployment endpoint
- Add a webhook trigger (UI or via the triggers= parameter)
Docs: Create a webhook
Routing/throughput tips
- Set PREFECT_RUNNER_POLL_FREQUENCY=1 (or 0.5 if you can tolerate the extra API calls) in every replica.
- Use limit=1 per replica for predictable scaling; set global_limit to (#replicas) for 1:1 concurrency.
- There’s no work pool in this model; all replicas with the same name compete to claim the same served deployment’s runs.
- If you need hard partitioning, run multiple served deployments with different names and route to the right one in your trigger logic.
References
- Flow serve overview: Schedule your code quickly with flow.serve
- Settings (runner poll): Prefect 3 Settings reference
If you share your ECS task/Service config, I can sanity-check environment vars and health checks.