# ask-community
Hi everyone! I'm having a bit of an issue using `run_deployment`, which I worry might be a bug and not user error. The summary is that I'm scheduling flow runs on NERSC via Slurm, which means the flow run often sits in Pending for many minutes before Slurm allocates resources. The `timeout` parameter is left as None, but it seems that if `run_deployment` polls while the run is still pending, it errors. I.e., in my parent flow logs I can see:
```
PrefectHTTPStatusError("Server error '500 Internal Server Error' for url 'http://prefect.prefect-pipelines.production.svc.spin.nersc.org/api/flow_runs/9418ee41-b55c-40ee-a7cb-879df77a766a'\nResponse: {'exception_message': 'Internal Server Error'}\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/500")
```
This comes from:
```
File "/usr/local/lib/python3.13/site-packages/prefect/deployments/flow_runs.py", line 203, in run_deployment
    flow_run = await client.read_flow_run(flow_run_id)
```
When I wait a minute for the job to be scheduled, I can hit this endpoint perfectly:
```json
{
  "id": "9418ee41-b55c-40ee-a7cb-879df77a766a",
  "created": "2025-08-28T23:20:39.035659Z",
  "updated": "2025-08-28T23:20:43.784032Z",
  "name": "preprocess_/data/level=raw/runs/run_id=25_056_084/science_red.fits",
  "flow_id": "344eea82-0256-46b2-bdc2-615a21be48e6",
  "state_id": "0198f2fb-c55b-7ab9-aab6-971eb870d3ff",
  "deployment_id": "c29d351e-ee20-4267-93ee-d157b91f1d6b",
  "deployment_version": "104f2d07",
  "work_queue_id": "b0ac5ee7-ce2f-47ed-8f6d-7ddce9c0a1b0",
  "work_queue_name": "default",
...
}
```
So I feel like `run_deployment` shouldn't be raising an error in this instance. Any devs around to share their thoughts? Specifically, the code in `run_deployment`:
```python
with anyio.move_on_after(timeout):
    while True:
        flow_run = await client.read_flow_run(flow_run_id)
        flow_state = flow_run.state
        if flow_state and flow_state.is_final():
            return flow_run
        await anyio.sleep(poll_interval)
```
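This uses `timeout` to bound the wait for flow completion, but it does seem to assume relatively immediate flow registration: there's no error handling around `client.read_flow_run`, so any error it raises while the run is still pending propagates straight out of `run_deployment`.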
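One way to make a loop like this more forgiving is to treat certain errors from the read call as transient and keep polling until a consecutive-error limit is hit. This is a minimal sketch, not Prefect's actual code: `poll_until_final`, `transient`, and `max_consecutive_errors` are hypothetical names, and it uses plain `asyncio` instead of `anyio` so it runs standalone. Like `move_on_after`, it returns `None` when `timeout` expires, and `timeout=None` polls indefinitely.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


async def poll_until_final(
    read: Callable[[], Awaitable[T]],
    is_final: Callable[[T], bool],
    poll_interval: float = 5.0,
    timeout: Optional[float] = None,
    transient: Tuple[Type[BaseException], ...] = (ConnectionError,),
    max_consecutive_errors: int = 5,
) -> Optional[T]:
    """Hypothetical helper: poll `read()` until `is_final(result)` is True.

    Errors listed in `transient` are retried instead of ending the wait;
    after `max_consecutive_errors` failures in a row the last one is
    re-raised. Returns None if `timeout` seconds elapse first.
    """

    async def _loop() -> T:
        errors = 0
        while True:
            try:
                result = await read()
            except transient:
                errors += 1
                if errors >= max_consecutive_errors:
                    raise  # too many failures in a row: give up
            else:
                errors = 0  # a successful read resets the error streak
                if is_final(result):
                    return result
            await asyncio.sleep(poll_interval)

    try:
        # timeout=None makes wait_for wait indefinitely
        return await asyncio.wait_for(_loop(), timeout)
    except asyncio.TimeoutError:
        return None  # mirror move_on_after: give up quietly on timeout
```

Assuming a Prefect client is in scope, this could be wired up roughly as `poll_until_final(lambda: client.read_flow_run(flow_run_id), lambda fr: fr.state is not None and fr.state.is_final(), transient=(PrefectHTTPStatusError,))`, after calling `run_deployment(..., timeout=0)`, which per its docstring returns the flow run metadata immediately without polling. Whether retrying on a 500 is appropriate depends on why the server returns it, which this sketch doesn't diagnose.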