artur
10/01/2025, 3:01 PM
prefect deploy is hanging on some of the flows without much indication why:
╭──────────────────────────────────────────────────────────────────────────────╮
│ Deploying nucleus_ingestion │
╰──────────────────────────────────────────────────────────────────────────────╯
{"name":"prefect.client","msg":"Connecting to API at <http://prefect-v3.data-dev.particlehealth.com/api/>","args":[],"levelname":"DEBUG","levelno":10,"pathname":"/home/prefect/.venv/lib/python3.11/site-packages/prefect/client/orchestration/__init__.py","filename":"__init__.py","module":"__init__","exc_info":null,"exc_text":null,"stack_info":null,"lineno":1098,"funcName":"__aenter__","created":1759261817.6602,"msecs":660.0,"relativeCreated":8926.210403442383,"thread":140149606460288,"threadName":"MainThread","processName":"MainProcess","process":29,"severity":"DEBUG"}
{"name":"prefect.client","msg":"Connecting to API at <http://prefect-v3.data-dev.particlehealth.com/api/>","args":[],"levelname":"DEBUG","levelno":10,"pathname":"/home/prefect/.venv/lib/python3.11/site-packages/prefect/client/orchestration/__init__.py","filename":"__init__.py","module":"__init__","exc_info":null,"exc_text":null,"stack_info":null,"lineno":1098,"funcName":"__aenter__","created":1759261817.6756892,"msecs":675.0,"relativeCreated":8941.699504852295,"thread":140149606460288,"threadName":"MainThread","processName":"MainProcess","process":29,"severity":"DEBUG"}
I know the api works because other flows deploy without issues; I haven’t made any changes to the flow itself in the upgrade PR I’m testing
I already have debug turned on; how can I get more information here?
Marvin
10/01/2025, 3:07 PM
A few things can make prefect deploy hang in 3.x, especially when only some flows are affected. The most common culprits are:
- Waiting on a hidden CLI prompt (non‑interactive environment)
- Import-time side effects in the flow’s module (network calls, DB/K8s client init, etc.)
- Packaging/scan of a very large project directory (or symlink loops) without a .prefectignore
Here’s how to get more signal and narrow it down.
Quick checks
- Force non-interactive mode in case it’s waiting for a prompt:
PREFECT_CLI_PROMPT=false prefect --no-prompt deploy
- Try deploying a single entrypoint directly (bypasses some discovery):
prefect --no-prompt deploy ./path/to/file.py:flow_func_name
Capture where it’s hanging
- Grab a stack dump while it appears stuck. The simplest option for the CLI process is py-spy:
1) Start the deploy:
prefect --no-prompt deploy
2) In another shell, dump the stack of the deploy process PID:
py-spy dump --pid <pid>
This prints a full stack trace so we can see exactly what it's waiting on. (Note that plain PYTHONFAULTHANDLER=1 plus kill -USR1 will not dump — faulthandler only hooks crash signals unless you register SIGUSR1 yourself, as in the sketch below.)
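If py-spy isn't available, a minimal stdlib wrapper gives the same on-demand dump — a sketch, with the flow module path left as a placeholder:

# dump_on_usr1.py — run the slow step under this, then `kill -USR1 <pid>`
import faulthandler
import importlib
import signal

# PYTHONFAULTHANDLER=1 / faulthandler.enable() only hook crash signals,
# so register SIGUSR1 explicitly for on-demand stack dumps.
faulthandler.register(signal.SIGUSR1)

# Also dump all thread stacks automatically if still stuck after 60s.
faulthandler.dump_traceback_later(60, repeat=True)

importlib.import_module("your.module.path")  # placeholder: your flow's module
print("import completed")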
Turn up client/HTTP logging
- Create a minimal logging config to include `httpx`/`anyio` at DEBUG:
version: 1
disable_existing_loggers: False
loggers:
  prefect:
    level: DEBUG
    handlers: [console]
    propagate: False
  httpx:
    level: DEBUG
    handlers: [console]
    propagate: False
  anyio:
    level: DEBUG
    handlers: [console]
    propagate: False
handlers:
  console:
    class: logging.StreamHandler
    formatter: basic
formatters:
  basic:
    format: "%(asctime)s %(levelname)s %(name)s - %(message)s"
Then run:
PREFECT_LOGGING_SETTINGS_PATH=./logging.yml prefect --no-prompt deploy
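To rule out a malformed config file, a quick sketch that loads the YAML through the stdlib logging machinery first (assumes the file is saved as logging.yml; PyYAML ships with Prefect):

import logging.config

import yaml

# Validate the logging config loads cleanly before handing it to Prefect
# via PREFECT_LOGGING_SETTINGS_PATH.
with open("logging.yml") as f:
    config = yaml.safe_load(f)
logging.config.dictConfig(config)
logging.getLogger("prefect").debug("logging config OK")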
Rule out import-time side effects
- prefect deploy imports your flow module to discover the flow object. If your module does work at import time (connects to Postgres, K8s API, reads secrets, long computations), the CLI can appear to “hang”.
- Test the entrypoint import in the same environment:
python -c "import importlib; import time; t=time.time(); m=importlib.import_module('your.module.path'); print('Imported in', time.time()-t, 's'); print(hasattr(m, 'flow_func_name'))"
If this hangs, move side-effectful code behind if __name__ == '__main__': or into functions that are only called at run time, not at import.
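A small stdlib sketch that makes a slow import fail loudly instead of silently hanging the CLI (SIGALRM is Unix-only; the module path is a placeholder):

import importlib
import signal

MODULE = "your.module.path"  # placeholder: your flow's module
BUDGET_S = 5  # imports should be near-instant for deploys to feel snappy

def _timeout(signum, frame):
    raise TimeoutError(f"importing {MODULE} took longer than {BUDGET_S}s")

# SIGALRM is Unix-only; on Windows, run the import in a subprocess with a timeout.
signal.signal(signal.SIGALRM, _timeout)
signal.alarm(BUDGET_S)
try:
    importlib.import_module(MODULE)
finally:
    signal.alarm(0)
print("import OK")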
Speed up/avoid packaging stalls
- If the project dir is large, add a .prefectignore to exclude heavy dirs and avoid scan/tar stalls:
.venv/
.git/
__pycache__/
node_modules/
data/
*.parquet
*.csv
- Also ensure there are no symlink loops in your project directory.
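A quick stdlib sketch to see which directories are worth ignoring, and to spot symlinks that could hide loops:

import os
from pathlib import Path

# Rough size of each top-level directory, to decide what belongs in
# .prefectignore; symlinks are flagged rather than traversed.
for child in sorted(Path(".").iterdir()):
    if child.is_symlink():
        print(f"symlink: {child} -> {os.readlink(child)}")
    elif child.is_dir():
        size = sum(f.stat().st_size for f in child.rglob("*") if f.is_file())
        print(f"{size / 1e6:10.1f} MB  {child}")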
Optional: rule out HTTP/transport quirks
- In a few environments, HTTP/2 can cause odd client behavior. You can try:
PREFECT_API_ENABLE_HTTP2=false prefect --no-prompt deploy
A few helpful refs:
- Prefect 3: prefect.yaml and deployments
- Prefect 3 logging APIs
- 3.4.x release notes
If you can share:
- The exact command you’re running
- Whether this is in CI or an interactive shell
- The entrypoint of a flow that hangs
- The output from the faulthandler dump (or confirm whether importing the module hangs)
…I can help pinpoint which step is blocking and suggest a fix specific to your project.
artur
10/01/2025, 3:20 PM
Marvin
10/01/2025, 3:26 PM
To see exactly where the CLI is stuck, start the deploy in the background and dump its stack:
prefect --no-prompt deploy --prefect-file prefect/v3/pyproject.toml centrifuge/prefect/flows/nucleus_ingestion.py:nucleus_ingestion &
echo $!  # get PID
py-spy dump --pid <pid>
(As noted earlier, kill -USR1 only produces a dump if SIGUSR1 was registered with faulthandler first; py-spy needs no setup and also works on Windows.)
This gives us the exact line where it's stuck.
- One more note: I don’t see a documented --prefect-file option in the CLI help for Prefect 3.x. It might be benign or ignored, but to rule out config-path issues, please try once from your project root without that option:
prefect --no-prompt deploy centrifuge/prefect/flows/nucleus_ingestion.py:nucleus_ingestion
Now, the fastest way to isolate the step that’s blocking
Please run this minimal script in the same environment (same container/venv) as your CLI. It prints markers around the three phases that prefect deploy goes through: API client, module import, and deployment call.
import anyio
from prefect.client.orchestration import get_client
from prefect.flows import load_flow_from_entrypoint

ENTRYPOINT = "centrifuge/prefect/flows/nucleus_ingestion.py:nucleus_ingestion"
WORK_POOL = "<your-work-pool-name>"  # update

async def main():
    print("A) opening client...")
    async with get_client() as c:
        # simple request to confirm networking is fine
        await c.read_work_pools()
    print("A) client OK")

    print("B) importing flow from entrypoint...")
    flow = load_flow_from_entrypoint(ENTRYPOINT)
    print("B) flow import OK:", flow.name)

    print("C) deploying (no build/push)...")
    # Make sure to use an existing image if your work pool expects one
    deployment_id = flow.deploy(
        name="debug-nucleus-ingestion",
        work_pool_name=WORK_POOL,
        build=False,
        push=False,
        print_next_steps=False,
    )
    print("C) deploy call returned:", deployment_id)

anyio.run(main)
- If it hangs at A: it’s network/client-level. Try:
- PREFECT_API_ENABLE_HTTP2=false (disables HTTP/2)
- Confirm DNS/TLS with a quick curl to http://prefect-v3.data-dev.particlehealth.com/api/ from the same environment (an httpx equivalent appears after this list)
- Check server logs during this attempt to see if a request hits the API
- If it hangs at B: it’s import-time side effects in your module. Common causes:
- DB connections, K8s client initialization, or long computations executed at import time
- Reading secrets or network calls during import
Move those into functions or if __name__ == "__main__": guards so importing the module is fast and side-effect free. You can also time a plain import:
python -X faulthandler -c "import time, importlib; t=time.perf_counter(); importlib.import_module('centrifuge.prefect.flows.nucleus_ingestion'); print(time.perf_counter()-t)"
- If it hangs at C but only when build/push are True in your normal deploy:
- It’s likely packaging or Docker build/push getting stuck. To confirm via CLI:
prefect --no-prompt deploy centrifuge/prefect/flows/nucleus_ingestion.py:nucleus_ingestion --no-build --no-push
(check prefect deploy --help for the exact build/push toggles in your version)
If that succeeds, look at your image build step or .prefectignore to reduce package size.
- Add a .prefectignore to the project root to avoid scanning huge directories:
.venv/
.git/
__pycache__/
node_modules/
data/
*.parquet
*.csv
- If you’re building an image in an environment without a Docker daemon or with a remote builder that’s unreachable, Docker will appear to “hang”.
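For the point-A connectivity check above, an httpx equivalent of the curl test — a sketch, assuming the standard /api/health endpoint on your server:

import httpx

# Run from the same container/venv as the CLI so DNS, proxies, and TLS
# settings match what `prefect deploy` sees.
url = "http://prefect-v3.data-dev.particlehealth.com/api/health"
resp = httpx.get(url, timeout=10)
print(resp.status_code, resp.text)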
Extra logging knobs
- Confirm Prefect logs are at DEBUG:
PREFECT_LOGGING_ROOT_LEVEL=DEBUG prefect --no-prompt deploy ...
Marvin
10/01/2025, 3:26 PM
- Config sanity check: run without --prefect-file once, or from the directory where your prefect.yaml resides. Then run prefect config view to confirm the API URL and profile are what you expect.
- Server compatibility: confirm your server is 3.4.20 (or compatible) and DB migrations are up. If a specific API call deadlocks on the server, you’ll see it in server logs at the time of the hang.
If you can share:
- Which letter (A/B/C) is the last printed in the debug script
- Whether --no-build --no-push makes the CLI version pass
- Whether a raw import of centrifuge.prefect.flows.nucleus_ingestion completes quickly
- Server logs around the time of the hang
…I can make a much more precise recommendation.
artur
10/01/2025, 3:52 PM
(.venv) ➜ proton git:(artur/prefect_3.4.19) ✗ python test.py
A) opening client...
A) client OK
B) importing flow from entrypoint...
when I run the import I get:
(proton-py3.11) ➜ proton git:(artur/prefect_3.4.19) python -X faulthandler -c "import time, importlib; t=time.perf_counter(); importlib.import_module('centrifuge.prefect.flows.nucleus_ingestion'); print(time.perf_counter()-t)"
35.913722010998754
(proton-py3.11) ➜ proton git:(artur/prefect_3.4.19) ✗
Marvin
10/01/2025, 3:53 PM
That narrows it down: the hang is the import of centrifuge.prefect.flows.nucleus_ingestion (Prefect has to import the module to find the flow). The ~36-second import confirms there's side-effectful or heavy code running at module import time.
Let’s quickly identify the exact line and the slow dependency, then fix by deferring that work to runtime.
Find the exact blocking line
- Stack dump during the hang:
1) Start the import with SIGUSR1 registered and leave it running (plain -X faulthandler only hooks crash signals, so register the user signal explicitly):
python -c "import faulthandler, signal, importlib; faulthandler.register(signal.SIGUSR1); importlib.import_module('centrifuge.prefect.flows.nucleus_ingestion')"
2) In another shell, send SIGUSR1 to the PID:
kill -USR1 <pid>
This will print a Python traceback showing the exact line that's stuck.
- If SIGUSR1 isn’t an option, use py-spy:
py-spy dump --pid <pid>
or run the import under:
py-spy record -o profile.svg -- python -c "import importlib; importlib.import_module('centrifuge.prefect.flows.nucleus_ingestion')"
Then open profile.svg to see the hot stack.
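If py-spy isn't installed in that environment, the stdlib profiler can surface the hot path of the import as well — a sketch:

import cProfile
import importlib
import pstats

# Profile the import itself and show the 20 largest cumulative times;
# the slow dependency's frames will dominate the top of the listing.
cProfile.run(
    "importlib.import_module('centrifuge.prefect.flows.nucleus_ingestion')",
    "import.prof",
)
pstats.Stats("import.prof").sort_stats("cumulative").print_stats(20)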
Profile which import is slow
- Python’s import-time profiler will show which module causes the delay:
PYTHONPROFILEIMPORTTIME=1 python -c "import importlib; importlib.import_module('centrifuge.prefect.flows.nucleus_ingestion')" 2> importtime.log
Then inspect the tail of importtime.log for big cumulative times. Common culprits: kubernetes, boto3/aiobotocore, google.cloud, snowflake, psycopg/sqlalchemy engine creation, great_expectations, pandas reading data, any network call, etc.
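The importtime output is dense; a small sketch to rank the worst offenders in importtime.log (lines look like `import time: self [us] | cumulative | package`):

# rank_importtime.py — summarize `python -X importtime` stderr output
rows = []
for line in open("importtime.log"):
    if not line.startswith("import time:"):
        continue
    self_us, cum_us, name = (p.strip() for p in line[len("import time:"):].split("|", 2))
    if not cum_us.isdigit():  # skip the header line
        continue
    rows.append((int(cum_us), name))

for cum, name in sorted(rows, reverse=True)[:15]:
    print(f"{cum / 1e6:8.3f}s  {name}")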
Quick code search hints
- Grep for typical import-time side effects:
grep -nE "load_kube_config|load_incluster_config|Client\\(|connect\\(|engine\\(|Session\\(|boto3|google.cloud|snowflake|psycopg|requests\\.|httpx\\.|from_env|os\\.environ\\['.*'\\]" -n centrifuge/prefect/flows/nucleus_ingestion.py
- Also check imported local modules from that file for similar patterns.
Make the module import-safe (pattern)
- Move side effects behind functions or into tasks so importing the module is fast and pure.
Bad (runs at import):
from kubernetes import config, client
from sqlalchemy import create_engine

config.load_kube_config()       # filesystem/network work at import time
k8s = client.CoreV1Api()
engine = create_engine(DB_URL)  # heavy setup at import time
Good (lazy init at runtime):
from functools import lru_cache

from kubernetes import config, client
from prefect import task
from sqlalchemy import create_engine

@lru_cache
def get_k8s():
    # choose in-cluster or kubeconfig at run time
    try:
        config.load_incluster_config()
    except Exception:
        config.load_kube_config()
    return client.CoreV1Api()

@lru_cache
def get_engine():
    return create_engine(DB_URL, pool_pre_ping=True)

@task
def use_k8s(...):
    api = get_k8s()
    ...

@task
def use_db(...):
    eng = get_engine()
    ...
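The lru_cache pattern gives each worker process a lazily created singleton: the first task run pays the setup cost, later runs reuse the client, and importing the module stays free of side effects.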
Also safe flow file structure:
from prefect import flow, task

# Avoid any network calls or client creation at module level

@task
def step(...):
    ...

@flow
def nucleus_ingestion(...):
    step(...)

if __name__ == "__main__":
    nucleus_ingestion()
Next steps
1) Run the faulthandler or py-spy dump during the hang and paste the top of the stack here — that will tell us exactly which line is blocking.
2) If you’d prefer, paste the top of nucleus_ingestion.py (and any imported local modules) up to where the stack points. I’ll suggest a concrete refactor to defer that work.
3) After the change, re-run:
python -X importtime -c "import importlib; importlib.import_module('centrifuge.prefect.flows.nucleus_ingestion')"
You should see sub-second import time, and prefect --no-prompt deploy ... should proceed immediately.
Note: The --prefect-file option isn't in the 3.x CLI help; if you still use it, it may be ignored. Running from the project root without it is safest while we debug.
artur
10/01/2025, 4:13 PM
(.venv) ➜ proton git:(artur/prefect_3.4.19) ✗ python test.py
A) opening client...
A) client OK
B) importing flow from entrypoint...
^C^CTraceback (most recent call last):
File "/Users/artur/code/proton/test.py", line 30, in <module>
anyio.run(main)
File "/Users/artur/code/proton/.venv/lib/python3.11/site-packages/anyio/_core/_eventloop.py", line 74, in run
return async_backend.run(func, args, {}, backend_options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/artur/code/proton/.venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2325, in run
return runner.run(wrapper())
^^^^^^^^^^^^^^^^^^^^^
File "/Users/artur/.pyenv/versions/3.11.10/lib/python3.11/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/artur/.pyenv/versions/3.11.10/lib/python3.11/asyncio/base_events.py", line 641, in run_until_complete
self.run_forever()
File "/Users/artur/.pyenv/versions/3.11.10/lib/python3.11/asyncio/base_events.py", line 608, in run_forever
self._run_once()
File "/Users/artur/.pyenv/versions/3.11.10/lib/python3.11/asyncio/base_events.py", line 1936, in _run_once
handle._run()
File "/Users/artur/.pyenv/versions/3.11.10/lib/python3.11/asyncio/events.py", line 84, in _run
self._context.run(self._callback, *self._args)
File "/Users/artur/code/proton/.venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2313, in wrapper
return await func(*args)
^^^^^^^^^^^^^^^^^
File "/Users/artur/code/proton/test.py", line 16, in main
flow = load_flow_from_entrypoint(ENTRYPOINT)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/artur/code/proton/.venv/lib/python3.11/site-packages/prefect/flows.py", line 2503, in load_flow_from_entrypoint
flow: Flow[P, Any] = import_object(entrypoint) # pyright: ignore[reportRedeclaration]
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/artur/code/proton/.venv/lib/python3.11/site-packages/prefect/utilities/importtools.py", line 157, in import_object
module = load_script_as_module(script_path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/artur/code/proton/.venv/lib/python3.11/site-packages/prefect/utilities/importtools.py", line 120, in load_script_as_module
spec.loader.exec_module(module)
File "<frozen importlib._bootstrap_external>", line 940, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/Users/artur/code/proton/centrifuge/prefect/flows/nucleus_ingestion.py", line 16, in <module>
from centrifuge.lake.jobs.clean_ccda_graph import clean_ccda_graph
File "/Users/artur/code/proton/centrifuge/lake/jobs/clean_ccda_graph.py", line 7, in <module>
from centrifuge.lake.clean_ccda_graph.clean_ccda_graph import (
File "/Users/artur/code/proton/centrifuge/lake/clean_ccda_graph/clean_ccda_graph.py", line 7, in <module>
from centrifuge.types.nucleus.utils import hash_ccda_graph
File "/Users/artur/code/proton/centrifuge/types/nucleus/__init__.py", line 1, in <module>
from centrifuge.types.nucleus.address import Address
File "/Users/artur/code/proton/centrifuge/types/nucleus/address.py", line 10, in <module>
from .common import Moment, rgetattr
File "/Users/artur/code/proton/centrifuge/types/nucleus/common.py", line 7, in <module>
from centrifuge.types.generated.ccda import ANY, classes, complex
File "/Users/artur/code/proton/centrifuge/types/generated/ccda/classes.py", line 122, in <module>
class OrganizationPartOfLevel2(ANY):
File "/Users/artur/code/proton/.venv/lib/python3.11/site-packages/pydantic_xml/model.py", line 112, in __new__
cls = typing.cast(Type['BaseXmlModel'], super().__new__(mcls, name, bases, namespace, **kwargs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/artur/code/proton/.venv/lib/python3.11/site-packages/pydantic/_internal/_model_construction.py", line 237, in __new__
complete_model_class(
File "/Users/artur/code/proton/.venv/lib/python3.11/site-packages/pydantic/_internal/_model_construction.py", line 607, in complete_model_class
schema = gen_schema.clean_schema(schema)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/artur/code/proton/.venv/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 669, in clean_schema
schema = self.defs.finalize_schema(schema)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/artur/code/proton/.venv/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 2800, in finalize_schema
gather_result = gather_schemas_for_cleaning(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/artur/code/proton/.venv/lib/python3.11/site-packages/pydantic/_internal/_schema_gather.py", line 204, in gather_schemas_for_cleaning
traverse_schema(schema, context)
File "/Users/artur/code/proton/.venv/lib/python3.11/site-packages/pydantic/_internal/_schema_gather.py", line 97, in traverse_schema
traverse_definition_ref(schema, context)
File "/Users/artur/code/proton/.venv/lib/python3.11/site-packages/pydantic/_internal/_schema_gather.py", line 80, in traverse_definition_ref
traverse_schema(definition, ctx)
... (repeats)
Marvin
10/01/2025, 4:15 PM
That traceback pins it down: the import hangs while pydantic/pydantic-xml generate schemas for the generated CCDA models (centrifuge/types/generated/ccda/classes.py). The quickest unblock is a thin entrypoint module that defers the heavy imports into tasks:
# nucleus_ingestion.py
from prefect import flow, task

@task
def run_clean(...):
    # Lazy import here avoids import work during `prefect deploy`
    from centrifuge.lake.jobs.clean_ccda_graph import clean_ccda_graph
    return clean_ccda_graph(...)

@flow
def nucleus_ingestion(...):
    return run_clean(...)
- Do the same inside clean_ccda_graph and any utilities: only import centrifuge.types.generated.ccda.classes inside the exact functions that need them. This keeps the top-level import quick.
Short-term alternative: deploy without importing the module
- Use the Python API with a source-based deployment, which does not need to import your flow at deploy time:
from prefect.flows import Flow

flow = Flow.from_source(
    source=".",  # or a Git URL if that's how workers get code
    entrypoint="centrifuge/prefect/flows/nucleus_ingestion.py:nucleus_ingestion",
)
flow.deploy(
    name="nucleus-ingestion",
    work_pool_name="<your-pool>",
    build=False,  # keep your existing image if you bake code in
    push=False,
)
- This approach avoids the import entirely during deployment. See docs: https://docs-3.prefect.io/v3/how-to-guides/deployments/prefect-yaml and API ref index https://docs-3.prefect.io/v3/api-ref/python/.
Medium-term: tame the import cost in your model layer
- The stack shows pydantic v2 and pydantic-xml generating schemas for a large, cyclic graph. Options:
- Minimize what’s imported by default in centrifuge.types.nucleus.__init__ — only expose what’s truly needed; deep “star” re-exports can trigger import of the entire generated model set (a lazy re-export sketch follows this list).
- If possible, split the generated models into smaller modules and import only the pieces needed by a given job.
- If this slowed down after the upgrade, compare dependency versions between your 3.4.7 and 3.4.20 environments. Try pinning pydantic/pydantic-xml back to the previous working versions to confirm a regression:
python -c "import pydantic, sys; print('pydantic', pydantic.__version__)"
python -c "import pydantic_xml, sys; print('pydantic-xml', getattr(pydantic_xml, '__version__', '?'))"
If a newer pydantic triggers pathological schema traversal on your model set, pinning to the earlier version is a quick mitigation while you refactor imports.
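For the __init__ trimming above, PEP 562's module-level __getattr__ keeps the package's public names without importing their modules eagerly — a sketch, assuming Address lives in centrifuge/types/nucleus/address.py:

# centrifuge/types/nucleus/__init__.py — lazy re-export sketch (PEP 562)
import importlib

_EXPORTS = {
    "Address": ".address",  # attribute name -> defining submodule
}

def __getattr__(name):
    # Import the defining submodule only when the attribute is first used,
    # so `import centrifuge.types.nucleus` stays cheap.
    if name in _EXPORTS:
        module = importlib.import_module(_EXPORTS[name], __name__)
        return getattr(module, name)
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")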
Verification helpers
- Identify the exact hotspot using Python’s import profiler:
PYTHONPROFILEIMPORTTIME=1 python -c "import importlib; importlib.import_module('centrifuge.prefect.flows.nucleus_ingestion')" 2> importtime.log
tail -n 50 importtime.log
- Or use py-spy to snapshot while it’s “hung” and confirm it’s in pydantic schema code:
py-spy dump --pid <pid>
If you share the top few lines of the dump around centrifuge/types/generated/ccda/classes.py (e.g., which models are being constructed), I can suggest exactly where to place lazy imports. But the quickest unblock is:
- Create a thin flow wrapper module that defers all heavy imports to inside tasks/functions
- Use that wrapper as your deployment entrypoint
After that, prefect deploy should complete immediately.