# ask-marvin
a
@Marvin I’m attempting to upgrade my Prefect deployment from 3.4.7 to 3.4.20; I am running everything on Kubernetes + Postgres. I deployed the infrastructure correctly and the API endpoints seem to be up. However, prefect deploy is hanging on some of the flows without much indication why:
Copy code
╭──────────────────────────────────────────────────────────────────────────────╮
│ Deploying nucleus_ingestion                                                  │
╰──────────────────────────────────────────────────────────────────────────────╯
{"name":"prefect.client","msg":"Connecting to API at <http://prefect-v3.data-dev.particlehealth.com/api/>","args":[],"levelname":"DEBUG","levelno":10,"pathname":"/home/prefect/.venv/lib/python3.11/site-packages/prefect/client/orchestration/__init__.py","filename":"__init__.py","module":"__init__","exc_info":null,"exc_text":null,"stack_info":null,"lineno":1098,"funcName":"__aenter__","created":1759261817.6602,"msecs":660.0,"relativeCreated":8926.210403442383,"thread":140149606460288,"threadName":"MainThread","processName":"MainProcess","process":29,"severity":"DEBUG"}
{"name":"prefect.client","msg":"Connecting to API at <http://prefect-v3.data-dev.particlehealth.com/api/>","args":[],"levelname":"DEBUG","levelno":10,"pathname":"/home/prefect/.venv/lib/python3.11/site-packages/prefect/client/orchestration/__init__.py","filename":"__init__.py","module":"__init__","exc_info":null,"exc_text":null,"stack_info":null,"lineno":1098,"funcName":"__aenter__","created":1759261817.6756892,"msecs":675.0,"relativeCreated":8941.699504852295,"thread":140149606460288,"threadName":"MainThread","processName":"MainProcess","process":29,"severity":"DEBUG"}
I know the API works because other flows deploy without issues; I haven’t made any changes to the flow itself in the upgrade PR I’m testing. I already have debug turned on; how can I get more information here?
m
thought for 348.8 seconds
Thanks for the details — a few things can cause “silent” hangs during prefect deploy in 3.x, especially when only some flows are affected. The most common culprits are:
- Waiting on a hidden CLI prompt (non‑interactive environment)
- Import-time side effects in the flow’s module (network calls, DB/K8s client init, etc.)
- Packaging/scan of a very large project directory (or symlink loops) without a .prefectignore
Here’s how to get more signal and narrow it down.
Quick checks
- Force non-interactive mode in case it’s waiting for a prompt:
Copy code
PREFECT_CLI_PROMPT=false prefect --no-prompt deploy
- Try deploying a single entrypoint directly (bypasses some discovery):
Copy code
prefect --no-prompt deploy ./path/to/file.py:flow_func_name
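If the CLI still stalls there, you can also check that the entrypoint resolves from plain Python, independent of the deploy machinery; a minimal sketch (the path and flow name below are placeholders for yours):
Copy code
# check that Prefect can resolve the entrypoint at all, outside the CLI
from prefect.flows import load_flow_from_entrypoint

# placeholder entrypoint; substitute your real "<path/to/file.py>:<flow_func_name>"
flow = load_flow_from_entrypoint("./path/to/file.py:flow_func_name")
print("resolved flow:", flow.name)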
Capture where it’s hanging
- Enable Python faulthandler to dump a traceback when it appears stuck:
1) Start the deploy with:
Copy code
PYTHONFAULTHANDLER=1 prefect --no-prompt deploy
2) In another shell, send SIGUSR1 to the deploy process PID:
Copy code
kill -USR1 <pid>
This will print a full stack trace to stderr so we can see exactly what it’s waiting on.
Turn up client/HTTP logging
- Create a minimal logging config to include `httpx`/`anyio` at DEBUG:
Copy code
version: 1
disable_existing_loggers: False
loggers:
  prefect:
    level: DEBUG
    handlers: [console]
    propagate: False
  httpx:
    level: DEBUG
    handlers: [console]
    propagate: False
  anyio:
    level: DEBUG
    handlers: [console]
    propagate: False
handlers:
  console:
    class: logging.StreamHandler
    formatter: basic
formatters:
  basic:
    format: "%(asctime)s %(levelname)s %(name)s - %(message)s"
Then run:
Copy code
PREFECT_LOGGING_SETTINGS_PATH=./logging.yml prefect --no-prompt deploy
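Before pointing Prefect at the file, it can be worth confirming that the YAML parses and actually emits DEBUG records; a minimal sketch (assumes PyYAML is available, which it normally is alongside Prefect):
Copy code
# quick local check that logging.yml is valid and produces DEBUG output
import logging.config

import yaml

with open("logging.yml") as f:
    logging.config.dictConfig(yaml.safe_load(f))  # raises if the config is malformed

logging.getLogger("httpx").debug("httpx debug logging is wired up")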
Rule out import-time side effects
- prefect deploy imports your flow module to discover the flow object. If your module does work at import time (connects to Postgres, K8s API, reads secrets, long computations), the CLI can appear to “hang”.
- Test the entrypoint import in the same environment:
Copy code
python -c "import importlib; import time; t=time.time(); m=importlib.import_module('your.module.path'); print('Imported in', time.time()-t, 's'); print(hasattr(m, 'flow_func_name'))"
If this hangs, move side-effectful code behind if __name__ == '__main__': or into functions that are only called at run time, not at import.
Speed up/avoid packaging stalls
- If the project dir is large, add a .prefectignore to exclude heavy dirs and avoid scan/tar stalls (a quick way to find the heavy directories is sketched after the ignore list below):
Copy code
.venv/
.git/
__pycache__/
node_modules/
data/
*.parquet
*.csv
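To decide what belongs in .prefectignore, a quick size scan of the project root shows which directories dominate; a minimal sketch:
Copy code
# print the ten largest top-level directories under the current project root
import os
from pathlib import Path

def dir_size(path: Path) -> int:
    total = 0
    for root, _dirs, files in os.walk(path, followlinks=False):
        for name in files:
            try:
                total += (Path(root) / name).stat().st_size
            except OSError:
                pass
    return total

sizes = {p.name: dir_size(p) for p in Path(".").iterdir() if p.is_dir()}
for name, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)[:10]:
    print(f"{size / 1e6:10.1f} MB  {name}")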
- Also ensure there are no symlink loops in your project directory.
Optional: rule out HTTP/transport quirks
- In a few environments, HTTP/2 can cause odd client behavior. You can try:
Copy code
PREFECT_API_ENABLE_HTTP2=false prefect --no-prompt deploy
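To separate client-library behavior from plain HTTP reachability, you can also hit the API directly from the same environment; a minimal sketch (assumes the standard /health route and uses httpx, which ships with Prefect):
Copy code
# direct reachability check against the Prefect API, bypassing the Prefect client
import httpx

base = "http://prefect-v3.data-dev.particlehealth.com/api"  # from your logs
resp = httpx.get(f"{base}/health", timeout=10)
print(resp.status_code, resp.text)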
A couple of helpful refs:
- Prefect 3: prefect.yaml and deployments
- Prefect 3 logging APIs
- 3.4.x release notes
If you can share:
- The exact command you’re running
- Whether this is in CI or an interactive shell
- The entrypoint of a flow that hangs
- The output from the faulthandler dump (or confirm whether importing the module hangs)
…I can help pinpoint which step is blocking and suggest a fix specific to your project.
a
@Marvin I tried all of your suggestions but I still don’t get any more logs; I’m running this command: PREFECT_LOGGING_SETTINGS_PATH=./logging.yml PYTHONFAULTHANDLER=1 prefect --no-prompt deploy --prefect-file prefect/v3/pyproject.toml centrifuge/prefect/flows/nucleus_ingestion.py:nucleus_ingestion. Killing this process doesn’t give me any extra logs.
m
thought for 345.2 seconds
Thanks for the detail. A few targeted things to help us pinpoint exactly where it’s blocking and get actionable signal.
First, quick clarifications based on what you tried
- Faulthandler only dumps a traceback if you send SIGUSR1 to the running process. A normal “kill” (SIGTERM) won’t emit a dump. If you’re on Linux/macOS, do:
Copy code
PYTHONFAULTHANDLER=1 prefect --no-prompt deploy --prefect-file prefect/v3/pyproject.toml centrifuge/prefect/flows/nucleus_ingestion.py:nucleus_ingestion &
echo $!   # get PID
kill -USR1 <pid>
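One caveat: PYTHONFAULTHANDLER only installs handlers for fatal signals (SIGSEGV and friends), so if kill -USR1 just terminates the process with no dump, SIGUSR1 needs to be registered explicitly. One way, sketched below as an assumption rather than the only option, is a sitecustomize.py on PYTHONPATH so the prefect CLI process picks it up at startup:
Copy code
# sitecustomize.py: auto-imported at interpreter startup if found on PYTHONPATH;
# registers SIGUSR1 so `kill -USR1 <pid>` dumps every thread's stack to stderr
import faulthandler
import signal
import sys

if hasattr(signal, "SIGUSR1"):  # SIGUSR1 does not exist on Windows
    faulthandler.register(signal.SIGUSR1, file=sys.stderr, all_threads=True)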
If you’re on Windows or cannot send USR1, use py-spy instead:
Copy code
py-spy dump --pid <pid>
This gives us the exact line where it’s stuck.
- One more note: I don’t see a documented --prefect-file option in the CLI help for Prefect 3.x. It might be benign or ignored, but to rule out config-path issues, please try once from your project root without that option:
Copy code
prefect --no-prompt deploy centrifuge/prefect/flows/nucleus_ingestion.py:nucleus_ingestion
Now, the fastest way to isolate the step that’s blocking
Please run this minimal script in the same environment (same container/venv) as your CLI. It prints markers around the three phases that prefect deploy goes through: API client, module import, and deployment call.
Copy code
import anyio
from prefect.client.orchestration import get_client
from prefect.flows import load_flow_from_entrypoint

ENTRYPOINT = "centrifuge/prefect/flows/nucleus_ingestion.py:nucleus_ingestion"
WORK_POOL = "<your-work-pool-name>"  # update

async def main():
    print("A) opening client...")
    async with get_client() as c:
        # simple request to confirm networking is fine
        await c.read_work_pools()
    print("A) client OK")

    print("B) importing flow from entrypoint...")
    flow = load_flow_from_entrypoint(ENTRYPOINT)
    print("B) flow import OK:", flow.name)

    print("C) deploying (no build/push)...")
    # Make sure to use an existing image if your work pool expects one
    flow_id = flow.deploy(
        name="debug-nucleus-ingestion",
        work_pool_name=WORK_POOL,
        build=False,
        push=False,
        print_next_steps=False,
    )
    print("C) deploy call returned:", flow_id)

anyio.run(main)
- If it hangs at A: it’s network/client-level. Try:
  - PREFECT_API_ENABLE_HTTP2=false (disables HTTP/2)
  - Confirm DNS/TLS with a quick curl to http://prefect-v3.data-dev.particlehealth.com/api/ from the same environment
  - Check server logs during this attempt to see if a request hits the API
- If it hangs at B: it’s import-time side effects in your module. Common causes:
  - DB connections, K8s client initialization, or long computations executed at import time
  - Reading secrets or network calls during import
  Move those into functions or if __name__ == "__main__": guards so importing the module is fast and side-effect free. You can also time a plain import:
Copy code
python -X faulthandler -c "import time, importlib; t=time.perf_counter(); importlib.import_module('centrifuge.prefect.flows.nucleus_ingestion'); print(time.perf_counter()-t)"
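If the signal route still yields nothing, faulthandler can also dump every thread’s stack after a timeout, with no signal needed; a minimal sketch around the same import:
Copy code
# dump all thread stacks to stderr (and exit) if the import is still running after 20s
import faulthandler
import importlib
import time

faulthandler.dump_traceback_later(20, exit=True)
start = time.perf_counter()
importlib.import_module("centrifuge.prefect.flows.nucleus_ingestion")
faulthandler.cancel_dump_traceback_later()
print("imported in", time.perf_counter() - start, "s")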
- If it hangs at C but only when build/push are True in your normal deploy: it’s likely packaging or Docker build/push getting stuck. To confirm via CLI:
Copy code
prefect --no-prompt deploy centrifuge/prefect/flows/nucleus_ingestion.py:nucleus_ingestion --no-build --no-push
If that succeeds, look at your image build step or .prefectignore to reduce package size.
- Add a .prefectignore to the project root to avoid scanning huge directories:
Copy code
.venv/
.git/
__pycache__/
node_modules/
data/
*.parquet
*.csv
- If you’re building an image in an environment without a Docker daemon or with a remote builder that’s unreachable, Docker will appear to “hang”.
Extra logging knobs
- Confirm Prefect logs are at DEBUG:
Copy code
PREFECT_LOGGING_ROOT_LEVEL=DEBUG prefect --no-prompt deploy ...
- Including httpx/anyio logs via logging.yml is good; if it printed nothing, the block is likely before/around module import or Docker build.
Two more things to rule out
- Project/config path: please try without --prefect-file once, or from the directory where your prefect.yaml resides. Then run prefect config view to confirm the API URL and profile are what you expect.
- Server compatibility: confirm your server is 3.4.20 (or compatible) and DB migrations are up. If a specific API call deadlocks on the server, you’ll see it in server logs at the time of the hang.
If you can share:
- Which letter (A/B/C) is the last printed in the debug script
- Whether --no-build --no-push makes the CLI version pass
- Whether a raw import of centrifuge.prefect.flows.nucleus_ingestion completes quickly
- Server logs around the time of the hang
…I can make a much more precise recommendation.
a
@Marvin it hangs on step B:
Copy code
(.venv) ➜  proton git:(artur/prefect_3.4.19) ✗ python test.py
A) opening client...
A) client OK
B) importing flow from entrypoint...
when I run the import I get:
Copy code
(proton-py3.11) ➜  proton git:(artur/prefect_3.4.19) python -X faulthandler -c "import time, importlib; t=time.perf_counter(); importlib.import_module('centrifuge.prefect.flows.nucleus_ingestion'); print(time.perf_counter()-t)"

35.913722010998754
(proton-py3.11) ➜  proton git:(artur/prefect_3.4.19) ✗
m
thought for 62.1 seconds
Great, thanks — that pinpoints it. Since it hangs at step B, the blocker is import-time work in centrifuge.prefect.flows.nucleus_ingestion (Prefect has to import the module to find the flow). The 35s import confirms there’s side-effectful code at module import. Let’s quickly identify the exact line and the slow dependency, then fix by deferring that work to runtime.
Find the exact blocking line
- Faulthandler stack dump during the hang:
1) Start the import and leave it running:
Copy code
python -X faulthandler -c "import importlib, time; t=time.perf_counter(); importlib.import_module('centrifuge.prefect.flows.nucleus_ingestion'); print('done in', time.perf_counter()-t)"
2) In another shell, send SIGUSR1 to the PID:
Copy code
kill -USR1 <pid>
This will print a Python traceback showing the exact line that’s stuck.
- If SIGUSR1 isn’t an option, use py-spy:
Copy code
py-spy dump --pid <pid>
or run the import under:
Copy code
py-spy record -o profile.svg -- python -c "import importlib; importlib.import_module('centrifuge.prefect.flows.nucleus_ingestion')"
Then open profile.svg to see the hot stack.
Profile which import is slow
- Python’s import-time profiler will show which module causes the delay:
Copy code
PYTHONPROFILEIMPORTTIME=1 python -c "import importlib; importlib.import_module('centrifuge.prefect.flows.nucleus_ingestion')" 2> importtime.log
Then inspect the tail of importtime.log for big cumulative times. Common culprits: kubernetes, boto3/aiobotocore, google.cloud, snowflake, psycopg/sqlalchemy engine creation, great_expectations, pandas reading data, any network call, etc.
Quick code search hints
- Grep for typical import-time side effects:
Copy code
grep -nE "load_kube_config|load_incluster_config|Client\\(|connect\\(|engine\\(|Session\\(|boto3|google.cloud|snowflake|psycopg|requests\\.|httpx\\.|from_env|os\\.environ\\['.*'\\]" centrifuge/prefect/flows/nucleus_ingestion.py
- Also check imported local modules from that file for similar patterns.
Make the module import-safe (pattern)
- Move side effects behind functions or into tasks so importing the module is fast and pure.
Bad (runs at import):
Copy code
from kubernetes import config, client
from sqlalchemy import create_engine

config.load_kube_config()       # network/filesystem work at import time
k8s = client.CoreV1Api()

engine = create_engine(DB_URL)  # engine built at import time (DB_URL defined elsewhere)
Good (lazy init at runtime):
Copy code
from functools import lru_cache

from kubernetes import config, client
from prefect import task
from sqlalchemy import create_engine

# DB_URL is assumed to come from your settings/env

@lru_cache
def get_k8s():
    # choose in-cluster or kubeconfig at run-time
    try:
        config.load_incluster_config()
    except Exception:
        config.load_kube_config()
    return client.CoreV1Api()

@lru_cache
def get_engine():
    return create_engine(DB_URL, pool_pre_ping=True)

@task
def use_k8s(...):
    api = get_k8s()
    ...

@task
def use_db(...):
    eng = get_engine()
    ...
Also safe flow file structure:
Copy code
from prefect import flow, task

# Avoid any network calls or client creation here

@task
def step(...):
    ...

@flow
def nucleus_ingestion(...):
    step(...)

if __name__ == "__main__":
    nucleus_ingestion()
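Once the module is import-safe, a small regression test can keep it that way; a minimal sketch (the 2-second threshold is arbitrary):
Copy code
# fails if importing the flow module gets slow again (e.g., a heavy import creeps back in)
import importlib
import time

def test_flow_module_imports_quickly():
    start = time.perf_counter()
    importlib.import_module("centrifuge.prefect.flows.nucleus_ingestion")
    assert time.perf_counter() - start < 2.0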
Next steps
1) Run the faulthandler or py-spy dump during the hang and paste the top of the stack here — that will tell us exactly which line is blocking.
2) If you’d prefer, paste the top of nucleus_ingestion.py (and any imported local modules) up to where the stack points. I’ll suggest a concrete refactor to defer that work.
3) After the change, re-run:
Copy code
python -X importtime -c "import importlib; importlib.import_module('centrifuge.prefect.flows.nucleus_ingestion')"
You should see sub-second import time, and prefect --no-prompt deploy ... should proceed immediately.
Note: The --prefect-file option isn’t in the 3.x CLI help; if you still use it, it may be ignored. Running from the project root without it is safest while we debug.
a
@Marvin when I Ctrl+C the test script I do have some logs that indicate an infinite loop; does that help narrow it down:
Copy code
(.venv) ➜  proton git:(artur/prefect_3.4.19) ✗ python test.py
A) opening client...
A) client OK
B) importing flow from entrypoint...
^C^CTraceback (most recent call last):
  File "/Users/artur/code/proton/test.py", line 30, in <module>
    anyio.run(main)
  File "/Users/artur/code/proton/.venv/lib/python3.11/site-packages/anyio/_core/_eventloop.py", line 74, in run
    return async_backend.run(func, args, {}, backend_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/artur/code/proton/.venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2325, in run
    return runner.run(wrapper())
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/artur/.pyenv/versions/3.11.10/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/artur/.pyenv/versions/3.11.10/lib/python3.11/asyncio/base_events.py", line 641, in run_until_complete
    self.run_forever()
  File "/Users/artur/.pyenv/versions/3.11.10/lib/python3.11/asyncio/base_events.py", line 608, in run_forever
    self._run_once()
  File "/Users/artur/.pyenv/versions/3.11.10/lib/python3.11/asyncio/base_events.py", line 1936, in _run_once
    handle._run()
  File "/Users/artur/.pyenv/versions/3.11.10/lib/python3.11/asyncio/events.py", line 84, in _run
    self._context.run(self._callback, *self._args)
  File "/Users/artur/code/proton/.venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2313, in wrapper
    return await func(*args)
           ^^^^^^^^^^^^^^^^^
  File "/Users/artur/code/proton/test.py", line 16, in main
    flow = load_flow_from_entrypoint(ENTRYPOINT)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/artur/code/proton/.venv/lib/python3.11/site-packages/prefect/flows.py", line 2503, in load_flow_from_entrypoint
    flow: Flow[P, Any] = import_object(entrypoint)  # pyright: ignore[reportRedeclaration]
                         ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/artur/code/proton/.venv/lib/python3.11/site-packages/prefect/utilities/importtools.py", line 157, in import_object
    module = load_script_as_module(script_path)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/artur/code/proton/.venv/lib/python3.11/site-packages/prefect/utilities/importtools.py", line 120, in load_script_as_module
    spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/Users/artur/code/proton/centrifuge/prefect/flows/nucleus_ingestion.py", line 16, in <module>
    from centrifuge.lake.jobs.clean_ccda_graph import clean_ccda_graph
  File "/Users/artur/code/proton/centrifuge/lake/jobs/clean_ccda_graph.py", line 7, in <module>
    from centrifuge.lake.clean_ccda_graph.clean_ccda_graph import (
  File "/Users/artur/code/proton/centrifuge/lake/clean_ccda_graph/clean_ccda_graph.py", line 7, in <module>
    from centrifuge.types.nucleus.utils import hash_ccda_graph
  File "/Users/artur/code/proton/centrifuge/types/nucleus/__init__.py", line 1, in <module>
    from centrifuge.types.nucleus.address import Address
  File "/Users/artur/code/proton/centrifuge/types/nucleus/address.py", line 10, in <module>
    from .common import Moment, rgetattr
  File "/Users/artur/code/proton/centrifuge/types/nucleus/common.py", line 7, in <module>
    from centrifuge.types.generated.ccda import ANY, classes, complex
  File "/Users/artur/code/proton/centrifuge/types/generated/ccda/classes.py", line 122, in <module>
    class OrganizationPartOfLevel2(ANY):
  File "/Users/artur/code/proton/.venv/lib/python3.11/site-packages/pydantic_xml/model.py", line 112, in __new__
    cls = typing.cast(Type['BaseXmlModel'], super().__new__(mcls, name, bases, namespace, **kwargs))
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/artur/code/proton/.venv/lib/python3.11/site-packages/pydantic/_internal/_model_construction.py", line 237, in __new__
    complete_model_class(
  File "/Users/artur/code/proton/.venv/lib/python3.11/site-packages/pydantic/_internal/_model_construction.py", line 607, in complete_model_class
    schema = gen_schema.clean_schema(schema)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/artur/code/proton/.venv/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 669, in clean_schema
    schema = self.defs.finalize_schema(schema)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/artur/code/proton/.venv/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 2800, in finalize_schema
    gather_result = gather_schemas_for_cleaning(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/artur/code/proton/.venv/lib/python3.11/site-packages/pydantic/_internal/_schema_gather.py", line 204, in gather_schemas_for_cleaning
    traverse_schema(schema, context)
  File "/Users/artur/code/proton/.venv/lib/python3.11/site-packages/pydantic/_internal/_schema_gather.py", line 97, in traverse_schema
    traverse_definition_ref(schema, context)
  File "/Users/artur/code/proton/.venv/lib/python3.11/site-packages/pydantic/_internal/_schema_gather.py", line 80, in traverse_definition_ref
    traverse_schema(definition, ctx)
... (repeats)
m
thought for 110.6 seconds
That stack trace is exactly what we need — the hang is during import and it’s inside your generated CCDA Pydantic/pydantic-xml models. Pydantic is recursively building each model’s schema at class creation time, and it’s either extremely expensive or effectively infinite due to a cycle in those models.
Practical ways to unblock and fix:
Short-term: avoid importing heavy models at module import
- Your flow file imports: nucleus_ingestion -> clean_ccda_graph -> nucleus types -> generated ccda classes -> pydantic_xml BaseXmlModel -> deep schema traversal
- Move those imports inside functions/tasks that run at flow time, not at module import time. Example:
Copy code
# nucleus_ingestion.py
from prefect import flow, task

@task
def run_clean(...):
    # Lazy import here avoids import work during `prefect deploy`
    from centrifuge.lake.jobs.clean_ccda_graph import clean_ccda_graph
    return clean_ccda_graph(...)

@flow
def nucleus_ingestion(...):
    return run_clean(...)
- Do the same inside clean_ccda_graph and any utilities: only import centrifuge.types.generated.ccda.classes inside the exact functions that need them. This keeps the top-level import quick.
Short-term alternative: deploy without importing the module
- Use the Python API with a source-based deployment, which does not need to import your flow at deploy time:
Copy code
from prefect.flows import Flow

flow = Flow.from_source(
    source=".",  # or a Git URL if that’s how workers get code
    entrypoint="centrifuge/prefect/flows/nucleus_ingestion.py:nucleus_ingestion",
)

flow.deploy(
    name="nucleus-ingestion",
    work_pool_name="<your-pool>",
    build=False,   # keep your existing image if you bake code in
    push=False,
)
- This approach avoids the import entirely during deployment. See docs: https://docs-3.prefect.io/v3/how-to-guides/deployments/prefect-yaml and API ref index https://docs-3.prefect.io/v3/api-ref/python/.
Medium-term: tame the import cost in your model layer
- The stack shows pydantic v2 and pydantic-xml generating schemas for a large, cyclic graph. Options:
  - Minimize what’s imported by default in centrifuge.types.nucleus.__init__ — only expose what’s truly needed; deep “star” re-exports can trigger import of the entire generated model set.
  - If possible, split the generated models into smaller modules and import only the pieces needed by a given job.
  - If this slowed down after the upgrade, compare dependency versions between your 3.4.7 and 3.4.20 environments. Try pinning pydantic/pydantic-xml back to the previous working versions to confirm a regression:
Copy code
python -c "import pydantic, sys; print('pydantic', pydantic.__version__)"
    python -c "import pydantic_xml, sys; print('pydantic-xml', getattr(pydantic_xml, '__version__', '?'))"
If a newer pydantic triggers pathological schema traversal on your model set, pinning to the earlier version is a quick mitigation while you refactor imports.
Verification helpers
- Identify the exact hotspot using Python’s import profiler:
Copy code
PYTHONPROFILEIMPORTTIME=1 python -c "import importlib; importlib.import_module('centrifuge.prefect.flows.nucleus_ingestion')" 2> importtime.log
tail -n 50 importtime.log
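If the tail is noisy, a small helper can rank entries by cumulative time; a minimal sketch that parses the importtime format (pipe-separated columns: self µs, cumulative µs, module name):
Copy code
# summarize importtime.log: top modules by cumulative import time
rows = []
with open("importtime.log") as f:
    for line in f:
        if not line.startswith("import time:") or "cumulative" in line:
            continue  # skip non-matching lines and the header row
        _, timings = line.split(":", 1)
        _self_us, cumulative_us, module = (part.strip() for part in timings.split("|"))
        rows.append((int(cumulative_us), module))

for cumulative_us, module in sorted(rows, reverse=True)[:20]:
    print(f"{cumulative_us / 1e6:8.3f} s  {module}")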
- Or use py-spy to snapshot while it’s “hung” and confirm it’s in pydantic schema code:
Copy code
py-spy dump --pid <pid>
If you share the top few lines of the dump around centrifuge/types/generated/ccda/classes.py (e.g., which models are being constructed), I can suggest exactly where to place lazy imports. But the quickest unblock is:
- Create a thin flow wrapper module that defers all heavy imports to inside tasks/functions (a sketch follows below)
- Use that wrapper as your deployment entrypoint
After that, prefect deploy should complete immediately.
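For concreteness, a minimal sketch of such a wrapper (the file name is hypothetical; it mirrors the lazy-import example above), which you would then point the deployment at, e.g. centrifuge/prefect/flows/nucleus_ingestion_deploy.py:nucleus_ingestion:
Copy code
# centrifuge/prefect/flows/nucleus_ingestion_deploy.py  (hypothetical wrapper module)
# Keep this file free of heavy imports so `prefect deploy` can load it instantly.
from prefect import flow, task

@task
def clean_ccda_graph_task(*args, **kwargs):
    # the generated CCDA models are imported only at run time, inside the task
    from centrifuge.lake.jobs.clean_ccda_graph import clean_ccda_graph
    return clean_ccda_graph(*args, **kwargs)

@flow(name="nucleus_ingestion")
def nucleus_ingestion(*args, **kwargs):
    return clean_ccda_graph_task(*args, **kwargs)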