< Marvin> got a ```RuntimeError Nanny failed to start ``` wh Prefect Community #ask-marvin


Mehdi Lallouache

09/17/2025, 6:47 PM

@Marvin got a

```
RuntimeError: Nanny failed to start.
```

when running tasks in parallel with DaskTaskRunner. It happened suddenly; it used to work. Any idea why?


Marvin

09/17/2025, 6:49 PM

Sorry you’re hitting that. “Nanny failed to start” is a Dask worker start-up failure, and it usually points to an environment or resource change rather than your Prefect code. Since it used to work and suddenly stopped, the most common culprits are:

Likely causes
- Package/version drift
  - Dask and distributed got out of sync, or a recent update introduced incompatibility
  - Prefect/prefect-dask upgraded and your environment image changed
- Resource/OS limits
  - Not enough RAM / OOM killer taking out workers
  - Too many open files (ulimit), ephemeral port exhaustion, firewall changes
  - Temporary directory full or not writable
- Process spawn issues
  - Windows/macOS missing the `if __name__ == "__main__":` guard
  - Docker/Kubernetes changes (e.g., small /dev/shm, changed base image, new security restrictions)
- Import/environment mismatch
  - Workers can’t import your code or dependencies (e.g., changed PYTHONPATH or missing dep)

Quick checks and fixes

1) Verify versions (pin if needed)
- Make sure dask and distributed match and prefect-dask is installed:

```bash
python -c "import prefect, prefect_dask, dask, distributed; print('prefect', prefect.__version__, 'prefect-dask', prefect_dask.__version__, 'dask', dask.__version__, 'distributed', distributed.__version__)"
```

- If something updated recently, try pinning to the last known-good versions and re-run.

2) Try a minimal flow to isolate environment vs. your code

```python
from prefect import flow, task
from prefect_dask.task_runners import DaskTaskRunner

@task
def inc(x):
    return x + 1

@flow(task_runner=DaskTaskRunner(cluster_kwargs={"n_workers": 2, "threads_per_worker": 1}))
def test():
    futs = [inc.submit(i) for i in range(10)]
    return sum(f.result() for f in futs)

if __name__ == "__main__":
    print(test())
```

- If this fails too, it’s an environment/runtime issue.

3) Switch to threads temporarily (avoids process nanny) to confirm spawn problems

```python
DaskTaskRunner(cluster_kwargs={"processes": False, "n_workers": 1, "threads_per_worker": 8})
```

- If this works, the failure is specifically in process spawning.

4) Set a stable local directory and ensure disk space

```python
DaskTaskRunner(cluster_kwargs={"local_directory": "/tmp/dask", "n_workers": 2, "threads_per_worker": 1})
```

- Make sure `/tmp/dask` exists and is writable with free space.

5) Check OS/container limits
- On Linux: `ulimit -n` (try increasing open files)
- In Docker: ensure adequate memory and shared memory; try `--shm-size=1g` if using containers
- Look for OOM kills in `dmesg` or container logs

6) Enable logs and review worker errors
- Prefect logs: set `PREFECT_LOGGING_LEVEL=DEBUG`
- Dask dashboard: add `dashboard_address=":8787"` to `cluster_kwargs` and visit the UI to see worker logs
- Save a performance report to see worker lifecycle:

```python
DaskTaskRunner(
    cluster_kwargs={"n_workers": 2, "threads_per_worker": 1},
    performance_report_path="dask-report.html",
)
```

7) If you were relying on a system-wide or base image change recently
- Double-check that your Prefect worker runtime still includes dask/distributed, and your project dependencies
- If you run in K8s/Docker, confirm the image tag did not silently bump

8) If on Windows/macOS
- Ensure your script entry point has:

```python
if __name__ == "__main__":
    flow_fn()
```
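Steps 4 and 5 above can also be checked from Python before the flow starts. The following is a minimal sketch using only the standard library; the function name `preflight`, the `min_free_gb` threshold, and the report keys are illustrative assumptions, not part of Prefect or Dask.

```python
import os
import shutil
import tempfile

try:
    import resource  # POSIX only; not available on Windows
except ImportError:
    resource = None


def preflight(local_directory, min_free_gb=1.0):
    """Return basic checks for a Dask local directory and OS limits.

    `preflight` and its report keys are hypothetical helpers for illustration.
    """
    report = {"writable": False, "free_gb": None, "enough_space": False}
    try:
        # Create the directory if needed, then confirm a file can be written.
        os.makedirs(local_directory, exist_ok=True)
        with tempfile.NamedTemporaryFile(dir=local_directory):
            report["writable"] = True
        free_gb = shutil.disk_usage(local_directory).free / 1024**3
        report["free_gb"] = round(free_gb, 2)
        report["enough_space"] = free_gb >= min_free_gb
    except OSError:
        pass  # keep the defaults: directory missing, read-only, or full
    if resource is not None:
        # Same numbers that `ulimit -n` reports (soft) and `ulimit -Hn` (hard).
        soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        report["open_files_soft"] = soft
        report["open_files_hard"] = hard
    return report


if __name__ == "__main__":
    print(preflight(os.path.join(tempfile.gettempdir(), "dask")))
```

If `writable` or `enough_space` comes back `False`, point `local_directory` somewhere else before looking at anything Dask-specific.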

If you’d like, share:
- Full error/traceback around “Nanny failed to start”
- Your Prefect, prefect-dask, dask, distributed, Python versions
- How you create the task runner (your `DaskTaskRunner(...)` line)
- Where it runs (local machine, Docker, Kubernetes) and any recent changes
- OS and resource limits (RAM, ulimit output), and whether the minimal example above works

Docs for reference
- Prefect Dask integration: Prefect + Dask (Prefect 3.x)
- Dask workers overview: Dask Distributed Workers

Happy to help debug further once we have a bit more detail.
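To gather the version details requested above in one go, something like the sketch below may help. It reads installed package metadata through the standard library instead of importing each package, so a missing package shows up as `None` rather than an import error. The helper name `collect_versions` is an assumption made for this example.

```python
import platform
from importlib import metadata


def collect_versions(packages=("prefect", "prefect-dask", "dask", "distributed")):
    """Return Python, OS, and package versions; None marks a package that is not installed.

    `collect_versions` is a hypothetical helper, not a Prefect or Dask API.
    """
    info = {"python": platform.python_version(), "os": platform.platform()}
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = None
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```

Running it in both the working and the failing environment and comparing the output is a quick way to spot version drift.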

Mehdi Lallouache

09/17/2025, 6:56 PM

@Marvin fixed with `if __name__ == "__main__":`, thanks


Marvin

09/17/2025, 6:57 PM

Awesome, glad that solved it! Quick why: Dask’s LocalCluster starts worker processes with the spawn start method (always on Windows/macOS, and often on Linux too). Without the `if __name__ == "__main__":` guard, those child processes can re-import and re-execute your module, causing the nanny to fail on startup. Pattern to keep handy:

```python
from prefect import flow
from prefect_dask.task_runners import DaskTaskRunner

@flow(task_runner=DaskTaskRunner())
def my_flow():
    ...

if __name__ == "__main__":
    my_flow()
```
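The re-import behaviour can be reproduced without Dask or Prefect. The sketch below is an analogy built on the standard library's `multiprocessing` with the spawn start method; it does not reproduce Dask's internals. It writes two small scripts to a temporary file, one guarded and one not, runs each as a standalone program, and reports the exit code of the spawned child. The names `GUARDED`, `UNGUARDED`, and `child_exitcode` are made up for this example.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Script that only starts a child process when run as the main program.
GUARDED = textwrap.dedent('''
    import multiprocessing as mp

    def work():
        pass

    if __name__ == "__main__":
        p = mp.get_context("spawn").Process(target=work)
        p.start()
        p.join()
        print("child exitcode:", p.exitcode)
''')

# Same script without the guard: the spawned child re-runs the module's
# top level and tries to start a process of its own while bootstrapping.
UNGUARDED = textwrap.dedent('''
    import multiprocessing as mp

    def work():
        pass

    p = mp.get_context("spawn").Process(target=work)
    p.start()
    p.join()
    print("child exitcode:", p.exitcode)
''')


def child_exitcode(script_source):
    """Run script_source as a standalone script; return the spawned child's exit code."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo_script.py")
        with open(path, "w") as fh:
            fh.write(script_source)
        result = subprocess.run(
            [sys.executable, path], capture_output=True, text=True, timeout=120
        )
    for line in result.stdout.splitlines():
        if line.startswith("child exitcode:"):
            return int(line.split(":", 1)[1])
    raise RuntimeError("script did not report an exit code:\n" + result.stderr)


if __name__ == "__main__":
    print("guarded:", child_exitcode(GUARDED))
    print("unguarded:", child_exitcode(UNGUARDED))
```

With the guard, the child imports the module, skips the guarded block, and exits cleanly. Without it, the child hits the process-start call during its own bootstrapping, raises a `RuntimeError`, and exits with a non-zero code, which is the same class of failure as a worker process that never comes up.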

More details: Prefect + Dask (Prefect 3.x)

If you run into anything else, ping me!

Mehdi Lallouache

09/23/2025, 8:57 AM

@Marvin I am using to_deployment to deploy a flow. I want to schedule it twice a day, at 9 a.m. and 3 p.m., with different parameters. How can I do that?

Marvin

09/23/2025, 8:57 AM

🔄 Researching your answer... this may take a while
🔧 Using tool: `research_prefect_topic`
📊 Tools used: `research_prefect_topic` (1x)
