Mehdi Lallouache
09/17/2025, 6:47 PM
RuntimeError: Nanny failed to start.
This happens when running tasks in parallel with DaskTaskRunner. It happened suddenly; it used to work. Any idea why?
Marvin
09/17/2025, 6:47 PM
Marvin
09/17/2025, 6:49 PM
if __name__ == "__main__": guard
- Docker/Kubernetes changes (e.g., small /dev/shm, changed base image, new security restrictions)
- Import/environment mismatch
- Workers can’t import your code or dependencies (e.g., changed PYTHONPATH or missing dep)
Quick checks and fixes
1) Verify versions (pin if needed)
- Make sure dask and distributed match and prefect-dask is installed:
python -c "import prefect, prefect_dask, dask, distributed; print('prefect', prefect.__version__, 'prefect-dask', prefect_dask.__version__, 'dask', dask.__version__, 'distributed', distributed.__version__)"
- If something updated recently, try pinning to the last known-good versions and re-run.
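If you'd rather capture the versions from Python (for example, to log them at the start of a run), here is a minimal sketch using only the standard library. The helper name installed_versions is made up for illustration, and it assumes the distributions are installed under their usual PyPI names; a missing package is reported as None instead of raising.

```python
from importlib import metadata


def installed_versions(packages):
    """Return a dict mapping each package name to its version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    wanted = ["prefect", "prefect-dask", "dask", "distributed"]
    for name, version in installed_versions(wanted).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Running this in both the environment that used to work and the one that fails makes a silent upgrade easier to spot.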
2) Try a minimal flow to isolate environment vs. your code
from prefect import flow, task
from prefect_dask.task_runners import DaskTaskRunner

@task
def inc(x):
    return x + 1

@flow(task_runner=DaskTaskRunner(cluster_kwargs={"n_workers": 2, "threads_per_worker": 1}))
def test():
    futs = [inc.submit(i) for i in range(10)]
    return sum(f.result() for f in futs)

if __name__ == "__main__":
    print(test())
- If this fails too, it’s an environment/runtime issue.
3) Switch to threads temporarily (avoids process nanny) to confirm spawn problems
DaskTaskRunner(cluster_kwargs={"processes": False, "n_workers": 1, "threads_per_worker": 8})
- If this works, the failure is specifically in process spawning.
4) Set a stable local directory and ensure disk space
DaskTaskRunner(cluster_kwargs={"local_directory": "/tmp/dask", "n_workers": 2, "threads_per_worker": 1})
- Make sure /tmp/dask exists and is writable with free space.
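The directory check in step 4 can be scripted. This is a rough sketch with the standard library only; check_local_directory and the 1 GiB default threshold are illustrative choices, not anything Dask or Prefect requires.

```python
import os
import shutil
import tempfile


def check_local_directory(path, min_free_bytes=1 << 30):
    """Create path if needed, confirm it is writable, and report free space."""
    os.makedirs(path, exist_ok=True)
    # Creating a temporary file tests writability directly; it raises
    # an OSError (e.g. PermissionError) if the directory is not writable.
    with tempfile.NamedTemporaryFile(dir=path):
        pass
    free = shutil.disk_usage(path).free
    return {
        "path": path,
        "free_bytes": free,
        "enough_space": free >= min_free_bytes,
    }


if __name__ == "__main__":
    print(check_local_directory("/tmp/dask"))
```

Run it as the same user (and inside the same container) that starts the flow, since permissions and mounts can differ.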
5) Check OS/container limits
- On Linux: ulimit -n (try increasing open files)
- In Docker: ensure adequate memory and shared memory; try --shm-size=1g if using containers
- Look for OOM kills in dmesg or container logs
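To read the same limits from inside the process that launches the flow, something like the following may help. It is a sketch under assumptions: resource_snapshot is a hypothetical helper, the resource module exists only on Unix-like systems, and /dev/shm is the usual shared-memory mount on Linux but may be absent elsewhere.

```python
import os
import shutil


def resource_snapshot(shm_path="/dev/shm"):
    """Collect the open-file limits and shared-memory size, where available."""
    snapshot = {
        "open_files_soft": None,
        "open_files_hard": None,
        "shm_total_bytes": None,
    }
    try:
        import resource  # Unix only; not available on Windows

        soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        snapshot["open_files_soft"] = soft
        snapshot["open_files_hard"] = hard
    except ImportError:
        pass
    if os.path.isdir(shm_path):
        # Total size of the shared-memory filesystem (often small in containers).
        snapshot["shm_total_bytes"] = shutil.disk_usage(shm_path).total
    return snapshot


if __name__ == "__main__":
    print(resource_snapshot())
```

A small shm_total_bytes inside a container is a hint to try a larger --shm-size.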
6) Enable logs and review worker errors
- Prefect logs: set PREFECT_LOGGING_LEVEL=DEBUG
- Dask dashboard: add dashboard_address=":8787" in cluster_kwargs and visit the UI to see worker logs
- Save a performance report to see worker lifecycle:
DaskTaskRunner(
    cluster_kwargs={"n_workers": 2, "threads_per_worker": 1},
    performance_report_path="dask-report.html",
)
7) If a system-wide package or base image changed recently
- Double-check that your Prefect worker runtime still includes dask, distributed, and your project dependencies
- If you run in K8s/Docker, confirm the image tag did not silently bump
8) If on Windows/macOS
- Ensure your script entry point has:
if __name__ == "__main__":
    flow_fn()
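To see whether the guard matters on your machine, you can check which multiprocessing start method Python uses by default. The sketch below uses only the standard library; describe_start_method is a made-up helper name. With "spawn" or "forkserver", child processes import the main module again, which is why unguarded top-level code gets re-executed; with "fork" they do not.

```python
import multiprocessing
import sys


def describe_start_method():
    """Report the default multiprocessing start method for this interpreter."""
    method = multiprocessing.get_start_method()
    return {
        "platform": sys.platform,
        "start_method": method,
        # "spawn" and "forkserver" children re-import the main module.
        "reimports_main": method in ("spawn", "forkserver"),
    }


if __name__ == "__main__":
    print(describe_start_method())
```

Keeping the guard is reasonable on every platform, since the default start method can differ between operating systems and Python versions.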
If you’d like, share:
- Full error/traceback around “Nanny failed to start”
- Your Prefect, prefect-dask, dask, distributed, Python versions
- How you create the task runner (your DaskTaskRunner(...) line)
- Where it runs (local machine, Docker, Kubernetes) and any recent changes
- OS and resource limits (RAM, ulimit output), and whether the minimal example above works
Docs for reference
- Prefect Dask integration: Prefect + Dask (Prefect 3.x)
- Dask workers overview: Dask Distributed Workers
Happy to help debug further once we have a bit more detail.
Mehdi Lallouache
09/17/2025, 6:56 PM
Marvin
09/17/2025, 6:56 PM
Marvin
09/17/2025, 6:57 PM
if __name__ == "__main__": guard, those child processes can re-import and re-execute your module, causing the nanny to fail on startup.
Pattern to keep handy:
from prefect import flow
from prefect_dask.task_runners import DaskTaskRunner

@flow(task_runner=DaskTaskRunner())
def my_flow():
    ...

if __name__ == "__main__":
    my_flow()
More details: Prefect + Dask (Prefect 3.x)
If you run into anything else, ping me!
Mehdi Lallouache
09/23/2025, 8:57 AM
Marvin
09/23/2025, 8:57 AM