<@ULVA73B9P> got a ```RuntimeError: Nanny failed t...
# ask-marvin
m
@Marvin I got a
RuntimeError: Nanny failed to start.
when running tasks in parallel with DaskTaskRunner. It happened suddenly; it used to work. Any idea why?
m
thought for 144.5 seconds
Sorry you're hitting that. "Nanny failed to start" is a Dask worker start-up failure, and it usually points to an environment or resource change rather than your Prefect code. Since it used to work and suddenly stopped, the most common culprits are:

Likely causes
- Package/version drift
  - Dask and distributed got out of sync, or a recent update introduced an incompatibility
  - Prefect/prefect-dask upgraded and your environment image changed
- Resource/OS limits
  - Not enough RAM / OOM killer taking out workers
  - Too many open files (ulimit), ephemeral port exhaustion, firewall changes
  - Temporary directory full or not writable
- Process spawn issues
  - Windows/macOS missing the `if __name__ == "__main__":` guard
  - Docker/Kubernetes changes (e.g., small /dev/shm, changed base image, new security restrictions)
- Import/environment mismatch
  - Workers can't import your code or dependencies (e.g., changed PYTHONPATH or missing dep)

Quick checks and fixes
1) Verify versions (pin if needed)
- Make sure dask and distributed match and prefect-dask is installed:
python -c "import prefect, prefect_dask, dask, distributed; print('prefect', prefect.__version__, 'prefect-dask', prefect_dask.__version__, 'dask', dask.__version__, 'distributed', distributed.__version__)"
- If something updated recently, try pinning to the last known-good versions and re-run.
2) Try a minimal flow to isolate environment vs. your code
from prefect import flow, task
from prefect_dask.task_runners import DaskTaskRunner

@task
def inc(x): 
    return x + 1

@flow(task_runner=DaskTaskRunner(cluster_kwargs={"n_workers": 2, "threads_per_worker": 1}))
def test():
    futs = [inc.submit(i) for i in range(10)]
    return sum(f.result() for f in futs)

if __name__ == "__main__":
    print(test())
- If this fails too, it's an environment/runtime issue.
3) Switch to threads temporarily (avoids process nanny) to confirm spawn problems
DaskTaskRunner(cluster_kwargs={"processes": False, "n_workers": 1, "threads_per_worker": 8})
- If this works, the failure is specifically in process spawning.
4) Set a stable local directory and ensure disk space
DaskTaskRunner(cluster_kwargs={"local_directory": "/tmp/dask", "n_workers": 2, "threads_per_worker": 1})
- Make sure `/tmp/dask` exists and is writable with free space.
5) Check OS/container limits
- On Linux: `ulimit -n` (try increasing open files)
- In Docker: ensure adequate memory and shared memory; try `--shm-size=1g` if using containers
- Look for OOM kills in `dmesg` or container logs
6) Enable logs and review worker errors
- Prefect logs: set `PREFECT_LOGGING_LEVEL=DEBUG`
- Dask dashboard: add `dashboard_address=":8787"` in `cluster_kwargs` and visit the UI to see worker logs
- Save a performance report to see worker lifecycle:
DaskTaskRunner(
  cluster_kwargs={"n_workers": 2, "threads_per_worker": 1},
  performance_report_path="dask-report.html"
)
7) If your system-wide environment or base image changed recently
- Double-check that your Prefect worker runtime still includes dask/distributed and your project dependencies
- If you run in K8s/Docker, confirm the image tag did not silently bump
8) If on Windows/macOS
- Ensure your script entry point has:
if __name__ == "__main__":
    flow_fn()
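Steps 4 and 5 above can also be checked from Python before the cluster starts. This is only a rough sketch using the standard library: `check_worker_environment` and the 1 GB free-space threshold are hypothetical choices made for illustration, not part of Prefect or Dask, and the open-file limit is only available on Unix-like systems.

```python
import os
import shutil
import tempfile

try:
    import resource  # Unix only; not available on Windows
except ImportError:
    resource = None


def check_worker_environment(local_directory, min_free_bytes=1_000_000_000):
    """Report whether a Dask scratch directory and the open-file limit look sane.

    The function name and the default threshold are illustrative assumptions.
    """
    report = {
        "directory": local_directory,
        "exists": os.path.isdir(local_directory),
        "writable": False,
        "free_bytes": None,
        "enough_space": False,
        "open_file_soft_limit": None,
    }
    if report["exists"]:
        # Workers need to create files inside the directory.
        report["writable"] = os.access(local_directory, os.W_OK | os.X_OK)
        report["free_bytes"] = shutil.disk_usage(local_directory).free
        report["enough_space"] = report["free_bytes"] >= min_free_bytes
    if resource is not None:
        # Same value that `ulimit -n` prints for the soft limit.
        soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        report["open_file_soft_limit"] = soft
    return report


if __name__ == "__main__":
    # The system temp directory is used here as a stand-in for /tmp/dask.
    for key, value in check_worker_environment(tempfile.gettempdir()).items():
        print(f"{key}: {value}")
```

If `exists` or `writable` comes back false, or the soft limit is very low, that is worth fixing before looking further at Dask itself.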
If you'd like, share:
- Full error/traceback around "Nanny failed to start"
- Your Prefect, prefect-dask, dask, distributed, Python versions
- How you create the task runner (your `DaskTaskRunner(...)` line)
- Where it runs (local machine, Docker, Kubernetes) and any recent changes
- OS and resource limits (RAM, ulimit output), and whether the minimal example above works

Docs for reference
- Prefect Dask integration: Prefect + Dask (Prefect 3.x)
- Dask workers overview: Dask Distributed Workers

Happy to help debug further once we have a bit more detail.
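To gather the version details requested above in one paste-able block, something like the following may help. It is a sketch that assumes the distribution names `prefect`, `prefect-dask`, `dask` and `distributed`; `collect_versions` and `environment_report` are hypothetical helper names. Unlike a plain `import`, it reports a missing package instead of crashing, which is useful when a missing dependency is itself the problem.

```python
import platform
from importlib import metadata

# Distribution names as they would appear in `pip list` (assumed here).
PACKAGES = ("prefect", "prefect-dask", "dask", "distributed")


def collect_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def environment_report(packages=PACKAGES):
    """Build a short plain-text report suitable for pasting into a thread."""
    lines = [f"python {platform.python_version()} on {platform.platform()}"]
    for name, version in collect_versions(packages).items():
        lines.append(f"{name} {version or 'NOT INSTALLED'}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```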
m
@Marvin fixed with if __name__ == "__main__": thanks
m
thought for 11.9 seconds
Awesome, glad that solved it! Quick why: Dask's LocalCluster starts worker processes with the spawn start method (the default on Windows/macOS, and often used on Linux too). A spawned child re-imports your main module, so without the `if __name__ == "__main__":` guard it can re-execute your top-level code, including the flow call itself, and the nanny fails on startup. Pattern to keep handy:
from prefect import flow
from prefect_dask.task_runners import DaskTaskRunner

@flow(task_runner=DaskTaskRunner())
def my_flow():
    ...

if __name__ == "__main__":
    my_flow()
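To see the effect of the guard without starting any Dask workers, here is a small simulation. It is only a sketch: `runpy` stands in for the re-import that a spawned child performs (multiprocessing's spawn method loads the parent's main module under the name `__mp_main__`), `my_flow` is a plain function rather than a Prefect flow, and `GUARDED`, `UNGUARDED` and `flow_starts` are hypothetical names made up for the illustration.

```python
import runpy
import tempfile
from pathlib import Path

# A script that only starts its "flow" when run as the entry point.
GUARDED = '''
started = []

def my_flow():
    started.append("flow started")

if __name__ == "__main__":
    my_flow()
'''

# The same script without the guard: the flow starts on every import.
UNGUARDED = '''
started = []

def my_flow():
    started.append("flow started")

my_flow()
'''


def flow_starts(source, run_name):
    """Execute `source` as a module named `run_name`; count how often my_flow ran."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "script.py"
        script.write_text(source)
        module_globals = runpy.run_path(str(script), run_name=run_name)
    return len(module_globals["started"])


if __name__ == "__main__":
    for label, source in (("guarded", GUARDED), ("unguarded", UNGUARDED)):
        for run_name in ("__main__", "__mp_main__"):
            print(label, run_name, flow_starts(source, run_name))
```

The guarded script starts its flow only when run as `__main__`; the unguarded one also starts it under `__mp_main__`, which is the situation a spawned worker process ends up in.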
More details: Prefect + Dask (Prefect 3.x) If you run into anything else, ping me!
m
@Marvin I am using to_deployment to deploy a flow. I want to schedule it twice a day, at 9 a.m. and 3 p.m., with different parameters. How can I do that?
m
🔄 Researching your answer... this may take a while
🔧 Using tool: `research_prefect_topic`
📊 Tools used: • `research_prefect_topic` (1x)