Peter Korolev
09/23/2025, 10:28 AM
I'm getting OSError: [Errno 28] No space left on device from the following code. len(args) is 10000 and there are 16 processes. I have enough disk space, so what can cause this error?
with ProcessPoolTaskRunner(max_workers=args.processes) as runner:
    futures = [runner.submit(process_arg, args) for args in arguments]
    for future in futures:
        future.wait()
Marvin
09/23/2025, 10:33 AM
Errno 28 can come from several resources besides your main disk. Things to check (a quick scripted version follows this list):
- Check /dev/shm (POSIX shared memory)
  df -h /dev/shm
  ls -l /dev/shm | wc -l
  If you're in Docker/K8s, /dev/shm is typically tiny (64MB in Docker by default), and Python multiprocessing creates its POSIX semaphores in /dev/shm; creating many can trigger "No space left on device".
- Check inodes
  df -i
  If IUse% is 100% for the filesystem hosting your temp/results, you’re out of inodes.
- Check temp and container layer
  df -h /
  df -h /tmp
  In containers, the writable layer may be full even if the host isn’t.
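A quick way to script those checks from Python before kicking off the run (the paths are assumptions; adjust them to wherever your temp files and results actually live):
import os
import shutil

# Report free blocks and free inodes for the filesystems that multiprocessing,
# temp files, and result writes typically touch.
for path in ("/", "/tmp", "/dev/shm"):
    usage = shutil.disk_usage(path)   # block-level space
    st = os.statvfs(path)             # f_favail / f_files = free / total inodes
    print(f"{path}: {usage.free / 2**30:.1f} GiB free, "
          f"{st.f_favail} of {st.f_files} inodes free")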
Mitigations:
1) If running in Docker/Kubernetes
   - Increase shared memory:
     - Docker: add --shm-size=1g
     - Kubernetes: mount an EmptyDir at /dev/shm with medium: Memory and a sizeLimit
   - Ensure your container’s writable layer or ephemeral volumes have enough space.
2) Don’t queue 10k futures at once
   - Submit in batches so only a small number are “in flight”; this reduces OS resources (/dev/shm objects, temp files).
   batch_size = 1000  # or 4 * args.processes
   with ProcessPoolTaskRunner(max_workers=args.processes) as runner:
       for i in range(0, len(arguments), batch_size):
           futures = [runner.submit(process_arg, a) for a in arguments[i:i+batch_size]]
           for f in futures:
               f.wait()
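   If fixed batches stall the pool while each batch drains, a bounded "sliding window" keeps roughly a constant number of futures in flight. A minimal sketch, assuming runner.submit returns futures with a blocking wait() as in your snippet:
   from collections import deque

   max_in_flight = 4 * args.processes   # assumption: a small multiple of the worker count
   in_flight = deque()

   with ProcessPoolTaskRunner(max_workers=args.processes) as runner:
       for a in arguments:
           if len(in_flight) >= max_in_flight:
               in_flight.popleft().wait()   # block on the oldest future before submitting more
           in_flight.append(runner.submit(process_arg, a))
       for f in in_flight:                  # drain whatever is still pending
           f.wait()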
3) Results/temp storage
   - If you’ve enabled local result persistence, 10k tasks can create lots of small files; point results at a larger filesystem or disable persistence for this run.
   - Prefect 3.x results docs: Results in Prefect 3
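   A minimal sketch of turning persistence off per task, assuming process_arg is a Prefect task on 3.x (verify the parameter/setting names against the docs for your version):
   from prefect import task

   # persist_result=False skips writing a result file for every task run,
   # which avoids ~10k small files on the local storage path
   @task(persist_result=False)
   def process_arg(a):
       ...

   # Or keep persistence but point it at a roomier filesystem, e.g. set
   # PREFECT_LOCAL_STORAGE_PATH=/data/prefect-results in the environment.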
4) If you see many leftover sem.* files in /dev/shm, you’re likely hitting the shared memory limit or leaking semaphores after crashes. Cleaning them up and increasing /dev/shm usually resolves it.
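   A rough cleanup sketch for those stale semaphore files; only run it when nothing else on the machine is using them, since removing live semaphores will break running programs:
   import glob
   import os

   # Remove leftover POSIX semaphore files in /dev/shm (named sem.<something>).
   for path in glob.glob("/dev/shm/sem.*"):
       print("removing stale semaphore:", path)
       os.remove(path)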
If you can share:
- Whether this is running in Docker/Kubernetes (and any resource limits)
- The full traceback (to see if it fails creating a semaphore or writing a file)
- Whether you’ve enabled result persistence
…I can give more targeted guidance.