Peter Korolev
09/23/2025, 10:28 AM
I’m getting OSError: [Errno 28] No space left on device from the following code. len(args) is 10000, and there are 16 processes. I have enough disk space, so what can cause this error?
```python
with ProcessPoolTaskRunner(max_workers=args.processes) as runner:
    futures = [runner.submit(process_arg, args) for args in arguments]
    for future in futures:
        future.wait()
```
Marvin
09/23/2025, 10:29 AM
Marvin
09/23/2025, 10:33 AM
- Check shared memory (/dev/shm)
df -h /dev/shm
ls -l /dev/shm | wc -l
If you’re in Docker/K8s, /dev/shm is typically small (64 MB by default in Docker), and Python multiprocessing creates POSIX semaphores there; creating many of them can trigger “No space left on device”.
- Check inodes
df -i
If IUse% is 100% for the filesystem hosting your temp/results, you’re out of inodes.
- Check temp and container layer
df -h /
df -h /tmp
In containers, the writable layer may be full even if the host isn’t.
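If you’d rather run the checks above from Python (for example inside the same container as the flow), here’s a minimal standard-library sketch. It is POSIX-only (it uses os.statvfs), the PATHS list is an assumption to adapt to your setup, and the percentages are computed against total size, so they may differ slightly from what df prints.

```python
import os
import shutil
import tempfile

# Hypothetical list of paths to check; adapt to your environment.
PATHS = ["/dev/shm", "/", "/tmp", tempfile.gettempdir()]


def space_report(path):
    """Byte and inode usage for the filesystem that holds `path` (POSIX only)."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    st = os.statvfs(path)
    inode_pct = None
    if st.f_files:  # some filesystems report 0 total inodes
        inode_pct = 100.0 * (st.f_files - st.f_ffree) / st.f_files
    byte_pct = 100.0 * usage.used / usage.total if usage.total else None
    return {"path": path, "free_bytes": usage.free,
            "byte_pct": byte_pct, "inode_pct": inode_pct}


def count_semaphores(shm_dir="/dev/shm"):
    """Count glibc named-semaphore files (sem.*) in shm_dir; 0 if it's missing."""
    try:
        return sum(1 for name in os.listdir(shm_dir) if name.startswith("sem."))
    except FileNotFoundError:
        return 0


def fmt_pct(value):
    """Format a percentage, or 'n/a' when it couldn't be computed."""
    return "n/a" if value is None else f"{value:.1f}%"


if __name__ == "__main__":
    for path in dict.fromkeys(PATHS):  # drop duplicates, keep order
        if os.path.exists(path):
            r = space_report(path)
            print(f"{path}: {r['free_bytes'] / 2**20:.0f} MiB free, "
                  f"bytes used {fmt_pct(r['byte_pct'])}, "
                  f"inodes used {fmt_pct(r['inode_pct'])}")
    print("sem.* files in /dev/shm:", count_semaphores())
```

Run it inside the container that raised the error; a check on the host won’t see the container’s /dev/shm or writable layer.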
Mitigations:
1) If running in Docker/Kubernetes
- Increase shared memory:
  - Docker: add --shm-size=1g to docker run
  - Kubernetes: mount an emptyDir volume at /dev/shm with medium: Memory and a sizeLimit
- Ensure your container’s writable layer or ephemeral volumes have enough space.
2) Don’t queue 10k futures at once
- Submit in batches so only a small number of tasks are “in flight” at once; this limits how many OS resources (/dev/shm objects, temp files) are held at the same time.
```python
batch_size = 1000  # or 4 * args.processes
with ProcessPoolTaskRunner(max_workers=args.processes) as runner:
    for i in range(0, len(arguments), batch_size):
        futures = [runner.submit(process_arg, a) for a in arguments[i:i + batch_size]]
        for f in futures:
            f.wait()
```
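A variation on batching is a sliding window: keep at most N tasks pending and submit a new one whenever any finishes, so workers don’t sit idle waiting for the slowest task in a batch. This sketch uses the standard library’s concurrent.futures executors as a stand-in, not Prefect’s ProcessPoolTaskRunner, because it relies on concurrent.futures.wait with FIRST_COMPLETED; run_windowed and max_in_flight are hypothetical names.

```python
import concurrent.futures as cf
from itertools import islice


def run_windowed(executor, fn, items, max_in_flight):
    """Run fn(item) for every item with at most max_in_flight futures pending.

    Returns results in input order. A task's exception is re-raised when
    its result is collected.
    """
    if max_in_flight < 1:
        raise ValueError("max_in_flight must be >= 1")
    it = iter(items)
    pending = {}   # future -> position of its item in the input
    results = {}   # position -> result
    submitted = 0

    def submit(n):
        # Pull up to n more items from the input and submit them.
        nonlocal submitted
        for item in islice(it, n):
            pending[executor.submit(fn, item)] = submitted
            submitted += 1

    submit(max_in_flight)
    while pending:
        # Block until at least one future finishes, then refill the window.
        done, _ = cf.wait(pending, return_when=cf.FIRST_COMPLETED)
        for fut in done:
            results[pending.pop(fut)] = fut.result()
        submit(len(done))
    return [results[i] for i in range(submitted)]
```

Usage, under the same assumptions: inside an if __name__ == "__main__": block, open concurrent.futures.ProcessPoolExecutor(max_workers=args.processes) as pool and call run_windowed(pool, process_arg, arguments, 4 * args.processes). With a process pool, process_arg has to be a plain module-level function (not a Prefect task) so it can be pickled.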
3) Results/temp storage
- If you’ve enabled local result persistence, 10k tasks can create lots of small files; point results at a larger filesystem or disable persistence for this run.
- Prefect 3.x results docs: Results in Prefect 3
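To check whether persisted results are what’s filling the disk or using up inodes, here’s a small sketch that counts files and bytes under a directory. The default path ~/.prefect/storage is an assumption about where local results land; confirm it against your Prefect settings or pass the real path as the first argument.

```python
import os
import sys


def tree_stats(root):
    """Return (file_count, total_bytes) for files under root (symlinks not followed)."""
    files = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except FileNotFoundError:  # removed while we were walking
                continue
            files += 1
            total += st.st_size
    return files, total


if __name__ == "__main__":
    # Assumed default location; override by passing a path on the command line.
    root = sys.argv[1] if len(sys.argv) > 1 else os.path.expanduser("~/.prefect/storage")
    n, size = tree_stats(root)
    print(f"{root}: {n} files, {size / 2**20:.1f} MiB")
```

A large file count with a small byte total points at inode exhaustion (compare with df -i) rather than raw disk space.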
4) If you see many leftover sem.* files in /dev/shm, you’re likely hitting the shared memory limit or leaking semaphores after crashes. Cleaning them up and increasing /dev/shm usually resolves it.
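A cautious cleanup sketch: it lists sem.* files in /dev/shm older than a threshold and only deletes them when dry_run=False. Treat the age check as a rough heuristic, since a file’s mtime doesn’t reliably tell you whether a process still has the semaphore open; only delete when no Python/Prefect workers are running. The function names and the one-hour threshold are illustrative assumptions.

```python
import os
import time


def stale_semaphores(shm_dir="/dev/shm", min_age_s=3600):
    """List sem.* files in shm_dir whose mtime is at least min_age_s seconds old."""
    now = time.time()
    stale = []
    try:
        names = os.listdir(shm_dir)
    except FileNotFoundError:
        return stale
    for name in names:
        if not name.startswith("sem."):
            continue
        path = os.path.join(shm_dir, name)
        try:
            if now - os.stat(path).st_mtime >= min_age_s:
                stale.append(path)
        except FileNotFoundError:
            pass  # removed by its owner while we were scanning
    return sorted(stale)


def remove_stale(paths, dry_run=True):
    """Delete the given paths unless dry_run; return the list of paths acted on."""
    paths = list(paths)
    for p in paths:
        if dry_run:
            print("would remove", p)
        else:
            try:
                os.unlink(p)
            except FileNotFoundError:
                pass
    return paths


if __name__ == "__main__":
    # Dry run by default: review the list before deleting anything.
    remove_stale(stale_semaphores(), dry_run=True)
```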
If you can share:
- Whether this is running in Docker/Kubernetes (and any resource limits)
- The full traceback (to see if it fails creating a semaphore or writing a file)
- Whether you’ve enabled result persistence
…I can give more targeted guidance.