<@ULVA73B9P> Why am I getting the error "OSError: ...
# ask-marvin
@Marvin Why am I getting the error "OSError: [Errno 28] No space left on device" on my worker? How do I clean this up? I'm getting crashes for one of my jobs.
thought for 333.5 seconds
“OSError: [Errno 28] No space left on device” means the machine (or container/node) your worker runs on is out of disk or inode space. With Prefect workers this typically comes from one or more of:
- OS temp directories (e.g., /tmp) filling up from ephemeral run artifacts or user code
- Local result or artifact files written by your flow/tasks
- Container images/volumes/logs (Docker/Kubernetes workers)
- The Prefect home dir (`~/.prefect`) if it’s redirected to a small disk

**Quick triage**
- Check overall space and the biggest offenders:
```
df -h
df -ih
sudo du -xhd1 /tmp 2>/dev/null | sort -h
sudo du -xhd1 ~ 2>/dev/null | sort -h
```
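If the directory-level view doesn't point at an obvious culprit, a scan for individual large files can help; this is a generic sketch (the 1G threshold is arbitrary, adjust to taste):
```
# largest files on the root filesystem, sorted by size
sudo find / -xdev -type f -size +1G -exec ls -lh {} + 2>/dev/null | sort -k5 -h | tail -20
```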
- If you’re using a Docker worker, also check:
```
docker system df
docker ps -a | wc -l
docker volume ls | wc -l
```
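For a more granular view of what Docker itself is holding, the standard verbose/size flags give a per-item breakdown:
```
# per-image/container/volume breakdown, plus writable-layer sizes of containers
docker system df -v
docker ps -a -s
```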
- If you’re using a Kubernetes worker, check node and pod ephemeral storage:
```
kubectl top nodes
kubectl top pods -A
kubectl describe node <node-name> | grep -i "ephemeral"
```
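It’s also worth checking whether the kubelet has already flagged disk pressure or evicted pods; plain kubectl is enough here:
```
# nodes reporting DiskPressure, and pods that were evicted for it
kubectl describe nodes | grep -iE "^name:|diskpressure"
kubectl get pods -A | grep -i evicted
```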
**Where Prefect 3.x puts things by default**
- Prefect home: `~/.prefect` (configurable via `PREFECT_HOME`). This stores profiles/config, not large run data by default. Docs: Settings and profiles
- Process worker working directory: by default the worker’s current directory unless you set `working_dir` in the work pool base job template. Temporary checkouts or run working dirs are typically created under the OS temp dir (e.g., /tmp) and cleaned up when runs complete; abrupt kills can leave leftovers.
- There is no `PREFECT_LOCAL_STORAGE_PATH` in Prefect 3.x. If your flow writes results locally, that’s your code’s filesystem path (or a storage block/filesystem you configured).
- Logs: Prefect 3 logs go to the API/console by default; Prefect does not create local log files unless you explicitly configure a file handler.

**Cleanup, by worker type**

**Process worker (VM/bare metal)**
- Clear stale temp dirs older than a day (safe if no runs are currently executing):
```
sudo find /tmp -maxdepth 1 -name 'prefect*' -type d -mtime +1 -print -exec rm -rf {} +
sudo find /tmp -maxdepth 1 -type f -mtime +1 -size +10M -print -delete
```
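If you’re unsure whether a leftover directory is still in use by an active run, you can check for open file handles before removing it; lsof is standard, though the recursive `+D` scan can be slow on big trees, and the path below is a placeholder:
```
# list processes with files open under the suspect directory
sudo lsof +D /tmp/<suspect-dir> 2>/dev/null
```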
- Inspect and remove large artifacts your flows may have written locally (e.g., data extracts, intermediate parquet/csv):
```
sudo du -xh /path/to/your/project | sort -h | tail -50
```
- If `~/.prefect` is on a small disk, move it to a larger volume and point Prefect at it:
```
export PREFECT_HOME=/mnt/bigdisk/prefect
mkdir -p "$PREFECT_HOME"
```
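To sanity-check that the new location will be picked up (this assumes the `prefect` CLI is installed in the same environment the worker runs in):
```
# PREFECT_HOME should show up as a non-default, env-sourced setting
prefect config view | grep -i prefect_home
```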
  Then restart your worker so it picks up the new env. Docs: Settings and profiles
- Prevent recurrence:
  - Set the Process work pool’s base job template `working_dir` to a larger/writable volume (in the Prefect UI: Work Pools -> your process pool -> Edit base job template -> `working_dir`).
  - Optionally set `TMPDIR` in the pool’s env to redirect temp files:
```
TMPDIR=/mnt/bigdisk/tmp
```
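If you’d rather script the work pool change than edit it in the UI, the work-pool CLI can round-trip the base job template; the pool name below is hypothetical and the edit step is manual:
```
# dump the default Process template, edit working_dir / env (TMPDIR) in the JSON, then push it back
prefect work-pool get-default-base-job-template --type process > process-template.json
prefect work-pool update "my-process-pool" --base-job-template process-template.json
```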
**Docker worker**
- Prune unused images/containers/volumes (use filters so you don’t break active runs):
```
docker system prune -a --volumes --filter "until=168h"
docker volume prune --filter "label!=keep"
```
  Check what will be removed first by running without `-a` (which only prunes dangling images), or with `--dry-run` if your Docker version supports it for prune.
- Limit container log growth on the host by configuring Docker log rotation (daemon.json):
```
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "50m",
    "max-file": "3"
  }
}
```
  Then restart Docker.
- In the Docker work pool base job template, set:
  - `working_dir` to a mounted volume with more space
  - `volumes` to mount a large host path for temp/data, e.g. `/mnt/bigdisk/tmp:/tmp`
  - `env` to set `TMPDIR=/tmp` (which you’ve mounted)
- If your flows write results/artifacts locally, move them to object storage (S3, GCS, etc.) instead of the container filesystem.

**Kubernetes worker**
- Node-level cleanup:
  - Ensure image garbage collection and container log rotation are configured on nodes (containerd/docker). Large container logs commonly fill disks on K8s nodes.
  - Remove evicted/terminated pods and orphaned PVCs/volumes not in use (a command sketch follows after this list).
- Set resource policies so pods are killed before the node fills up:
  - Add ephemeral-storage requests/limits to your flow pods (via the work pool’s base job template pod spec).
  - Use `emptyDir` with `sizeLimit` for `/tmp`/working directories, or mount a larger PVC for `working_dir`.
- In the Kubernetes work pool base job template:
  - Set `working_dir` and mount an `emptyDir` or PVC there
  - Set `env` like `TMPDIR=/work/tmp` and mount that path
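For the node-level pod cleanup mentioned above, a conservative sweep might look like the following; list first and review before deleting, since `Failed` covers evicted pods but also other failures you may want to inspect:
```
# list, then delete, pods that finished in a Failed (including Evicted) state, across all namespaces
kubectl get pods -A --field-selector=status.phase=Failed
kubectl delete pods -A --field-selector=status.phase=Failed
```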
**Prefect-specific tips**
- There is no built-in `prefect cache clear` or similar CLI in Prefect 3.x, and we verified there’s no `prefect cache` command.
- If you’re persisting results locally, prefer a remote filesystem (S3/GCS/Azure) or a mounted large-volume path. For ad-hoc local files, ensure your flow/task code deletes temporary files (or use `tempfile.TemporaryDirectory()`).
- If you’re using `flow.from_source(...)`, the worker clones/exports your code per run; unexpected termination can leave temp dirs. Direct TMPDIR/working_dir to a mount you can prune safely.

**Optional recurring maintenance**
- Add a cron/systemd timer to purge ancient temp:
```
# m h dom mon dow user  command
15 3 * * * root find /tmp -maxdepth 1 -name 'prefect*' -type d -mtime +2 -exec rm -rf {} +
```
- For Docker hosts, schedule a periodic prune with conservative filters.

**References**
- Prefect 3 settings and profiles (PREFECT_HOME): Settings and profiles
- Prefect 3.2 release notes (worker/deploy improvements): Release notes
- Background on infra cleanup jobs (general guidance): The role of infrastructure cleanup jobs

Can you share:
- Which worker type you’re using (process, docker, kubernetes)?
- Where it’s running (VM, container, k8s node), and any custom `working_dir`/`TMPDIR` you’ve set?
- A quick `df -h` and the top few directories from `du -xhd1 /tmp`?

I can then give you the exact commands/config to fix and prevent it going forward.