Piotr
07/02/2025, 9:44 AM
Marvin
07/02/2025, 9:44 AM
Marvin
07/02/2025, 9:45 AM
yaml
# kubernetes-worker.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prefect-worker
spec:
  template:
    spec:
      containers:
        - name: prefect-worker
          volumeMounts:
            - name: flow-storage
              mountPath: /opt/prefect/flows # Choose your desired mount path
      volumes:
        - name: flow-storage
          persistentVolumeClaim:
            claimName: flow-storage-pvc
---
# pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: flow-storage-pvc
spec:
  accessModes:
    - ReadWriteMany # If multiple workers need access
  resources:
    requests:
      storage: 10Gi # Adjust size as needed
4. Accessing Files in Your Flow:
python
from prefect import flow
import pandas as pd


@flow
def process_local_file():
    # This path refers to the mounted PVC location in the Pod
    df = pd.read_csv("/opt/prefect/flows/my_data.csv")
    # Process data...
    return df
5. Best Practices:
- Always use absolute paths when referencing files
- Ensure your PVC has the appropriate access mode (ReadWriteMany if multiple workers need access)
- Consider using environment variables for file paths to make your flows more portable
- Make sure the mounted directory has appropriate permissions for the user running the Prefect worker
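The environment-variable and absolute-path suggestions above could be combined roughly as follows. This is only a sketch: `FLOW_DATA_DIR` and `resolve_data_path` are hypothetical names chosen for illustration (not Prefect settings), and the default simply mirrors the mount path from the example.

```python
import os
from pathlib import Path

# Hypothetical variable name; the default matches the mount path used above.
DATA_DIR_ENV = "FLOW_DATA_DIR"
DEFAULT_DATA_DIR = "/opt/prefect/flows"


def resolve_data_path(filename: str) -> Path:
    """Return the path to `filename` under the configured data directory."""
    base = Path(os.environ.get(DATA_DIR_ENV, DEFAULT_DATA_DIR))
    # Reject relative paths early so a flow never depends on its working directory.
    if not base.is_absolute():
        raise ValueError(f"{DATA_DIR_ENV} must be an absolute path, got {base}")
    return base / filename
```

A flow could then call `pd.read_csv(resolve_data_path("my_data.csv"))`, and the same code would work if the volume is mounted somewhere else in another environment.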
6. Important Considerations:
- The PVC must be available in the namespace where your Prefect worker runs
- If you're using multiple worker replicas, ensure your storage class supports ReadWriteMany access mode
- Consider using init containers if you need to pre-populate or prepare the persistent storage
Remember that while LocalFileSystem blocks are available, in a Kubernetes environment, it's better to handle persistence through Kubernetes native concepts (PVCs) and then access those mounted volumes directly in your flows using standard Python file operations.
This approach ensures your data persists across Pod restarts and can be shared between different flows if needed. The paths you use in your flows will be the paths where you mounted the PVC in your Pod configuration.
Piotr
07/02/2025, 10:50 AM
Marvin
07/02/2025, 10:50 AM
Marvin
07/02/2025, 10:51 AM
`values.yaml` for the worker Helm chart:
yaml
worker:
  # Key names may differ by chart version (the prefect-worker chart may expect
  # extraVolumes / extraVolumeMounts); check `helm show values` for your version.
  volumes:
    - name: shared-workspace
      persistentVolumeClaim:
        claimName: prefect-shared-pvc # Reference existing PVC or create new
  volumeMounts:
    - name: shared-workspace
      mountPath: /opt/prefect/shared # Mount path in worker pod
  # Important: Configure the same volume for job pods
  job_template:
    spec:
      template:
        spec:
          volumes:
            - name: shared-workspace
              persistentVolumeClaim:
                claimName: prefect-shared-pvc
          containers:
            - name: flow
              volumeMounts:
                - name: shared-workspace
                  mountPath: /opt/prefect/shared # Same mount path as worker
2. Create the PVC (if not existing):
yaml
# pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prefect-shared-pvc
spec:
  accessModes:
    - ReadWriteMany # Important for multiple pods access
  resources:
    requests:
      storage: 10Gi
3. Storage Flow Between Pods:
- The Worker Pod and Job Pods will both have access to the same storage at /opt/prefect/shared
- Files written by the worker will be visible to jobs and vice versa
- The storage persists even if pods are restarted
4. Using the Shared Storage in Flows:
python
from prefect import flow
import os


@flow
def process_data():
    shared_path = "/opt/prefect/shared"
    # Write data from job pod
    with open(os.path.join(shared_path, "output.txt"), "w") as f:
        f.write("Hello from flow!")
    # This file will persist and be accessible to other jobs
    return True
5. Important Considerations:
- Ensure your storage class supports ReadWriteMany if multiple pods need to access the volume simultaneously
- The mount paths must be consistent between worker and job pods
- The PVC must be in the same namespace as your worker and jobs
- Consider setting appropriate file permissions in the shared volume
6. Deployment Command:
bash
helm upgrade --install prefect-worker prefect/prefect-worker \
  -f values.yaml \
  --namespace your-namespace
7. Debugging Tips:
- Check that both worker and job pods can access the PVC:
bash
kubectl describe pod <worker-pod-name>
kubectl describe pod <job-pod-name>
- Verify the PVC is bound:
bash
kubectl get pvc prefect-shared-pvc
- Check storage permissions if you encounter access issues
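For the permissions check in particular, a small probe run inside the pod (for example from a throwaway flow or `kubectl exec`) can narrow things down. The following is a minimal sketch using only the standard library; `check_shared_dir` is a hypothetical helper name, and it assumes a Linux container.

```python
import os
import tempfile


def check_shared_dir(path: str) -> list[str]:
    """Return a list of problems found with `path`; an empty list means it looks usable."""
    if not os.path.isdir(path):
        return [f"{path} does not exist or is not a directory (is the PVC mounted?)"]
    problems = []
    if not os.access(path, os.R_OK | os.X_OK):
        problems.append(f"{path} is not readable by uid {os.getuid()}")
    try:
        # Create and automatically remove a throwaway file to prove writes succeed.
        with tempfile.NamedTemporaryFile(dir=path, prefix=".probe-"):
            pass
    except OSError as exc:
        problems.append(f"cannot write to {path}: {exc}")
    return problems
```

Calling `check_shared_dir("/opt/prefect/shared")` from both the worker pod and a job pod shows whether either side sees a missing mount or a read-only directory.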
Remember that any files written to the mounted path (/opt/prefect/shared in the examples) will be persisted in the PVC and available to all pods that mount it. This is particularly useful for:
- Sharing input data between flows
- Persisting output from flows
- Maintaining state between flow runs
- Sharing configuration files
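As one illustration of maintaining state between flow runs, a run could store a small JSON file on the shared volume and the next run could read it back. This is a sketch under assumptions: `save_state`, `load_state`, and the `state.json` file name are hypothetical, and the write-then-rename pattern is a general precaution against partially written files on POSIX filesystems, not something Prefect requires. Network filesystems backing ReadWriteMany volumes may offer weaker guarantees.

```python
import json
import os
import tempfile


def save_state(state: dict, directory: str, name: str = "state.json") -> str:
    """Write `state` as JSON into `directory`, replacing any previous file."""
    target = os.path.join(directory, name)
    # Write to a temporary file in the same directory, then rename it over the
    # target so readers see either the old file or the new one.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, target)
    except BaseException:
        os.unlink(tmp_path)
        raise
    return target


def load_state(directory: str, name: str = "state.json") -> dict:
    """Return the saved state, or an empty dict if nothing has been written yet."""
    try:
        with open(os.path.join(directory, name)) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}
```

Inside a flow this might be used as `state = load_state("/opt/prefect/shared")` at the start and `save_state(state, "/opt/prefect/shared")` at the end.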
The key is ensuring that both the worker and job templates in your Helm chart values reference the same PVC and use consistent mount paths. This creates a unified storage space that all your Prefect-related pods can access.
Piotr
07/02/2025, 11:07 AM
Marvin
07/02/2025, 11:07 AM
Marvin
07/02/2025, 11:08 AM
json
{
  "job_configuration": {
    "job_manifest": {
      "apiVersion": "batch/v1",
      "kind": "Job",
      "spec": {
        "template": {
          "spec": {
            "containers": [
              {
                "name": "prefect-job",
                "volumeMounts": [
                  {
                    "name": "shared-workspace",
                    "mountPath": "/opt/prefect/shared"
                  }
                ]
              }
            ],
            "volumes": [
              {
                "name": "shared-workspace",
                "persistentVolumeClaim": {
                  "claimName": "prefect-shared-pvc"
                }
              }
            ]
          }
        }
      }
    }
  }
}
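A mismatch between a `volumeMounts` name and the `volumes` list is an easy mistake in a hand-edited template, and Kubernetes rejects a pod spec with that mismatch. Before deploying, a quick sanity check along these lines may help. It is a sketch that assumes the template has the same nesting as the JSON above; `unmatched_mounts` is a hypothetical helper, not part of Prefect.

```python
import json


def unmatched_mounts(template_json: str) -> list[str]:
    """Return "container: mount" entries whose mount name has no matching volume."""
    template = json.loads(template_json)
    pod_spec = template["job_configuration"]["job_manifest"]["spec"]["template"]["spec"]
    declared = {volume["name"] for volume in pod_spec.get("volumes", [])}
    missing = []
    for container in pod_spec.get("containers", []):
        for mount in container.get("volumeMounts", []):
            if mount["name"] not in declared:
                missing.append(f'{container["name"]}: {mount["name"]}')
    return missing
```

Passing the contents of `base-job-template.json` to this function should return an empty list when every mount refers to a declared volume.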
2. Configure Worker Values:
In your `values.yaml`:
yaml
worker:
  volumes:
    - name: shared-workspace
      persistentVolumeClaim:
        claimName: prefect-shared-pvc
  volumeMounts:
    - name: shared-workspace
      mountPath: /opt/prefect/shared
3. Deploy the Worker:
bash
helm install prefect-worker prefect/prefect-worker \
  --namespace your-namespace \
  -f values.yaml \
  --set-file worker.config.baseJobTemplate.configuration=base-job-template.json
Important notes:
- The base job template is how you configure the job pods that run your flows
- The worker needs the volumes configured separately as it's a different pod
- Make sure to create the PVC before deploying the worker
- The mount paths should be consistent between worker and job pods
- The PVC must be in the same namespace as your worker and jobs
If you're seeing issues with permissions, make sure your service account has the necessary RBAC permissions:
yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prefect-worker
rules:
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["create", "get", "list", "watch", "delete"]
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]
And bind it to your service account:
yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prefect-worker
subjects:
  - kind: ServiceAccount
    name: prefect-worker
    namespace: your-namespace # Required for ServiceAccount subjects
roleRef:
  kind: Role
  name: prefect-worker
  apiGroup: rbac.authorization.k8s.io
This configuration ensures that:
1. The worker pod can access the PVC
2. All flow job pods will have access to the same PVC
3. The storage is persistent across pod restarts
4. Both worker and jobs can read/write to the same location