Scott Pedersen
07/23/2025, 12:48 AM
Marvin
07/23/2025, 12:49 AM
Pod Watch Timeout Seconds parameter. This setting determines how long Prefect will wait for a pod to start before considering it failed.
Here's how you can configure it:
When creating your work pool, you can set this in the base job template. For example, to allow 300 seconds (5 minutes):
json
{
  "job_configuration": {
    "pod_watch_timeout_seconds": 300
  }
}
The default value is 60 seconds (1 minute), which is why you're seeing the timeout. Setting it to a higher value like 300 seconds (5 minutes) gives your pods more time to start up, which matters most when:
- Your cluster uses auto-scaling node groups or serverless node pools that need time to spin up
- Large container images take longer to pull
- Resource constraints delay pod scheduling
- Your pods require significant initialization time
You can find more details about Kubernetes configuration in the Prefect documentation.
Would you like me to provide more specific guidance on how to update this setting for your particular setup?
Scott Pedersen
07/23/2025, 12:49 AM
Is `pod_watch_timeout_seconds` available in Prefect 2.20.16?
Marvin
07/23/2025, 12:50 AM
`pod_watch_timeout_seconds` is available in Prefect 2.20.16. You can configure it in two ways:
1. In your Kubernetes work pool's base job template (300 seconds is 5 minutes):
json
{
  "job_configuration": {
    "pod_watch_timeout_seconds": 300
  }
}
2. Or when creating a `KubernetesJob` directly in code:
python
from prefect.infrastructure import KubernetesJob

# Wait up to 300 seconds (5 minutes) for the pod to start
kubernetes_job = KubernetesJob(
    pod_watch_timeout_seconds=300
)
The default value is 60 seconds, so increasing it to 300 seconds (5 minutes) or more should help with pods that take longer to start up.
This setting has been available in Prefect 2.x for quite a while and is definitely present in version 2.20.16. It serves the same purpose as in Prefect 3.x - it determines how long Prefect will wait for a pod to start before considering it failed.
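To make the idea of a start-up deadline concrete, here is a minimal sketch of a wait-with-timeout loop. It is not Prefect's implementation; `wait_for_pod_start`, `pod_has_started`, and the injectable `clock` and `sleep` parameters are hypothetical names used only for illustration.

```python
import time
from typing import Callable


def wait_for_pod_start(
    pod_has_started: Callable[[], bool],
    timeout_seconds: float,
    poll_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if the pod starts before the deadline, False otherwise."""
    deadline = clock() + timeout_seconds
    while True:
        if pod_has_started():
            return True
        if clock() >= deadline:
            # The deadline passed without the pod starting: treat it as failed.
            return False
        sleep(poll_interval)
```

With a simulated pod that needs 90 seconds to start, a 60 second timeout would give up, while a 300 second timeout would succeed. That is the same trade-off you make when raising `pod_watch_timeout_seconds`.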
Would you like me to provide more specific guidance on how to implement this in your particular setup?
Scott Pedersen
07/23/2025, 1:28 AM
Marvin
07/23/2025, 1:31 AM
json
{
  "variables": {
    "type": "object",
    "properties": {
      "image": {
        "type": "string",
        "title": "Image",
        "description": "The image to use for the job"
      },
      "namespace": {
        "type": "string",
        "title": "Namespace",
        "default": "default",
        "description": "The Kubernetes namespace to create jobs in"
      }
    },
    "required": ["image"]
  },
  "job_configuration": {
    "job_manifest": {
      "apiVersion": "batch/v1",
      "kind": "Job",
      "metadata": {
        "namespace": "{{ namespace }}"
      },
      "spec": {
        "template": {
          "spec": {
            "containers": [
              {
                "name": "prefect-job",
                "image": "{{ image }}",
                "resources": {
                  "requests": {
                    "cpu": "100m",
                    "memory": "256Mi"
                  },
                  "limits": {
                    "memory": "512Mi"
                  }
                }
              }
            ],
            "restartPolicy": "Never"
          }
        },
        "backoffLimit": 0
      }
    },
    "pod_watch_timeout_seconds": 60,
    "job_watch_timeout_seconds": 3600,
    "stream_output": true
  }
}
Key points about the default template:
1. Default timeouts:
   - `pod_watch_timeout_seconds`: 60 seconds (1 minute)
   - `job_watch_timeout_seconds`: 3600 seconds (1 hour)
2. Default resource requests:
   - CPU: 100m (0.1 CPU cores)
   - Memory Request: 256Mi
   - Memory Limit: 512Mi
3. Important default settings:
   - `restartPolicy`: "Never"
   - `backoffLimit`: 0 (job will not be retried if it fails)
   - `stream_output`: true (logs will be streamed to Prefect)
You can customize this template when creating a work pool either through the UI or using the CLI with a JSON file. For example, to increase the pod watch timeout, you would modify the `pod_watch_timeout_seconds` value in the `job_configuration` section.
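As a rough sketch of that edit, the snippet below changes the timeout on a template held as a Python dict and writes it out as JSON. It assumes the template has the shape shown above, with a literal value under `job_configuration`; the helper names `with_pod_watch_timeout` and `save_template` are hypothetical and not part of Prefect.

```python
import copy
import json
from pathlib import Path


def with_pod_watch_timeout(template: dict, seconds: int) -> dict:
    """Return a copy of a base job template with the pod watch timeout set."""
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    updated = copy.deepcopy(template)  # leave the caller's dict untouched
    updated.setdefault("job_configuration", {})["pod_watch_timeout_seconds"] = seconds
    return updated


def save_template(template: dict, path: str) -> None:
    """Write the template as indented JSON, ready for the UI or the CLI."""
    Path(path).write_text(json.dumps(template, indent=2))
```

The saved file could then be pasted into the UI or passed to the CLI. I believe the relevant option on `prefect work-pool create` is `--base-job-template`, but please confirm with `prefect work-pool create --help` on your version.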
Would you like me to explain how to modify any specific part of this template for your use case?
Scott Pedersen
07/23/2025, 1:54 AM
Marvin
07/23/2025, 1:55 AM
yaml
apiVersion: batch/v1
kind: Job
metadata:
  namespace: "{{ namespace }}"
spec:
  template:
    spec:
      containers:
        - name: prefect-job
          image: "{{ image }}"
          resources:
            requests:
              cpu: "100m"
              memory: "256Mi"
            limits:
              memory: "512Mi"
      restartPolicy: Never
  backoffLimit: 0
This is the core job manifest that Prefect uses by default. You can customize it by adding additional fields that Kubernetes jobs support, such as:
yaml
apiVersion: batch/v1
kind: Job
metadata:
  namespace: "{{ namespace }}"
  labels:
    app: prefect-job
spec:
  template:
    metadata:
      labels:
        app: prefect-job
    spec:
      containers:
        - name: prefect-job
          image: "{{ image }}"
          resources:
            requests:
              cpu: "100m"
              memory: "256Mi"
            limits:
              memory: "512Mi"
          env:
            - name: PREFECT_API_KEY
              valueFrom:
                secretKeyRef:
                  name: prefect-api-key
                  key: key
          volumeMounts:
            - name: my-volume
              mountPath: /data
      volumes:
        - name: my-volume
          emptyDir: {}
      imagePullSecrets:
        - name: my-registry-secret
      restartPolicy: Never
      serviceAccountName: prefect-service-account
  backoffLimit: 0
Common customizations include:
- Adding environment variables
- Mounting volumes
- Setting image pull secrets
- Configuring service accounts
- Adding node selectors or tolerations
- Setting security contexts
- Adding labels and annotations
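Node selectors and tolerations are not shown in the manifest above, so here is a minimal sketch of adding them to a job manifest held as a Python dict. `nodeSelector` and `tolerations` are standard Kubernetes pod spec fields; the helper name `add_scheduling_rules` and the label and taint values are hypothetical placeholders.

```python
import copy


def add_scheduling_rules(job_manifest: dict, node_selector: dict, tolerations: list) -> dict:
    """Return a copy of a Job manifest with nodeSelector and tolerations on the pod spec."""
    manifest = copy.deepcopy(job_manifest)  # do not mutate the caller's manifest
    pod_spec = (
        manifest.setdefault("spec", {})
        .setdefault("template", {})
        .setdefault("spec", {})
    )
    pod_spec["nodeSelector"] = dict(node_selector)
    pod_spec["tolerations"] = list(tolerations)
    return manifest


base_manifest = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "spec": {
        "template": {"spec": {"containers": [], "restartPolicy": "Never"}},
        "backoffLimit": 0,
    },
}

customized = add_scheduling_rules(
    base_manifest,
    node_selector={"workload": "prefect"},  # hypothetical node label
    tolerations=[
        # hypothetical taint; match whatever taints your nodes actually carry
        {"key": "dedicated", "operator": "Equal", "value": "prefect", "effect": "NoSchedule"}
    ],
)
```

Both fields sit on the pod spec (`spec.template.spec`), next to `containers` and `restartPolicy`, not on the job spec where `backoffLimit` lives.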
Would you like me to elaborate on any specific customization aspect?
Scott Pedersen
07/23/2025, 1:56 AM
`pod_watch_timeout_seconds`, and `job_watch_timeout_seconds`?
Marvin
07/23/2025, 1:56 AM
`pod_watch_timeout_seconds` and `job_watch_timeout_seconds` are not part of the Kubernetes job manifest itself - they are Prefect-specific configurations that go in the outer `job_configuration` section of the work pool's base job template.
Here's the complete structure showing where these timeout settings belong:
yaml
variables:
  type: object
  properties:
    image:
      type: string
      title: Image
      description: The image to use for the job
    namespace:
      type: string
      title: Namespace
      default: default
      description: The Kubernetes namespace to create jobs in
  required:
    - image
job_configuration:
  # Prefect-specific settings go here at this level
  pod_watch_timeout_seconds: 300  # Example: increased to 5 minutes
  job_watch_timeout_seconds: 3600  # Default: 1 hour
  stream_output: true
  # The Kubernetes job manifest goes under job_manifest
  job_manifest:
    apiVersion: batch/v1
    kind: Job
    metadata:
      namespace: "{{ namespace }}"
    spec:
      template:
        spec:
          containers:
            - name: prefect-job
              image: "{{ image }}"
              resources:
                requests:
                  cpu: "100m"
                  memory: "256Mi"
                limits:
                  memory: "512Mi"
          restartPolicy: Never
      backoffLimit: 0
Key points:
1. These timeout settings are Prefect configurations, not Kubernetes configurations
2. They should be at the same level as `job_manifest` within the `job_configuration` section
3. They are not part of the Kubernetes YAML specification
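When creating or updating a work pool, you would save this entire configuration (including both the Prefect-specific settings and the Kubernetes job manifest) as a single file. The CLI takes the base job template as a JSON file, so the YAML above would need converting to JSON first.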
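If you want a quick sanity check before saving, a small sketch like the one below can flag Prefect-level keys that ended up inside the Kubernetes manifest by mistake. It assumes the template is already loaded as a Python dict; `PREFECT_LEVEL_KEYS`, `find_misplaced_keys`, and `check_template` are hypothetical names, and the key set covers only the three settings discussed here.

```python
PREFECT_LEVEL_KEYS = {
    "pod_watch_timeout_seconds",
    "job_watch_timeout_seconds",
    "stream_output",
}


def find_misplaced_keys(node, path="job_manifest"):
    """Yield dotted paths of Prefect-level keys found anywhere inside the manifest."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key in PREFECT_LEVEL_KEYS:
                yield child
            yield from find_misplaced_keys(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_misplaced_keys(item, f"{path}[{index}]")


def check_template(template: dict) -> list:
    """Return a sorted list of misplaced key paths; an empty list means none were found."""
    manifest = template.get("job_configuration", {}).get("job_manifest", {})
    return sorted(find_misplaced_keys(manifest))
```

An empty result only means these three keys are not nested under `job_manifest`; it does not validate the rest of the template.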