Devih (08/03/2025, 3:53 PM)

Marvin (08/03/2025, 3:54 PM)
Even though you've set concurrency_limit=100 on the deployment, the actual number of concurrent runs is also limited by your worker configuration.
1. Worker Configuration:
- How many workers do you have running in your EKS cluster?
- What is the work pool type you're using? (Kubernetes, Docker, etc.)
2. Resource Constraints:
- EKS cluster capacity (CPU/memory) could be limiting the number of concurrent pods that can be scheduled
- Worker pod resources and limits could be affecting how many concurrent runs can be managed
3. Work Pool Configuration:
- Work pool concurrency settings might be overriding or affecting your deployment concurrency limit
- You might need to check both the work pool and work queue concurrency settings
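Because the deployment, work pool, and work queue limits all apply simultaneously, the effective concurrency is simply the smallest limit in force. A minimal illustrative sketch (not a Prefect API; the function name is hypothetical):

```python
def effective_concurrency(deployment_limit, pool_limit=None, queue_limit=None,
                          worker_capacity=None, infra_capacity=None):
    """Return the smallest applicable limit; None means 'no limit set'."""
    limits = [deployment_limit, pool_limit, queue_limit, worker_capacity, infra_capacity]
    applicable = [l for l in limits if l is not None]
    return min(applicable)

# A deployment limit of 100 does nothing if workers can only run 25 jobs at once
print(effective_concurrency(100, pool_limit=None, worker_capacity=25))  # 25
```

This is why raising the deployment limit alone often has no visible effect.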
To troubleshoot this, I recommend:
1. Check your work pool configuration:
```bash
prefect work-pool inspect YOUR_WORK_POOL_NAME
```
2. Verify your worker logs to see if there are any resource constraints or errors being reported
3. Check your EKS cluster metrics:
- Available CPU/memory
- Pod scheduling status
- Any pod pending states
4. Verify that your flow runs are being assigned to the correct work pool and queue
Could you please provide:
1. What type of work pool are you using?
2. How many workers do you have running?
3. Are you seeing any error messages in the worker logs?
4. Are the "scheduled" flow runs assigned to a specific work queue?
This information would help me provide more specific guidance for your setup. Also, you mentioned the runs don't show up in the UI - are they visible in the database but not in the UI, or are they not being picked up by workers at all?
Remember that while the deployment concurrency limit sets a maximum, the actual number of concurrent runs is determined by the minimum of several factors:
- Deployment concurrency limit (100 in your case)
- Work pool/queue concurrency limits
- Worker capacity
- Infrastructure resources
- Global concurrency limits (if any are set)

Devih (08/03/2025, 3:57 PM)

Marvin (08/03/2025, 3:58 PM)
1. Worker Concurrency
Each worker has its own limit on how many flow runs it executes at once, which you can raise with the --limit
parameter.
- Try running this command to see your current workers' status:
```bash
prefect worker ls
```
2. Work Pool Settings
Let's check your work pool configuration:
```bash
prefect work-pool inspect YOUR_WORK_POOL_NAME
```
Pay attention to:
- The work pool concurrency limit
- Any queue-specific concurrency limits
- The worker type (Kubernetes, Process, etc.)
3. Worker Process Limits
Since you have two workers, make sure each worker is configured to handle enough concurrent flow runs. The default might be lower than you expect. You can modify this by:
- Updating the worker deployment YAML to include appropriate concurrency settings
- Setting the appropriate process limits in your worker configuration
4. Troubleshooting Steps:
1. Check the worker logs for any warnings or errors about concurrency or resource limits
2. Verify that flow runs are being distributed between both workers
3. Monitor the worker processes to ensure they're healthy and not restarting
Could you please provide:
1. The output of prefect work-pool inspect YOUR_WORK_POOL_NAME?
2. Are you using a Kubernetes work pool type?
3. How are your workers deployed (deployment YAML configuration)?
4. Are you seeing any specific patterns in when/how the flow runs get picked up?
Also, regarding the UI issue where scheduled runs don't show up - this might be related to the UI's default filtering. Try:
1. Changing the filter in the UI to show "Scheduled" status
2. Checking if the runs appear in the database with:
```python
import asyncio
from prefect.client.orchestration import get_client

async def main():
    async with get_client() as client:
        # Read recent flow runs and count those still in a Scheduled state
        runs = await client.read_flow_runs(limit=200)
        print(sum(1 for r in runs if r.state and r.state.name == "Scheduled"))

asyncio.run(main())
```
The fact that you're seeing varying numbers (4-50) suggests this might be related to how the workers are processing the runs rather than a hard limit. Let's identify the exact configuration once you provide the above details, and we can then make specific recommendations for your setup.

Devih (08/03/2025, 4:02 PM)

Marvin (08/03/2025, 4:02 PM)
Could you please share:
1. The output of prefect work-pool ls
2. The output of prefect work-pool inspect YOUR_POOL_NAME
3. Your worker Kubernetes deployment YAML
Once you share these details, we can:
- Verify the work pool concurrency settings
- Check the worker configuration
- Identify any potential bottlenecks
- Provide specific recommendations for your setup
Also, when you say the jobs are in "scheduled" status but not showing in the UI, can you confirm:
- Are these runs visible if you explicitly filter for "Scheduled" status in the UI?
- Are you able to see these runs when querying the database directly?
- Do you see any error messages in the worker logs?

Devih (08/03/2025, 4:04 PM)
from_file
for creation.',
'secret_fields': [],
'block_type_slug': 'kubernetes-cluster-config',
'block_schema_references': {}
}
},
'description': 'Default variables for the Kubernetes worker.\nThe schema for this class is used to populate the variables
section of the default base job template.'
},
'job_configuration': {
'env': '{{ env }}',
'name': '{{ name }}',
'labels': '{{ labels }}',
'command': '{{ command }}',
'namespace': '{{ namespace }}',
'job_manifest': {
'kind': 'Job',
'spec': {
'template': {
'spec': {
'containers': [
{
'env': [
{'name': 'POD_IP', 'valueFrom': {'fieldRef': {'fieldPath': 'status.podIP'}}},
{'name': 'POD_NAME', 'valueFrom': {'fieldRef': {'fieldPath': 'metadata.name'}}},
{'name': 'POD_NAMESPACE', 'valueFrom': {'fieldRef': {'fieldPath': 'metadata.namespace'}}}
],
'args': '{{ command }}',
'name': 'prefect-job',
'image': '{{ image }}',
'resources': {
'limits': {'cpu': '{{ k8s_resource_cpu }}', 'memory': '{{ k8s_resource_memory }}'},
'requests': {'cpu': '{{ k8s_resource_cpu }}', 'memory': '{{ k8s_resource_memory }}'}
},
'imagePullPolicy': '{{ image_pull_policy }}'
}
],
'completions': 1,
'parallelism': 1,
'tolerations': [{'key': 'node-pool', 'value': '{{ node_pool }}', 'effect': 'NoSchedule', 'operator': 'Equal'}],
'nodeSelector': {'appname': '{{ node_pool }}', 'kubernetes.io/arch': 'amd64', 'karpenter.sh/capacity-type': '{{ capacity_type }}'},
'restartPolicy': 'Never',
'serviceAccountName': '{{ service_account_name }}'
}
},
'backoffLimit': 0,
'podFailurePolicy': {
'rules': [
{'action': 'Ignore', 'onPodConditions': [{'type': 'DisruptionTarget'}]},
{'action': 'FailJob', 'onExitCodes': {'values': [0], 'operator': 'NotIn', 'containerName': 'prefect-job'}}
]
},
'ttlSecondsAfterFinished': '{{ finished_job_ttl }}'
},
'metadata': {'labels': '{{ labels }}', 'namespace': '{{ namespace }}', 'generateName': '{{ name }}-'},
'apiVersion': 'batch/v1'
},
'stream_output': '{{ stream_output }}',
'cluster_config': '{{ cluster_config }}',
'job_watch_timeout_seconds': '{{ job_watch_timeout_seconds }}',
'pod_watch_timeout_seconds': '{{ pod_watch_timeout_seconds }}'
}
},
status=WorkPoolStatus.READY,
default_queue_id='5b0c4ebb-7080-4e74-846b-d49c504b9d81'
)

Marvin (08/03/2025, 4:04 PM)

Devih (08/03/2025, 4:04 PM)

Marvin (08/03/2025, 4:04 PM)
1. Could you share the output of:
```bash
prefect work-pool ls
```
2. And the output of:
```bash
prefect work-pool inspect YOUR_POOL_NAME
```
This information will help us understand:
- Your work pool type (Kubernetes, Process, etc.)
- Any concurrency limits set at the work pool level
- Queue configurations
- Worker settings
Also, it would be helpful to see:
1. Your worker Kubernetes deployment YAML
2. The configuration you use to set the concurrency_limit=100 (is it in the deployment configuration or set somewhere else?)
With this information, we can better diagnose why you're only seeing 4-50 concurrent jobs instead of the expected 100, despite having ample CPU resources available.

Devih (08/03/2025, 4:05 PM)
from_file
for creation.',
'secret_fields': [],
'block_type_slug': 'kubernetes-cluster-config',
'block_schema_references': {}
}
},
'description': 'Default variables for the Kubernetes worker.\nThe schema for this class is used to populate the variables
section of the default base job template.'
},
'job_configuration': {
'env': '{{ env }}',
'name': '{{ name }}',
'labels': '{{ labels }}',
'command': '{{ command }}',
'namespace': '{{ namespace }}',
'job_manifest': {
'kind': 'Job',
'spec': {
'template': {
'spec': {
'containers': [
{
'env': [
{'name': 'POD_IP', 'valueFrom': {'fieldRef': {'fieldPath': 'status.podIP'}}},
{'name': 'POD_NAME', 'valueFrom': {'fieldRef': {'fieldPath': 'metadata.name'}}},
{'name': 'POD_NAMESPACE', 'valueFrom': {'fieldRef': {'fieldPath': 'metadata.namespace'}}}
],
'args': '{{ command }}',
'name': 'prefect-job',
'image': '{{ image }}',
'resources': {
'limits': {'cpu': '{{ k8s_resource_cpu }}', 'memory': '{{ k8s_resource_memory }}'},
'requests': {'cpu': '{{ k8s_resource_cpu }}', 'memory': '{{ k8s_resource_memory }}'}
},
'imagePullPolicy': '{{ image_pull_policy }}'
}
],
'completions': 1,
'parallelism': 1,
'tolerations': [{'key': 'node-pool', 'value': '{{ node_pool }}', 'effect': 'NoSchedule', 'operator': 'Equal'}],
'nodeSelector': {'appname': '{{ node_pool }}', 'kubernetes.io/arch': 'amd64', 'karpenter.sh/capacity-type': '{{ capacity_type }}'},
'restartPolicy': 'Never',
'serviceAccountName': '{{ service_account_name }}'
}
},
'backoffLimit': 0,
'podFailurePolicy': {
'rules': [
{'action': 'Ignore', 'onPodConditions': [{'type': 'DisruptionTarget'}]},
{'action': 'FailJob', 'onExitCodes': {'values': [0], 'operator': 'NotIn', 'containerName': 'prefect-job'}}
]
},
'ttlSecondsAfterFinished': '{{ finished_job_ttl }}'
},
'metadata': {'labels': '{{ labels }}', 'namespace': '{{ namespace }}', 'generateName': '{{ name }}-'},
'apiVersion': 'batch/v1'
},
'stream_output': '{{ stream_output }}',
'cluster_config': '{{ cluster_config }}',
'job_watch_timeout_seconds': '{{ job_watch_timeout_seconds }}',
'pod_watch_timeout_seconds': '{{ pod_watch_timeout_seconds }}'
}
},
status=WorkPoolStatus.READY,
default_queue_id='5b0c4ebb-7080-4e74-846b-d49c504b9d81'
)

Marvin (08/03/2025, 4:05 PM)

Devih (08/03/2025, 4:06 PM)

Marvin (08/03/2025, 4:07 PM)
1. Environment Variables
```bash
export PREFECT_API_SERVICES_SCHEDULER_MAX_RUNS=1000
export PREFECT_SERVER_SERVICES_SCHEDULER_MAX_RUNS=1000
```
2. In Kubernetes Deployment
If you're running Prefect server in Kubernetes, add these to your deployment environment variables:
```yaml
env:
  - name: PREFECT_API_SERVICES_SCHEDULER_MAX_RUNS
    value: "1000"
  - name: PREFECT_SERVER_SERVICES_SCHEDULER_MAX_RUNS
    value: "1000"
```
3. Using a Prefect Profile
Create or modify your profile in `~/.prefect/profiles.toml`:
```toml
[profiles.default]
PREFECT_API_SERVICES_SCHEDULER_MAX_RUNS = "1000"
PREFECT_SERVER_SERVICES_SCHEDULER_MAX_RUNS = "1000"
```
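Whichever method you choose, the setting ultimately resolves from the environment at server startup. A tiny sketch of that precedence (illustrative; `scheduler_max_runs` is a hypothetical helper, not a Prefect function, and the default of 100 is an assumption):

```python
import os

def scheduler_max_runs(default=100):
    # An environment variable, when set, overrides the built-in default
    raw = os.environ.get("PREFECT_API_SERVICES_SCHEDULER_MAX_RUNS")
    return int(raw) if raw is not None else default

os.environ["PREFECT_API_SERVICES_SCHEDULER_MAX_RUNS"] = "1000"
print(scheduler_max_runs())  # 1000
```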
Important Notes:
1. These settings affect how many runs are auto-scheduled in advance
2. You'll need to restart your Prefect server after changing these values
3. Make sure to set both variables to maintain consistency
4. Consider your database capacity when setting higher values
To verify the changes have taken effect, you can:
1. Check your server logs for any configuration-related messages
2. Monitor the number of scheduled runs in the UI
3. Query the database to verify the number of scheduled runs
Would you like me to help you with:
1. The specific configuration for your deployment method (Kubernetes, Docker, etc.)?
2. Verifying the changes have taken effect?
3. Understanding any related settings that might need adjustment?
Please let me know which deployment method you're using and if you need any specific configuration examples.

Devih (08/03/2025, 4:09 PM)

Marvin (08/03/2025, 4:10 PM)
Could you share:
1. The section of your values.yaml where you're trying to set these variables?
2. The command you used to install/upgrade the Helm chart?
Here are a few ways to properly set these environment variables in your Helm deployment:
1. Using values.yaml:
```yaml
server:
  env:
    - name: PREFECT_API_SERVICES_SCHEDULER_MAX_RUNS
      value: "1000"
    - name: PREFECT_SERVER_SERVICES_SCHEDULER_MAX_RUNS
      value: "1000"
```
2. Using Helm --set:
```bash
helm upgrade --install prefect prefecthq/prefect-server \
  --set server.env[0].name=PREFECT_API_SERVICES_SCHEDULER_MAX_RUNS,server.env[0].value="1000" \
  --set server.env[1].name=PREFECT_SERVER_SERVICES_SCHEDULER_MAX_RUNS,server.env[1].value="1000"
```
3. Using a ConfigMap:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prefect-server-config
data:
  PREFECT_API_SERVICES_SCHEDULER_MAX_RUNS: "1000"
  PREFECT_SERVER_SERVICES_SCHEDULER_MAX_RUNS: "1000"
---
# In your values.yaml
server:
  envFrom:
    - configMapRef:
        name: prefect-server-config
```
To verify the configuration:
1. Check if the environment variables are present in the pod:
```bash
kubectl exec -it <prefect-server-pod-name> -- env | grep PREFECT
```
2. Check the server logs for any configuration-related messages:
```bash
kubectl logs <prefect-server-pod-name>
```
3. Verify the pod specification:
```bash
kubectl get pod <prefect-server-pod-name> -o yaml
```
Could you please:
1. Share your current Helm values configuration?
2. Confirm which method you're using to set these variables?
3. Share any error messages or unexpected behavior you're seeing?
This will help me provide more specific guidance for your setup.

Devih (08/03/2025, 4:14 PM)

Marvin (08/03/2025, 4:15 PM)

Devih (08/03/2025, 4:18 PM)

Marvin (08/03/2025, 4:19 PM)
1. Work Pool Concurrency
```bash
prefect work-pool set-concurrency-limit POOL_NAME 100
```
2. Work Queue Concurrency (if using multiple queues)
```bash
prefect work-queue set-concurrency-limit QUEUE_NAME 100
```
3. Worker Configuration
In your Kubernetes worker deployment, ensure proper resources and limits:
```yaml
resources:
  requests:
    memory: "1Gi"
    cpu: "500m"
  limits:
    memory: "2Gi"
    cpu: "1000m"
```
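As a back-of-the-envelope check on whether the cluster itself is the bottleneck, you can estimate how many job pods fit given node and pod sizes. A rough sketch that ignores DaemonSets and system overhead; the node and pod sizes below are hypothetical:

```python
def max_schedulable_pods(nodes, node_cpu_m, node_mem_mi, pod_cpu_m, pod_mem_mi):
    # Each node fits the smaller of its CPU-bound and memory-bound pod counts
    per_node = min(node_cpu_m // pod_cpu_m, node_mem_mi // pod_mem_mi)
    return nodes * per_node

# e.g. 10 nodes of 4 vCPU / 16Gi, with pods requesting 500m CPU / 1Gi memory
print(max_schedulable_pods(10, 4000, 16384, 500, 1024))  # 80
```

If this estimate comes out below your deployment's concurrency limit, pods will sit Pending no matter what Prefect is configured to allow.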
To troubleshoot your current situation where you're only seeing 4-50 concurrent runs instead of 100, please share:
1. Your work pool configuration:
```bash
prefect work-pool inspect YOUR_POOL_NAME
```
2. Your worker Kubernetes deployment YAML
3. Are you seeing any specific errors in:
- Worker logs
- Prefect server logs
- Kubernetes events (kubectl get events)
The scheduler-related environment variables you asked about (PREFECT_API_SERVICES_SCHEDULER_MAX_RUNS and PREFECT_SERVER_SERVICES_SCHEDULER_MAX_RUNS) won't help with your concurrency issues: they only control how many runs the scheduler creates in advance, not how many execute at once. Let's focus on the settings that will actually impact your concurrent execution capacity.