# ask-marvin
h
@Marvin I have Prefect server 2.20.4 deployed on EKS, with an NLB on port 443 exposing the Prefect server service, which runs on port 4200. This is my config:
Copy code
➜  ~ prefect config view
PREFECT_PROFILE='default'
PREFECT_API_URL='https://prefect.some.com' (from profile)
PREFECT_SERVER_API_HOST='prefect.some.com' (from profile)
PREFECT_SERVER_API_PORT='443' (from profile)
When I run any Prefect CLI command I get this:
Copy code
➜  ~ prefect work-pool ls
Traceback (most recent call last):
  File "/opt/homebrew/lib/python3.10/site-packages/prefect/cli/_utilities.py", line 42, in wrapper
    return fn(*args, **kwargs)
  File "/opt/homebrew/lib/python3.10/site-packages/prefect/utilities/asyncutils.py", line 311, in coroutine_wrapper
    return call()
  File "/opt/homebrew/lib/python3.10/site-packages/prefect/_internal/concurrency/calls.py", line 432, in __call__
    return self.result()
  File "/opt/homebrew/lib/python3.10/site-packages/prefect/_internal/concurrency/calls.py", line 318, in result
    return self.future.result(timeout=timeout)
  File "/opt/homebrew/lib/python3.10/site-packages/prefect/_internal/concurrency/calls.py", line 179, in result
    return self.__get_result()
  File "/opt/homebrew/Cellar/python@3.10/3.10.17/Frameworks/Python.framework/Versions/3.10/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/opt/homebrew/lib/python3.10/site-packages/prefect/_internal/concurrency/calls.py", line 389, in _run_async
    result = await coro
  File "/opt/homebrew/lib/python3.10/site-packages/prefect/cli/work_pool.py", line 284, in ls
    pools = await client.read_work_pools()
  File "/opt/homebrew/lib/python3.10/site-packages/prefect/client/orchestration.py", line 2662, in read_work_pools
    response = await self._client.post("/work_pools/filter", json=body)
  File "/opt/homebrew/lib/python3.10/site-packages/httpx/_client.py", line 1859, in post
    return await self.request(
  File "/opt/homebrew/lib/python3.10/site-packages/httpx/_client.py", line 1540, in request
    return await self.send(request, auth=auth, follow_redirects=follow_redirects)
  File "/opt/homebrew/lib/python3.10/site-packages/prefect/client/base.py", line 327, in send
    response = await self._send_with_retry(
  File "/opt/homebrew/lib/python3.10/site-packages/prefect/client/base.py", line 243, in _send_with_retry
    await self._add_csrf_headers(request=request)
  File "/opt/homebrew/lib/python3.10/site-packages/prefect/client/base.py", line 393, in _add_csrf_headers
    token: CsrfToken = CsrfToken.parse_obj(token_response.json())
  File "/opt/homebrew/lib/python3.10/site-packages/httpx/_models.py", line 832, in json
    return jsonlib.loads(self.content, **kwargs)
  File "/opt/homebrew/Cellar/python@3.10/3.10.17/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/opt/homebrew/Cellar/python@3.10/3.10.17/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/opt/homebrew/Cellar/python@3.10/3.10.17/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
An exception occurred.
m
thought for 67.1 seconds
Based on your error and configuration, I can identify a few potential issues:
1. There's a mismatch in your URL protocol configuration:
   - Your PREFECT_API_URL is using https://
   - But your PREFECT_SERVER_API_HOST is set to a bare host (prefect.some.com) with no scheme
2. The JSONDecodeError you're seeing typically occurs when the server isn't responding with valid JSON, which often means the connection isn't being established properly.
Here are the steps to fix this:
1. First, clean up your configuration. Run these commands to set a consistent configuration:
Copy code
bash
prefect config set PREFECT_API_URL="https://prefect.some.com"
prefect config set PREFECT_SERVER_API_HOST="prefect.some.com"
prefect config set PREFECT_SERVER_API_PORT="443"
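As background on that JSONDecodeError: it is exactly what Python's json parser raises when handed an empty or non-JSON body (for example, an HTML error page from a load balancer). A minimal reproduction, independent of Prefect:

```python
import json

# httpx's Response.json() ultimately calls json.loads on the body.
# An empty (or HTML) body fails with the same message as the traceback.
try:
    json.loads("")
except json.JSONDecodeError as exc:
    print(exc)  # Expecting value: line 1 column 1 (char 0)
```

So the error says nothing about work pools themselves; it means the CSRF token request received a response body that wasn't JSON.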
2. Since you're using HTTPS (port 443), ensure your environment has the proper SSL configuration:
Copy code
bash
# If you're using a custom CA certificate
export SSL_CERT_FILE=/path/to/your/cert.pem

# If you need to disable SSL verification (not recommended for production)
export PREFECT_API_DISABLE_SSL_VALIDATION=true
3. Verify your connection can reach the server:
Copy code
bash
# Test the connection
curl -v https://prefect.some.com/api/health

# If you need to skip SSL verification for testing
curl -v -k https://prefect.some.com/api/health
4. If you're behind a corporate proxy, make sure to set:
Copy code
bash
export HTTPS_PROXY="your-proxy-url"
5. Double-check that your NLB is properly configured:
   - Ensure the health checks are configured correctly
   - Verify the target group is pointing to port 4200 on your EKS pods
   - Confirm the security groups allow traffic from your client to the NLB on port 443
If these steps don't resolve the issue, please try:
1. Running curl against your endpoint to see the raw response
2. Checking the Prefect server logs in your EKS cluster for any errors
3. Verifying that the NLB's target group health checks are passing
Could you please:
1. Try the configuration changes above and let me know if they help
2. Share the output of curl -v https://prefect.some.com/api/health (with any sensitive information redacted)
3. Check if there are any relevant errors in your Prefect server logs?
This will help us narrow down whether this is a network configuration issue, an SSL issue, or a server configuration problem.
h
@Marvin
1. The configuration change didn’t help.
2. The response of the curl command:
Copy code
~ curl -v https://prefect.some.com/api/health
* Host prefect.some.com:443 was resolved.
* IPv6: (none)
* IPv4: 10.15.14.138
*   Trying 10.15.14.138:443...
* Connected to prefect.some.com (10.15.14.138) port 443
* ALPN: curl offers h2,http/1.1
* (304) (OUT), TLS handshake, Client hello (1):
*  CAfile: /etc/ssl/cert.pem
*  CApath: none
* (304) (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256 / [blank] / UNDEF
* ALPN: server did not agree on a protocol. Uses default.
* Server certificate:
*  subject: CN=*.some.com
*  start date: Jun 16 00:00:00 2025 GMT
*  expire date: Jul 15 23:59:59 2026 GMT
*  subjectAltName: host "prefect.some.com" matched cert's "*.some.com"
*  issuer: C=US; O=Amazon; CN=Amazon RSA 2048 M02
*  SSL certificate verify ok.
* using HTTP/1.x
> GET /api/health HTTP/1.1
> Host: prefect.some.com
> User-Agent: curl/8.7.1
> Accept: */*
>
* Request completely sent off
< HTTP/1.1 200 OK
< date: Thu, 24 Jul 2025 09:33:52 GMT
< server: uvicorn
< content-length: 4
< content-type: application/json
<
* Connection #0 to host prefect.some.com left intact
true%
3. No error in EKS
m
thought for 14.5 seconds
Let's try to narrow down the issue. The error occurs when trying to get a CSRF token, which happens before the actual API call. Let's test the CSRF endpoint specifically:
Copy code
bash
curl -v https://prefect.some.com/api/csrf
Could you run this command and share the output? Also, since you're using an NLB, there are a few things we should verify:
1. Is your NLB configured with TLS termination, or is it just passing through the TLS traffic?
2. In your Kubernetes service configuration for Prefect, could you confirm:
   - The service type (ClusterIP/LoadBalancer)
   - The port mappings
   - Any annotations
Could you share your Kubernetes service configuration (with any sensitive information redacted)? The fact that the health check works but other endpoints fail suggests one of these possibilities:
1. The NLB might be incorrectly handling certain paths
2. There might be a mismatch in how TLS is being terminated and forwarded
3. There could be an issue with how headers are being passed through the NLB
While we wait for those details, let's try one more thing. Could you temporarily modify your configuration to test with HTTP instead of HTTPS (just for troubleshooting):
Copy code
bash
prefect config set PREFECT_API_URL="http://prefect.some.com:443"
prefect config set PREFECT_SERVER_API_HOST="prefect.some.com"
prefect config set PREFECT_SERVER_API_PORT="443"
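A note on that URL, since it looks odd: the scheme and the port are independent parts of a URL, so http://prefect.some.com:443 is legal and simply means plain HTTP sent to port 443 (the hostname is the placeholder from this thread; whether the NLB listener accepts unencrypted traffic there depends on its configuration). Parsing it shows the pieces:

```python
from urllib.parse import urlsplit

# Placeholder URL from the thread: HTTP scheme pointed at the TLS port.
parts = urlsplit("http://prefect.some.com:443")
print(parts.scheme)    # http
print(parts.hostname)  # prefect.some.com
print(parts.port)      # 443
```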
Then try running prefect work-pool ls again. This will help us determine if the issue is specific to HTTPS handling. Also, could you share your Prefect server deployment configuration (particularly any environment variables set for the server)?
h
@Marvin
Copy code
~ curl -v https://prefect.some.com/api/csrf
* Host prefect.some.com:443 was resolved.
* IPv6: (none)
* IPv4: 10.15.14.138
*   Trying 10.15.14.138:443...
* Connected to prefect.some.com (10.15.14.138) port 443
* ALPN: curl offers h2,http/1.1
* (304) (OUT), TLS handshake, Client hello (1):
*  CAfile: /etc/ssl/cert.pem
*  CApath: none
* (304) (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256 / [blank] / UNDEF
* ALPN: server did not agree on a protocol. Uses default.
* Server certificate:
*  subject: CN=*.some.com
*  start date: Jun 16 00:00:00 2025 GMT
*  expire date: Jul 15 23:59:59 2026 GMT
*  subjectAltName: host "prefect.some.com" matched cert's "*.some.com"
*  issuer: C=US; O=Amazon; CN=Amazon RSA 2048 M02
*  SSL certificate verify ok.
* using HTTP/1.x
> GET /api/csrf HTTP/1.1
> Host: prefect.some.com
> User-Agent: curl/8.7.1
> Accept: */*
>
* Request completely sent off
< HTTP/1.1 404 Not Found
< date: Thu, 24 Jul 2025 09:38:59 GMT
< server: uvicorn
< content-length: 22
< content-type: application/json
<
* Connection #0 to host prefect.some.com left intact
{"detail":"Not Found"}%
m
thought for 40.9 seconds
Based on this research and your error, I have a few suggestions: 1. First, let's check whether CSRF protection is enabled in your server configuration. Could you check your Prefect server deployment configuration for this environment variable:
Copy code
PREFECT_SERVER_CSRF_PROTECTION_ENABLED
2. Try setting this explicitly in your server deployment:
Copy code
yaml
   env:
     - name: PREFECT_SERVER_CSRF_PROTECTION_ENABLED
       value: "false"
3. Another approach would be to modify your client configuration to disable CSRF validation temporarily (for testing):
Copy code
bash
   export PREFECT_CLIENT_CSRF_VALIDATION_ENABLED=false
Then try your command again:
Copy code
bash
   prefect work-pool ls
Could you also share:
1. Your Kubernetes service configuration for the Prefect server
2. Is the NLB doing TLS termination, or is it just passing through the TLS traffic?
3. The result of trying the command with CSRF validation disabled as suggested above
This will help us determine if:
a) The CSRF configuration is the root cause
b) There's a routing issue with the NLB
c) There's a mismatch in how the server is configured versus how it's being accessed
Also, just to verify - when you access the Prefect UI through your browser at https://prefect.some.com, does it work correctly?
h
@Marvin In prefect server:
Copy code
PREFECT_SERVER_CSRF_PROTECTION_ENABLED: false
In my client I ran:
Copy code
export PREFECT_CLIENT_CSRF_VALIDATION_ENABLED=false
but still got the same error. The NLB is doing TLS termination. The traffic is HTTPS from the client to the NLB, and from the NLB to the Prefect server it’s HTTP on port 4200
m
thought for 13.5 seconds
Ah, now we're getting somewhere! The fact that there's TLS termination at the NLB and internal HTTP communication is important. Let's fix this configuration mismatch.
Since the NLB is terminating TLS and forwarding HTTP traffic internally to port 4200, we need to adjust your Prefect client configuration to match this setup. The current configuration is trying to use HTTPS all the way through, which isn't correct for your architecture. Try these configuration changes:
Copy code
bash
# Clear existing configuration first
prefect config unset PREFECT_API_URL
prefect config unset PREFECT_SERVER_API_HOST
prefect config unset PREFECT_SERVER_API_PORT

# Set new configuration
prefect config set PREFECT_API_URL="https://prefect.some.com"
prefect config set PREFECT_SERVER_API_HOST="0.0.0.0"
prefect config set PREFECT_SERVER_API_PORT="4200"

# Additional settings that might help
export PREFECT_CLIENT_CSRF_VALIDATION_ENABLED=false
export PREFECT_API_DISABLE_SSL_VALIDATION=true
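One caveat on those settings, worth flagging because it's easy to mix up: PREFECT_SERVER_API_HOST and PREFECT_SERVER_API_PORT control where the server process binds when it starts, while the CLI acting as a client sends requests to PREFECT_API_URL, and for a Prefect server that URL normally includes the /api suffix (e.g. https://prefect.some.com/api). A sketch of the split, with placeholder values mirroring this thread:

```python
# Client side: the URL the CLI sends requests to.
# Note the /api suffix a Prefect server API URL normally carries.
client_settings = {"PREFECT_API_URL": "https://prefect.some.com/api"}

# Server side (in the pod): where the API process binds.
server_settings = {
    "PREFECT_SERVER_API_HOST": "0.0.0.0",  # listen on all pod interfaces
    "PREFECT_SERVER_API_PORT": "4200",     # container port behind the NLB
}

print("client calls:", client_settings["PREFECT_API_URL"])
print("server binds:", server_settings["PREFECT_SERVER_API_HOST"],
      server_settings["PREFECT_SERVER_API_PORT"])
```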
Also, could you share your Kubernetes service configuration? I want to make sure the service is properly configured to handle the forwarded traffic from the NLB. The expected configuration should look something like this:
Copy code
yaml
apiVersion: v1
kind: Service
metadata:
  name: prefect-server
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "your-cert-arn"
    service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443"
spec:
  ports:
    - port: 443
      targetPort: 4200
      protocol: TCP
  type: LoadBalancer
Could you also try this quick test to verify the internal HTTP communication is working:
Copy code
bash
# Test the internal HTTP endpoint (if you have access to the cluster network)
curl -v http://prefect-server-service:4200/api/health
Additionally, could you check your Prefect server logs after attempting to run a command? Even though you mentioned there are no errors, there might be some relevant INFO or DEBUG level messages that could help us understand what's happening when the requests come in.
Let me know:
1. If the new configuration changes help
2. Your Kubernetes service configuration (if different from what I showed above)
3. Any relevant log entries from the Prefect server when you try to run commands
h
@Marvin I set the config as you said, but I get the same error. My service config:
Copy code
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-name: <load-balancer-name>
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
    service.beta.kubernetes.io/aws-load-balancer-scheme: internal
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: <cert-arn>
    service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443"
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
  labels:
    app.kubernetes.io/component: server
    app.kubernetes.io/instance: prefect-server
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: prefect-server
    app.kubernetes.io/version: 2.20.4
    prefect-version: 2.20.4
  name: prefect-server
  namespace: prefect-server
spec:
  loadBalancerClass: service.k8s.aws/nlb
  ports:
  - name: server-svc-port
    nodePort: 31440
    port: 443
    protocol: TCP
    targetPort: 4200
m
thought for 21.0 seconds
I don't see any specific issues related to API connectivity in 2.20.4. Let's try one more approach: 1. First, let's verify the exact traffic flow. Could you run this curl command and share the output:
Copy code
bash
curl -v -H "Content-Type: application/json" -H "Accept: application/json" -H "X-Forwarded-Proto: https" https://prefect.some.com/api/work_pools/filter -d '{}'
2. Also, let's check your Prefect server deployment configuration. Could you share:
   - The relevant parts of your Prefect server deployment YAML (particularly the environment variables)
   - The output of kubectl describe pod <prefect-server-pod-name> -n prefect-server
3. Try setting these additional environment variables in your client environment:
Copy code
bash
export PREFECT_CLIENT_CONNECT_TIMEOUT=30
export PREFECT_CLIENT_RETRY_EXTRA_CODES="404"
The fact that the health check endpoint works but other endpoints don't suggests this might be related to how the NLB is handling the path routing or how the Prefect server is configured to handle its base URL. Could you also confirm:
1. Are you using any ingress controller in addition to the NLB?
2. What's the value of PREFECT_SERVER_API_BASE_URL in your server deployment (if set)?