# ask-marvin
j
@Marvin when i split prefect services (using redis as a messenger) and the prefect API into separate containers, my tasks stop showing up on the UI (but they run smoothly). i see no related logs (error or otherwise) on either the api or service containers. i'm loosely following [this](https://github.com/zzstoatzz/prefect-pack/blob/4cfc53ae94c6bd65ff0eefcb1c5ce72b484[…]s/run_a_prefect_server/docker_compose/multi-server/compose.yaml) example (but my redis instance and load balancers are already hosted on GCP).
• do you have any info on what might be happening?
• can you tell me the exact flow of information between the service container and api container?
for context, here are the relevant parts of my compose file:
Copy code
services:
  prefect-server:
    image: prefecthq/prefect:3-latest
    container_name: prefect-server
    ports:
      - "8080:4200"
    env_file:
      - /home/prefect/config.env
    command: prefect server start --host 0.0.0.0 --no-services
    extra_hosts:
      - "host.docker.internal:host-gateway"
    networks:
      - prefect-network
    restart: always

  prefect-background-services:
      image: prefecthq/prefect:3-latest
      depends_on: [prefect-server]
      container_name: prefect-background-services
      command: /bin/bash -c "pip install prefect-gcp prefect-redis && prefect server services start"
      environment:
        - PREFECT_API_URL=<http://10.142.0.18:8080/prefect/api>
      env_file:
        - /home/prefect/config.env
      extra_hosts:
        - "host.docker.internal:host-gateway"
      networks:
        - prefect-network
      restart: always
note: the API_URL provided here is the URL which forwards traffic to the machine on which these containers are run. these URLs, redis, and postgres are all accessible from both containers, so it's not an accessibility issue
m
thought for 105.3 seconds
Based on the research and your setup, I can identify a few potential issues and provide some clarity on the communication flow:

### Information Flow Between Services and API
1. The Prefect services container runs multiple services including:
   - Flow run orchestration service
   - Task run orchestration service
   - Work pool orchestration service
   These services communicate with the API server via HTTP requests.
2. When a task runs:
   - The worker executes the task
   - Task state changes are sent to the API server
   - The API server stores this in the database
   - The UI polls the API server for updates (see the task-run query sketch after this message)

### Potential Issues and Solutions
1. Environment Configuration Mismatch
   - Both containers need identical database and messenger (Redis) configurations
   - Check that both containers have the same values in `/home/prefect/config.env`
   - Ensure both `PREFECT_API_URL` and messenger (Redis) settings are correctly configured
2. API URL Configuration
   The current setup uses:
Copy code
yaml
   PREFECT_API_URL=<http://10.142.0.18:8080/prefect/api>
Try these troubleshooting steps:
- Verify this URL is accessible from within the services container:
Copy code
bash
     curl <http://10.142.0.18:8080/prefect/api/health>
- Consider using the container name instead:
Copy code
yaml
     PREFECT_API_URL=<http://prefect-server:4200/api>
3. Redis Configuration
   - Ensure both containers have identical Redis configuration
   - Verify the Redis connection from both containers
   - Check Redis logs for any connection issues
4. Debugging Steps
   - Enable debug logging by adding to your config.env:
Copy code
PREFECT_LOGGING_LEVEL=DEBUG
   - Check logs in both containers for any connection issues
   - Verify task state changes are being sent by checking Redis pub/sub messages
5. Network Configuration
   Your network setup looks correct with both containers on `prefect-network`, but verify:
   - Both containers can resolve each other's hostnames
   - No firewall rules are blocking container-to-container communication

### Recommendations
1. First, verify API connectivity:
Copy code
bash
   # From inside the services container
   curl <http://10.142.0.18:8080/prefect/api/health>
2. Check service logs with increased verbosity:
Copy code
bash
   docker logs prefect-background-services
   docker logs prefect-server
3. Verify Redis connectivity from both containers:
Copy code
python
   import asyncio
   from prefect import get_client

   async def check_api():
       # This checks the API connection (Redis is not exercised here)
       async with get_client() as client:
           if await client.api_healthcheck() is None:
               print("Connected successfully")

   asyncio.run(check_api())
Would you like me to provide more specific debugging steps for any of these areas? Also, could you share:
1. The contents of your config.env file (excluding sensitive information)?
2. Are you seeing any specific behavior in the UI (like delayed updates, or completely missing tasks)?
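To confirm whether the task state updates described in the information flow above actually reach the database (as opposed to the UI simply not rendering them), you can query the API for a flow run's task runs directly. A minimal sketch, assuming the standard Prefect client filter classes and a real flow run ID in place of the placeholder:
Copy code
python
import asyncio
from uuid import UUID

from prefect import get_client
from prefect.client.schemas.filters import FlowRunFilter, FlowRunFilterId

FLOW_RUN_ID = UUID("00000000-0000-0000-0000-000000000000")  # placeholder: use a real flow run id

async def main():
    async with get_client() as client:
        task_runs = await client.read_task_runs(
            flow_run_filter=FlowRunFilter(id=FlowRunFilterId(any_=[FLOW_RUN_ID]))
        )
        # Task runs returned here are in the database; an empty list means the
        # state updates never made it past the messaging layer.
        for tr in task_runs:
            print(tr.name, tr.state_type)

asyncio.run(main())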
j
@Marvin my config.env looks like this
Copy code
PREFECT_HOME=/home/prefect
PREFECT_SERVER_API_BASE_PATH=/prefect/api
PREFECT_SERVER_UI_SERVE_BASE=/prefect
PREFECT_API_URL=/prefect/api
PREFECT_LOGGING_LEVEL=DEBUG
PREFECT_EXTRA_LOGGERS=sqlalchemy,asyncpg,urllib3
PREFECT_SQLALCHEMY_POOL_SIZE=500
PREFECT_SQLALCHEMY_MAX_OVERFLOW=500
PREFECT_MESSAGING_BROKER=prefect_redis.messaging
PREFECT_MESSAGING_CACHE=prefect_redis.messaging
PREFECT_REDIS_MESSAGING_DB=0
PREFECT_REDIS_MESSAGING_HOST=10.142.0.16
PREFECT_REDIS_MESSAGING_PORT=6379
PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_RECYCLE=120
PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_TIMEOUT=180
PREFECT_SERVER_DATABASE_TIMEOUT=60
PREFECT_SQLALCHEMY_MAX_OVERFLOW=100
PREFECT_SQLALCHEMY_POOL_SIZE=100
i removed sensitive values. like i said, it is not an accessibility issue; all the URLs and connections work as expected from within the containers. the UI just doesn't show tasks on the chart, but logs are populated and the flow completes successfully
m
thought for 30.7 seconds
I think I've identified the issue. There are two key problems in your configuration:
1. Main Issue - API URL Configuration
   In your config.env, you have:
Copy code
PREFECT_API_URL=/prefect/api
This is a relative URL, which won't work correctly for the services container. The services container needs an absolute URL to connect to the API server. You already have the correct URL in your docker-compose file, but it needs to be in the config.env file instead:
Copy code
PREFECT_API_URL=<http://10.142.0.18:8080/prefect/api>
2. Potential Secondary Issue - Duplicate Configuration
   You have both:
Copy code
PREFECT_SQLALCHEMY_MAX_OVERFLOW=500
   PREFECT_SQLALCHEMY_MAX_OVERFLOW=100
The second value would override the first, but this isn't related to your current issue.

### The Fix
1. Update your config.env to use the absolute URL:
Copy code
PREFECT_API_URL=<http://10.142.0.18:8080/prefect/api>
2. Remove the PREFECT_API_URL from the docker-compose environment section since it will be in the config.env:
Copy code
yaml
   prefect-background-services:
       image: prefecthq/prefect:3-latest
       depends_on: [prefect-server]
       container_name: prefect-background-services
       command: /bin/bash -c "pip install prefect-gcp prefect-redis && prefect server services start"
       env_file:
         - /home/prefect/config.env
       extra_hosts:
         - "host.docker.internal:host-gateway"
       networks:
         - prefect-network
       restart: always
### Why This Should Fix It
The services container needs to know exactly where to find the API server. When you provide a relative URL (`/prefect/api`), the services don't know what base URL to use. By providing the absolute URL, the services can properly communicate state changes to the API server, which is necessary for the UI to display task information.

The reason your flows still complete successfully is that the actual flow execution and Redis messaging are working correctly - it's just the API communication for UI updates that's affected by the relative URL.

Try making these changes and let me know if you start seeing your tasks appear in the UI. If not, we can enable more detailed debugging by adding some additional loggers to your configuration.
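To illustrate why a relative value can't stand on its own for a client (a small sketch using only the standard library, not Prefect code): a path like /prefect/api carries no scheme or host, so there is nothing to send requests to until a base URL supplies them.
Copy code
python
from urllib.parse import urljoin, urlparse

relative = "/prefect/api"
parsed = urlparse(relative)
print(parsed.scheme, parsed.netloc)  # both empty: no host to connect to

absolute = "http://10.142.0.18:8080/prefect/api"
print(urljoin(absolute + "/", "health"))
# -> http://10.142.0.18:8080/prefect/api/health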
j
@Marvin in docker-compose, explicit environment configurations take precedence over environment files. this is confirmed by `docker exec -it ... /bin/bash` into the container and looking at the value of PREFECT_API_URL, so it's not that
m
thought for 81.4 seconds
Since we've confirmed it's not an API URL issue, and you've verified all connections are working, let's try to debug this methodically:
1. First, let's verify the Redis messaging configuration is being picked up correctly
   From inside the services container, could you run this Python code to check the actual Redis configuration being used:
Copy code
python
   from prefect.settings import PREFECT_REDIS_MESSAGING_HOST, PREFECT_REDIS_MESSAGING_PORT, PREFECT_REDIS_MESSAGING_DB
   print(f"Redis Host: {PREFECT_REDIS_MESSAGING_HOST.value()}")
   print(f"Redis Port: {PREFECT_REDIS_MESSAGING_PORT.value()}")
   print(f"Redis DB: {PREFECT_REDIS_MESSAGING_DB.value()}")
2. Check if task states are being published to Redis
   You can monitor Redis messages in real-time. From inside either container:
Copy code
python
   import redis
   import json

   r = redis.Redis(
       host='10.142.0.16',
       port=6379,
       db=0
   )

   # Pattern subscriptions require psubscribe(); plain subscribe() would treat
   # 'prefect:*' as a literal channel name
   pubsub = r.pubsub()
   pubsub.psubscribe('prefect:*')

   for message in pubsub.listen():
       if message['type'] == 'pmessage':
           print(json.loads(message['data']))
   # If nothing prints here, note that prefect-redis delivers messages over
   # Redis streams with consumer groups rather than pub/sub channels;
   # see the stream sketch after this message
3. Check the services that are actually running
   In the services container:
Copy code
bash
   ps aux | grep prefect
   You should see multiple service processes running.
4. Verify the orchestration services are receiving messages
   Add these specific loggers to your config:
Copy code
PREFECT_EXTRA_LOGGERS=prefect.orchestration.services,prefect.client,prefect.flow_runner,prefect.task_runner
Questions:
1. When you look at the UI, do you see the flow runs themselves? Is it just the task information that's missing?
2. In the task runs page of a flow run, do you see any task runs listed at all, or are they completely absent?
3. Are you using any custom task runners or flow runners?
4. Can you check if there are any messages in Redis at all related to Prefect? This command will show all keys:
Copy code
bash
   redis-cli -h 10.142.0.16 -p 6379 keys "prefect:*"
The fact that logs are populated but tasks aren't showing up suggests that the task state updates are either:
1. Not being published to Redis correctly
2. Not being consumed by the orchestration services
3. Not being properly processed by the API server
Let's start with checking these aspects and then we can narrow down where the communication chain is breaking.
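One caveat on the pub/sub monitoring in step 2 above: the prefect-redis messaging layer works with Redis streams and consumer groups rather than pub/sub channels, so an empty pub/sub listener does not by itself prove that nothing is being published. A minimal sketch for inspecting the streams instead, using only redis-py and the Redis host from this thread:
Copy code
python
import redis

r = redis.Redis(host='10.142.0.16', port=6379, db=0)

# Find every key that is a Redis stream, regardless of its prefix,
# and show how many entries each one holds plus a sample message.
for key in r.scan_iter():
    if r.type(key) == b'stream':
        print(key.decode(), "length:", r.xlen(key))
        for entry_id, fields in r.xrange(key, count=1):
            print("  sample:", entry_id.decode(), fields)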
j
@Marvin
• Checking the configuration in this way gives me: ImportError: cannot import name 'PREFECT_REDIS_MESSAGING_HOST' from 'prefect.settings' (/usr/local/lib/python3.12/site-packages/prefect/settings/__init__.py) --- but the environment variable is correctly populated
• the real-time redis monitoring doesn't show anything, so i believe the messages aren't even being populated
m
thought for 95.7 seconds
Looking at your configuration and the research results, I notice a potential issue. In your config.env, you have:
Copy code
PREFECT_MESSAGING_BROKER=prefect_redis.messaging
PREFECT_MESSAGING_CACHE=prefect_redis.messaging
But the services container needs to have the `prefect-redis` package properly installed and available when the services start (a quick import check is sketched after this message). In your docker-compose, you're installing it at container startup, and there might be a race condition where the services start before the installation completes:
Copy code
yaml
command: /bin/bash -c "pip install prefect-gcp prefect-redis && prefect server services start"
Let's try a more robust approach:
1. First, create a new Dockerfile for your services:
Copy code
dockerfile
FROM prefecthq/prefect:3-latest
RUN pip install prefect-gcp prefect-redis
CMD ["prefect", "server", "services", "start"]
2. Update your docker-compose.yml to use this custom image:
Copy code
yaml
prefect-background-services:
    build: .  # assuming Dockerfile is in the same directory
    depends_on: [prefect-server]
    container_name: prefect-background-services
    environment:
      - PREFECT_API_URL=<http://10.142.0.18:8080/prefect/api>
    env_file:
      - /home/prefect/config.env
    extra_hosts:
      - "host.docker.internal:host-gateway"
    networks:
      - prefect-network
    restart: always
3. Also, let's verify the Redis connection by adding this to your config.env:
Copy code
PREFECT_LOGGING_LEVEL=DEBUG
PREFECT_EXTRA_LOGGERS=prefect_redis.messaging,prefect.server.services,prefect.server.utilities.messaging
4. After making these changes, rebuild and restart your containers:
Copy code
bash
docker-compose down
docker-compose build
docker-compose up -d
5. Then check the logs:
Copy code
bash
docker logs prefect-background-services
The key things we're looking for in the logs are:
- Successful initialization of the Redis messaging broker
- Any errors related to Redis connections or messaging
- Service startup messages indicating which services are running

Can you try these changes and let me know what you see in the logs? Particularly, we want to confirm that:
1. The Redis messaging package is properly installed
2. The messaging broker is successfully initialized
3. There are no errors related to Redis connections

If you still don't see any Redis messages after these changes, we might need to verify the Redis messaging implementation itself is working as expected.
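A quick way to confirm point 1 from inside the services container (a minimal sketch; it only checks that the package imports and reports the installed versions):
Copy code
python
from importlib.metadata import version

import prefect_redis  # raises ImportError if the package is missing from the image

print("prefect:", version("prefect"))
print("prefect-redis:", version("prefect-redis"))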
j
@Marvin i don't think that's the problem. the logs show:
prefect-background-services | 19:51:42.338 | DEBUG | prefect.profiles - Using profile 'ephemeral'
prefect-default-workpool | 19:51:42.956 | DEBUG | prefect.profiles - Using profile 'ephemeral'
prefect-background-services |
prefect-background-services | Starting services... Press CTRL+C to stop
prefect-background-services |
prefect-background-services | 19:51:44.115 | DEBUG | prefect.prefect_redis.messaging - Consumer group already exists: BUSYGROUP Consumer Group name already exists
prefect-background-services | 19:51:44.121 | DEBUG | prefect.prefect_redis.messaging - Consumer group already exists: BUSYGROUP Consumer Group name already exists
prefect-background-services | 19:51:44.123 | DEBUG | prefect.prefect_redis.messaging - Consumer group already exists: BUSYGROUP Consumer Group name already exists
prefect-background-services | 19:51:44.131 | DEBUG | prefect.prefect_redis.messaging - Consumer group already exists: BUSYGROUP Consumer Group name already exists
prefect-background-services | 19:51:44.595 | DEBUG | prefect.prefect_redis.messaging - Consumer group already exists: BUSYGROUP Consumer Group name already exists
m
thought for 29.0 seconds
Given that Redis messaging is working (as evidenced by the consumer group messages), and you mentioned that logs are being populated and flows complete successfully, this suggests the issue might be with how the tasks are being executed or how their states are being reported.

Could you tell me:
1. How are you running your flows? Are you using:
   - A worker?
   - Direct flow.serve()?
   - Some other method?
2. In your flow code, are you using any custom task runners or flow runners?
3. Could you share a minimal example of one of your flow definitions that's experiencing this issue? Something like:
Copy code
python
from prefect import flow, task

@task
def my_task():
    return "hello"

@flow
def my_flow():
    result = my_task()
The fact that logs are populated but tasks aren't showing up is particularly interesting because it suggests that the task execution information is making it to the API server (hence the logs) but something might be wrong with how the task states are being processed or stored.

Also, could you check the API server logs specifically around the time a flow runs? Look for any messages containing "task" or "state":
Copy code
bash
docker logs prefect-server | grep -i "task\|state"
This will help us understand if the API server is receiving the task state updates but perhaps not processing them correctly.
j
@Marvin is there a way to test redis messaging from within the container? this has nothing to do w/ splitting the background services. it happens when i set the messenger to redis
m
thought for 24.5 seconds
This is very interesting - I found a relevant issue that matches your symptoms. It appears there might be an issue with how Redis messaging handles task states, particularly when services are restarted or when multiple state updates occur in quick succession.

Let's try a few things:
1. First, check your Redis streams directly. Run this in the container:
Copy code
python
import redis

r = redis.Redis(
    host='10.142.0.16',
    port=6379,
    db=0
)

# List all streams
streams = r.keys('prefect:*')
print("Existing streams:", streams)

# For each stream, get the last few messages
for stream in streams:
    if b'stream' in stream:
        print(f"\nMessages in {stream.decode()}:")
        messages = r.xread({stream: '0-0'}, count=5)
        print(messages)
2. Let's also check your Redis configuration more thoroughly. What's the output of:
Copy code
bash
docker exec -it prefect-background-services env | grep REDIS
3. Could you also try temporarily switching back to the default memory messaging to confirm it's specifically a Redis issue?
Copy code
bash
# Remove these from your config.env temporarily
# PREFECT_MESSAGING_BROKER=prefect_redis.messaging
# PREFECT_MESSAGING_CACHE=prefect_redis.messaging
Based on the issue I found, there might be a problem with how task states are being persisted when using Redis messaging. The states are being published but might not be getting properly consumed and persisted to the database.

Let me know the results of these tests and we can determine the best path forward. There might be a few potential solutions:
1. Upgrade to the latest version of prefect-redis if you're not already on it
2. Add some retry logic for state persistence
3. Configure Redis stream settings differently
4. Use a workaround with the default memory messaging if needed

Which would you like to try first?
j
@Marvin does the key `message:messaging-cache:cd5b645f-66a6-47f4-ac4a-03b5ecd9083c` being in Redis mean anything to you? also can you give me some prefect commands that diagnose the redis cache? something like `prefect cache view`
m
thought for 57.0 seconds
Yes, that Redis key pattern is significant! The `message:messaging-cache:[uuid]` pattern indicates that Prefect is using Redis for message caching as part of its messaging system. This is part of Prefect's distributed messaging infrastructure.

While there isn't a direct `prefect cache` command, you can inspect the Redis cache directly using Redis CLI commands. Here's how to diagnose the Redis cache:
1. Connect to Redis and inspect the key:
Copy code
bash
# From inside the container
redis-cli -h 10.142.0.16 -p 6379

# Once connected:
TYPE message:messaging-cache:cd5b645f-66a6-47f4-ac4a-03b5ecd9083c
GET message:messaging-cache:cd5b645f-66a6-47f4-ac4a-03b5ecd9083c
TTL message:messaging-cache:cd5b645f-66a6-47f4-ac4a-03b5ecd9083c
2. List all Prefect-related keys:
Copy code
bash
# Show all Prefect messaging cache keys
KEYS message:messaging-cache:*

# Show all Prefect-related keys
KEYS prefect:*
3. Monitor Redis in real-time while a flow runs:
Copy code
bash
MONITOR
The presence of that key suggests that the Redis messaging system is working at some level - messages are being cached. However, if task states aren't showing up in the UI, there might be an issue with:
1. Message consumption
2. Message persistence
3. Cache expiration

Could you try running these Redis commands and share:
1. What type of value is stored in that cache key?
2. Does it have a TTL set?
3. Do you see new cache keys being created when you run flows?

This will help us understand if the messaging system is functioning correctly but perhaps not being consumed properly, or if messages are expiring too quickly. Also, you might want to check the Redis stream keys specifically:
Copy code
bash
# In redis-cli
KEYS prefect:stream:*
If you see stream keys but no messages are being processed, we might need to look at the consumer groups:
Copy code
bash
# Replace [stream-key] with an actual stream key you find
XINFO GROUPS [stream-key]
This will show us if the consumer groups are properly set up and processing messages.
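The same consumer-group check can also be done from Python if redis-cli isn't available in the container. A minimal sketch using redis-py; the stream key below is a placeholder and should be replaced with a stream key you actually found:
Copy code
python
import redis

r = redis.Redis(host='10.142.0.16', port=6379, db=0)

stream_key = 'prefect:stream:events'  # placeholder: use a stream key you found above

# 'pending' entries were delivered to a consumer but never acknowledged;
# a growing pending count suggests messages are read but not fully processed.
for group in r.xinfo_groups(stream_key):
    print(group['name'], 'consumers:', group['consumers'], 'pending:', group['pending'])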