# prefect-community
t
Hello again! I think I’ve tracked down an issue with MS Teams notifications, and the notification queue in general, where only certain flow run states were being sent. This logger message appears:
```
prefect-orion-1  | 21:53:34.590 | DEBUG   | prefect.orion.services.flowrunnotifications - Got 3 notifications from queue.
```
However, it looks like the code intends for only one notification to be read off the queue in the first place: https://github.com/PrefectHQ/prefect/blob/main/src/prefect/orion/services/flow_run_notifications.py#L38-L42.
The behavior I’m seeing is that if we have notifications for `Running` and `Failed` in succession, only one of them gets read off the queue and the other is discarded.
If I remove the notification for the `Running` state, then the `Failed` notification makes it out of the queue. If there’s more time (I believe the polling interval is 4 seconds) in the flow run between the `Running` and `Failed` notifications getting enqueued, then both seem to get read off appropriately.
I was only able to get this far thanks to this: https://prefect-community.slack.com/archives/CL09KU1K7/p1668038154353999 — so this already feels like progress.
Other details: still running Prefect 2.6.4 + Postgres in Docker.
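[Editor's note: a toy model of the drain behavior described above, not Prefect's actual implementation. The linked code reads notifications off the queue with a limit of 1 per service tick; if that limit is honored, two states enqueued back-to-back should both be delivered on consecutive ticks rather than one being discarded. All names here are made up for illustration:]

```python
from collections import deque

def drain(queue: deque, batch_size: int = 1) -> list:
    """Toy sketch of the intended behavior: each tick pops at most
    `batch_size` notifications (the linked code uses a limit of 1),
    and nothing read off the queue is ever discarded."""
    sent = []
    while queue:
        # one "tick" of the notification service
        for _ in range(min(batch_size, len(queue))):
            sent.append(queue.popleft())
    return sent

notifications = deque(["Running", "Failed"])  # enqueued in quick succession
print(drain(notifications))  # ['Running', 'Failed'] -- both delivered, in order
```

The bug report is that the real service instead reads multiple notifications in one tick and sends only one of them.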
z
Thanks for the additional details! We’ll look into this.
🙏 1
cc @Zach Angell perhaps this broke during the query optimizations?
Can you share the output of `prefect version` where you’re running your server? Curious whether you’re using Postgres or SQLite, since these queries are different for each backend.
c
2.6.4
Postgres
t
^ what he said:
```
Version:             2.6.4
API version:         0.8.2
Python version:      3.9.15
Git commit:          51e92dda
Built:               Thu, Oct 20, 2022 3:11 PM
OS/Arch:             linux/x86_64
Profile:             default
Server type:         hosted
```
```
psql (PostgreSQL) 14.1
```
z
Great, thanks!
I can’t reproduce. Can you also get the versions from `pip show sqlalchemy asyncpg`?
t
```
Name: SQLAlchemy
Version: 1.4.42
Summary: Database Abstraction Library
Home-page: https://www.sqlalchemy.org
Author: Mike Bayer
Author-email: mike_mp@zzzcomputing.com
License: MIT
Location: /usr/local/lib/python3.9/site-packages
Requires: greenlet
Required-by: alembic, prefect
---
Name: asyncpg
Version: 0.26.0
Summary: An asyncio PostgreSQL driver
Home-page: https://github.com/MagicStack/asyncpg
Author: MagicStack Inc
Author-email: hello@magic.io
License: Apache License, Version 2.0
Location: /usr/local/lib/python3.9/site-packages
Requires:
Required-by: prefect
```
t
If it helps, this is the docker image we are using: prefecthq/prefect:2.6.4-python3.9
z
Our tests run against Postgres 13 in CI but I can’t reproduce this against 14 locally either.
Can you turn on `PREFECT_ORION_DATABASE_ECHO`?
t
I did `prefect config set PREFECT_ORION_DATABASE_ECHO=True` — I’ll tail the logs and try to re-run.
z
I’m interested in seeing the query that’s being used to retrieve notifications; it should have `LIMIT 1` from the code, but apparently that’s not working for y’all.
t
Does the server need to be bounced for that setting to take effect? I’m also running the logging hotfix you posted yesterday, so I’m seeing the log streams for prefect and uvicorn, but the DB is not echoing.
z
Yeah, the config isn’t hot-reloaded.
t
Okay, will bounce the container with the config set.
Okay, got the super verbose logs now. Re-running the test flow.
Is this what you’re looking for:
```
prefect-orion-1  | 23:52:39.362 | INFO    | sqlalchemy.engine.Engine - WITH queued_notifications AS
prefect-orion-1  | (DELETE FROM flow_run_notification_queue WHERE flow_run_notification_queue.id IN (SELECT flow_run_notification_queue.id
prefect-orion-1  | FROM flow_run_notification_queue ORDER BY flow_run_notification_queue.updated
prefect-orion-1  |  LIMIT %s FOR UPDATE SKIP LOCKED) RETURNING flow_run_notification_queue.id, flow_run_notification_queue.flow_run_notification_policy_id, flow_run_notification_queue.flow_run_state_id)
prefect-orion-1  |  SELECT queued_notifications.id AS queue_id, flow_run_notification_policy.id AS flow_run_notification_policy_id, flow_run_notification_policy.message_template AS flow_run_notification_policy_message_template, flow_run_notification_policy.block_document_id, flow.id AS flow_id, flow.name AS flow_name, flow_run.id AS flow_run_id, flow_run.name AS flow_run_name, flow_run.parameters AS flow_run_parameters, flow_run_state.type AS flow_run_state_type, flow_run_state.name AS flow_run_state_name, flow_run_state.timestamp AS flow_run_state_timestamp, flow_run_state.message AS flow_run_state_message
prefect-orion-1  | FROM queued_notifications JOIN flow_run_notification_policy ON queued_notifications.flow_run_notification_policy_id = flow_run_notification_policy.id JOIN flow_run_state ON queued_notifications.flow_run_state_id = flow_run_state.id JOIN flow_run ON flow_run_state.flow_run_id = flow_run.id JOIN flow ON flow_run.flow_id = flow.id
prefect-orion-1  | 2022-11-10 23:52:39,363 INFO sqlalchemy.engine.Engine [cached since 75.8s ago] (1,)
```
right before the notification was sent via `apprise`:
```
prefect-orion-1  | 23:53:27.396 | DEBUG   | prefect.orion.services.flowrunnotifications - Got 2 notifications from queue.
prefect-orion-1  | 23:53:27.396 | DEBUG   | prefect.orion.services.flowrunnotifications - Got 2 notifications from queue.
```
z
`LIMIT %s` 👀
t
Oh, I missed that!
I love a good prepared statement(?) bug.
I unfortunately don’t have an Orion environment set up in my debugger at the moment, otherwise I’d try to step through where that CTE is built and see if it’s just a bad value passed or if SQLAlchemy is not happy with something.
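[Editor's note: the `LIMIT %s` itself is expected — it’s a "format"-style placeholder in the compiled statement, and the `(1,)` at the end of the echo line above is the bound parameter tuple. A toy illustration of how the placeholder pairs with that tuple (not asyncpg internals — the real driver binds parameters server-side):]

```python
# The statement text SQLAlchemy logs, with a "format"-paramstyle placeholder
sql = "SELECT id FROM q ORDER BY updated LIMIT %s FOR UPDATE SKIP LOCKED"
params = (1,)  # matches the "(1,)" shown on the SQLAlchemy echo line

# A driver-side stand-in for binding; real drivers do this safely,
# never with raw string interpolation.
rendered = sql % params
print(rendered)  # SELECT id FROM q ORDER BY updated LIMIT 1 FOR UPDATE SKIP LOCKED
```

So the echo log by itself shows the limit value 1 being bound as intended, which is what makes the "Got 2 notifications" result so confusing.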
z
With a debugger:
```
(Pdb) print(notification_details_stmt.compile(compile_kwargs={"literal_binds": True}))
WITH queued_notifications AS 
(DELETE FROM flow_run_notification_queue WHERE flow_run_notification_queue.id IN (SELECT flow_run_notification_queue.id 
FROM flow_run_notification_queue ORDER BY flow_run_notification_queue.updated
 LIMIT 1 FOR UPDATE) RETURNING flow_run_notification_queue.id, flow_run_notification_queue.flow_run_notification_policy_id, flow_run_notification_queue.flow_run_state_id)
 SELECT queued_notifications.id AS queue_id, flow_run_notification_policy.id AS flow_run_notification_policy_id, flow_run_notification_policy.message_template AS flow_run_notification_policy_message_template, flow_run_notification_policy.block_document_id, flow.id AS flow_id, flow.name AS flow_name, flow_run.id AS flow_run_id, flow_run.name AS flow_run_name, flow_run.parameters AS flow_run_parameters, flow_run_state.type AS flow_run_state_type, flow_run_state.name AS flow_run_state_name, flow_run_state.timestamp AS flow_run_state_timestamp, flow_run_state.message AS flow_run_state_message 
FROM queued_notifications JOIN flow_run_notification_policy ON queued_notifications.flow_run_notification_policy_id = flow_run_notification_policy.id JOIN flow_run_state ON queued_notifications.flow_run_state_id = flow_run_state.id JOIN flow_run ON flow_run_state.flow_run_id = flow_run.id JOIN flow ON flow_run.flow_id = flow.id
```
I’m going to have to point someone else at this tomorrow, it looks sensible and I can’t reproduce the issue myself but it’s clearly happening on your end from the log 🙂
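[Editor's note: the pdb technique above — compiling a statement with `literal_binds` to inline bound values — works against any SQLAlchemy Core statement. A minimal sketch with a toy table (the names here are made up, not Prefect's actual schema):]

```python
import sqlalchemy as sa

# Toy table standing in for flow_run_notification_queue
queue = sa.table("queue", sa.column("id"), sa.column("updated"))

stmt = sa.select(queue.c.id).order_by(queue.c.updated).limit(1)

# literal_binds inlines bound parameters (including the limit) into the SQL
# text, which is what turns "LIMIT :param_1" / "LIMIT %s" into "LIMIT 1"
print(stmt.compile(compile_kwargs={"literal_binds": True}))
```

The output for the default dialect ends with `LIMIT 1`, confirming the statement itself carries the right limit value.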
t
That sounds good - appreciate you digging in! Happy to provide any other Orion runtime environment details as needed.
z
Minor note: mine printed `LIMIT :param_1` for Postgres and `LIMIT %s` for SQLite — are you sure it’s running against Postgres as intended?
t
From the Orion settings panel:
`PREFECT_ORION_DATABASE_CONNECTION_URL: postgresql+asyncpg://postgres@...`
z
That’s pretty promising 🙂 I don’t think there’s a CTE for SQLite anyway, the query would look pretty different.
t
I mean, you never know what else they’ll figure out how to package into SQLite… I can also share the docker-compose I’m launching this from, minus any interesting-to-only-us bits:
```yaml
# docker-compose.yml
version: "3.9"
services:
  database:
    image: postgres:14.1-alpine
    restart: always
    environment:
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=MySuperSecretPassword
      - POSTGRES_DB=orion
    expose:
      - 5432
    volumes:
      - /prefect/postgres:/var/lib/postgresql/data

  orion:
    image: prefecthq/prefect:2.6.4-python3.9
    restart: always
    volumes:
      - /prefect/orion:/root/.prefect
      - /prefect/monkeypatch/orion.py:/usr/local/lib/python3.9/site-packages/prefect/cli/orion.py # temporary logging fix, see: https://github.com/PrefectHQ/prefect/compare/poc/fix-prefect-logging-uvicorn
    entrypoint: ["prefect", "orion", "start"]
    environment:
      - PREFECT_ORION_API_HOST=0.0.0.0
      - PREFECT_ORION_API_PORT=80
      - PREFECT_API_URL=http://our-server-lives-here.com:80/api
      - PREFECT_API_KEY=<guid>
      - PREFECT_ORION_DATABASE_CONNECTION_URL=postgresql+asyncpg://postgres:MySuperSecretPassword@database:5432/orion
      - PREFECT_LOGGING_LEVEL=DEBUG # default: INFO
      - PREFECT_LOGGING_SERVER_LEVEL=DEBUG # default: WARNING
      - PREFECT_DEBUG_MODE=True
      - PREFECT_LOGGING_ORION_ENABLED=True
      - PREFECT_ORION_DATABASE_ECHO=True
    ports:
      - 80:80
    depends_on:
      - database
```
z
Just an update here: we can reproduce this, but not in unit tests.
c
progress, thanks for the update
z
We’re all thoroughly confused though 🙂
t
The best kind of bug!
z
Would you mind opening an issue? I think this is going to move to our backlog as we’ve made no progress 😕
t
I can do that. Thanks for digging in this far.
Hey @Zanie, I’ve filed that here: https://github.com/PrefectHQ/prefect/issues/7568. Let me know how I can help you or the other community engineers on this one. Thanks again!