< Marvin> explain me what kind of metadata is stored in all Prefect Community #marvin-ai

Kartik

10/07/2025, 11:02 AM

@Marvin explain to me what kind of metadata is stored in each of the tables below, which are created by Prefect: event_resources, events, task_run_state, task_run, log, flow_run, flow_run_state, deployment, block_schema, block_type, work_queue, worker, block_document, flow, deployment_schedule, configuration, work_pool, automation_event_follower, alembic_version, artifact, deployment_version, block_schema_reference, automation_bucket, automation_related_resource, agent, artifact_collection, composite_trigger_child_firing, task_run_state_cache, saved_search, concurrency_limit, concurrency_limit_v2, flow_run_input, variable, csrf_token, block_document_reference, automation

Marvin

10/07/2025, 11:02 AM

✅ thought for 107.9 seconds

Marvin

10/07/2025, 11:04 AM

Great question. Here’s a concise, table-by-table overview of what Prefect (assuming Prefect 3.x) stores in each table. Exact columns can vary between releases; for the authoritative schema, see the ORM models in source: Prefect Server ORM models (schema)

Runs and states
- flow: Registered flows. Stores flow metadata like name, slug, storage/source info, creation timestamps; referenced by deployments and flow runs.
- flow_run: One row per flow run. Holds identifiers and runtime metadata (flow_id, deployment_id, parameters snapshot, tags, start/end timestamps, state_id/current state pointer, parent/child relationships, etc.).
- flow_run_state: Full state history for flow runs. Each row records a transition (type, name, timestamp, message, state_details JSON like scheduled time, pause details, etc.).
- task_run: One row per task execution inside a flow run. Includes flow_run_id, task_key, dynamic_key, cache_key, state pointer, retries, run times, etc.
- task_run_state: Full state history for task runs (same structure as flow_run_state but for tasks).
- task_run_state_cache: Cache index for task results keyed by cache_key, including associated state, expiration, and scope; used for task caching.
- log: All structured logs emitted by flows/tasks. Includes flow_run_id/task_run_id, timestamp, level, logger/name, message, and extra metadata.
- flow_run_input: Server-side storage for flow run “inputs” payloads (e.g., data sent to a run at start or via UI/APIs) to support retries/resumes and auditability.

Deployments and scheduling
- deployment: Deployment definitions for a flow. Stores name, flow_id, version link, default parameters, tags, work_pool association, job variables/overrides, storage/source location, description, etc.
- deployment_version: Immutable records of deployment “releases” (build metadata, manifest/hash, parameters snapshot, source reference, version number).
- deployment_schedule: Zero or more schedules attached to a deployment (cron/interval/RRule definitions, timezone, active/paused).

Work pools, queues, and workers
- work_pool: Execution backends (e.g., Process, Docker, Kubernetes, ECS). Stores type, base job template, default settings, concurrency controls, pause status.
- work_queue: Named queues under a work pool, with optional filters and per-queue concurrency limits; used to route scheduled work to workers.
- worker: Registered workers and their heartbeats/configuration. Includes worker name, type, pool_id, status, last_heartbeat, and capabilities.
- agent: Legacy agent records (from Prefect 2.x). Retained for compatibility/telemetry in some environments; superseded by workers in 3.x.

Blocks (configuration objects)
- block_type: Definitions for block types (e.g., “S3 Bucket”, “GCS Credentials”). Includes slug, display name, logo/docs links, and code entrypoint info.
- block_schema: JSON schema for a block_type at a specific version (field definitions, defaults, encryption flags).
- block_schema_reference: References between block schemas (supports nested/linked schemas).
- block_document: A concrete block instance with values (linked to block_schema/type), including name, is_protected, is_anonymous, and the stored encrypted/unencrypted payload.
- block_document_reference: Links from one block document to another (used for nested block configurations).

Automations and events
- automation: Automation definitions (triggers and actions). Stores name, enabled flag, trigger configuration (event filters, windows), actions (e.g., create flow run), owner/metadata.
- automation_bucket: Internal bucketization for automations to coordinate event consumption and deduplication.
- automation_related_resource: Join table tying automations to specific resources they relate to (for targeting/scoping).
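To make the "current state pointer plus full history" relationship between flow_run and flow_run_state more concrete, here is a minimal sketch using an in-memory SQLite database. The column set is a heavily simplified assumption chosen for illustration (the real tables carry many more columns), and the helper names `state_history` and `current_state` are hypothetical, not part of Prefect.

```python
import sqlite3

# Toy schema: a simplified, hypothetical subset of columns.
# The real Prefect tables have many more columns and constraints.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE flow_run (
        id TEXT PRIMARY KEY,
        name TEXT,
        state_id TEXT              -- pointer to the current state row
    );
    CREATE TABLE flow_run_state (
        id TEXT PRIMARY KEY,
        flow_run_id TEXT REFERENCES flow_run(id),
        type TEXT,
        timestamp TEXT
    );
    """
)

# One run that moves through three states; every transition is a new row.
conn.execute("INSERT INTO flow_run VALUES ('run-1', 'demo-run', NULL)")
conn.executemany(
    "INSERT INTO flow_run_state VALUES (?, ?, ?, ?)",
    [
        ("s1", "run-1", "SCHEDULED", "2025-01-01T00:00:00"),
        ("s2", "run-1", "RUNNING", "2025-01-01T00:00:05"),
        ("s3", "run-1", "COMPLETED", "2025-01-01T00:00:09"),
    ],
)
# The run row only tracks the latest state; the history stays in flow_run_state.
conn.execute("UPDATE flow_run SET state_id = 's3' WHERE id = 'run-1'")


def state_history(conn, flow_run_id):
    """Return every state type recorded for a run, oldest first."""
    rows = conn.execute(
        "SELECT type FROM flow_run_state WHERE flow_run_id = ? ORDER BY timestamp",
        (flow_run_id,),
    )
    return [row[0] for row in rows]


def current_state(conn, flow_run_id):
    """Follow the state_id pointer on the run to its current state type."""
    row = conn.execute(
        "SELECT s.type FROM flow_run r "
        "JOIN flow_run_state s ON s.id = r.state_id WHERE r.id = ?",
        (flow_run_id,),
    ).fetchone()
    return row[0] if row else None


print(state_history(conn, "run-1"))
print(current_state(conn, "run-1"))
```

The same pattern would apply to task_run and task_run_state, which the overview describes as having the same structure.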

Marvin

10/07/2025, 11:04 AM

- automation_event_follower: Tracks the progress of an automation following an event stream (e.g., last seen event time/ID) for reliability and ordering.
- composite_trigger_child_firing: Persists intermediate “child trigger” firings for composite triggers (e.g., AND/OR across multiple conditions) so they can be combined deterministically across events.
- events: Append-only event ledger (type, occurred time, idempotency key, payload) for Prefect’s event system used by automations and observability.
- event_resources: Resources attached to an event (e.g., flow-run, deployment, work-pool). Acts as a join/index table for fast filtering on resources/labels.

Artifacts
- artifact: Records of artifacts published by flows/tasks (e.g., markdown, links, tables, metrics). Includes key, type, description, data/metadata, flow_run_id/task_run_id, created/updated/expires.
- artifact_collection: Logical grouping of artifacts across runs under a common key for discovery and retention.

Concurrency and saved views
- concurrency_limit: Legacy global concurrency limits (by tag/name) controlling how many runs/tasks can run concurrently within a scope.
- concurrency_limit_v2: Newer/scoped concurrency primitives with slot tracking and more flexible scoping semantics.
- saved_search: User/UI-saved queries/filters (e.g., “my running flows with tag X”) for quick access.

Configuration, variables, security
- configuration: Server-side key/value configuration storage for system features and flags (internal use).
- variable: Prefect Variables: user-defined key/value pairs with optional tags, used for parameterizing deployments and flows. Values are not encrypted, so secrets belong in blocks rather than variables.
- csrf_token: Server-side CSRF tokens for UI/session protection; short-lived and rotated.
- alembic_version: Standard Alembic table tracking the current database migration head/version.

Work orchestration helpers
- work_queue, work_pool, worker, agent: listed above.
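As a rough illustration of how event_resources can act as a join/index table for events, the sketch below filters events by an attached resource in an in-memory SQLite database. The columns, the index name, the resource IDs, and the helper `events_for_resource` are simplified assumptions for illustration, not the actual Prefect schema or API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified, hypothetical columns; the real tables store more.
    CREATE TABLE events (
        id TEXT PRIMARY KEY,
        event TEXT,
        occurred TEXT
    );
    CREATE TABLE event_resources (
        event_id TEXT REFERENCES events(id),
        resource_id TEXT,
        resource_role TEXT
    );
    CREATE INDEX ix_event_resources_resource_id
        ON event_resources (resource_id);
    """
)

conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("e1", "prefect.flow-run.Running", "2025-01-01T00:00:05"),
        ("e2", "prefect.flow-run.Completed", "2025-01-01T00:00:09"),
        ("e3", "prefect.work-pool.ready", "2025-01-01T00:00:01"),
    ],
)
# Each event can have several resources: the primary one plus related ones.
conn.executemany(
    "INSERT INTO event_resources VALUES (?, ?, ?)",
    [
        ("e1", "prefect.flow-run.abc", ""),
        ("e1", "prefect.deployment.xyz", "deployment"),
        ("e2", "prefect.flow-run.abc", ""),
        ("e2", "prefect.deployment.xyz", "deployment"),
        ("e3", "prefect.work-pool.p1", ""),
    ],
)


def events_for_resource(conn, resource_id):
    """Return event names that reference the given resource, oldest first."""
    rows = conn.execute(
        "SELECT e.event FROM events e "
        "JOIN event_resources r ON r.event_id = e.id "
        "WHERE r.resource_id = ? ORDER BY e.occurred",
        (resource_id,),
    )
    return [row[0] for row in rows]


print(events_for_resource(conn, "prefect.deployment.xyz"))
```

Keeping resources in their own table means a lookup by resource only needs an indexed scan of event_resources, instead of parsing each event's payload.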
Notes and references
- Source of truth for table definitions (including columns and relationships): ORM models (schema)
- High-level concepts these tables support:
  - Deployments: Deployments
  - Work pools and workers: Work pools and workers
  - Blocks: Blocks
  - Automations and events: Automations and Events
  - Artifacts: Artifacts
  - Concurrency limits: Concurrency limits
  - Variables: Variables

If you want, I can generate the exact column lists for any of these tables (e.g., by querying information_schema) or deep-dive into specific relationships you care about.
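For anyone who wants the exact column lists themselves, here is a minimal sketch of one way to dump them. It assumes a SQLite-backed Prefect server and uses SQLite's `sqlite_master` table and `PRAGMA table_info`; the helper name `describe_tables` and the demo tables are hypothetical. For a PostgreSQL backend, the `information_schema.columns` query in the string constant is the rough equivalent, assuming the tables live in the `public` schema.

```python
import sqlite3

# Rough PostgreSQL equivalent (assumes the tables are in the "public" schema).
POSTGRES_COLUMNS_QUERY = """
SELECT table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'public'
ORDER BY table_name, ordinal_position;
"""


def describe_tables(conn):
    """Map each user table in a SQLite database to its list of column names."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    schema = {}
    for table in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [column[1] for column in columns]
    return schema


# Demo against a throwaway in-memory database with two toy tables.
# To inspect a real server, connect to its database file instead.
demo = sqlite3.connect(":memory:")
demo.executescript(
    """
    CREATE TABLE alembic_version (version_num TEXT NOT NULL);
    CREATE TABLE variable (id TEXT PRIMARY KEY, name TEXT, value TEXT);
    """
)

for table, columns in describe_tables(demo).items():
    print(table, columns)
```

Since this reads the live database rather than the ORM source, it reflects whatever migration the server is actually on.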
