Hwi Moon
01/10/2022, 9:00 AM
Suresh R
01/10/2022, 9:02 AM
Bruno Murino
01/10/2022, 9:51 AM
WARNING:urllib3.connectionpool:Retrying (Retry(total=5, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='api.prefect.io', port=443): Read timed out. (read timeout=15)")': /
It got that warning many times; then the job was considered dead, etc.
Does anyone know why this could have happened?
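The 15-second read timeout in that warning matches Prefect 1.x's default client request timeout, so if the API was merely slow, raising the timeout may help. A minimal sketch, assuming the PREFECT__CLOUD__REQUEST_TIMEOUT environment variable maps onto the cloud.request_timeout config key; treat that mapping as an assumption and check your config.toml:

import os

# Assumed mapping: PREFECT__CLOUD__REQUEST_TIMEOUT -> cloud.request_timeout
# (default 15 seconds in Prefect 1.x). Set it before prefect is imported.
os.environ["PREFECT__CLOUD__REQUEST_TIMEOUT"] = "60"

import prefect
print(prefect.config.cloud.request_timeout)  # should now read 60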
Henrietta Salonen
01/10/2022, 10:39 AM
Shivam Bhatia
01/10/2022, 11:15 AM
Prudhvi Kalakota
01/10/2022, 11:32 AM
Andy Waugh
01/10/2022, 11:42 AM
Henrietta Salonen
01/10/2022, 2:08 PM
Vamsi Reddy
01/10/2022, 3:03 PM
prefect.exceptions.AuthorizationError: [{'path': ['project'], 'message': 'AuthenticationError: Forbidden', 'extensions': {'code': 'UNAUTHENTICATED'}}]
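An UNAUTHENTICATED error on a project query usually means the client's API key is missing, expired, or tied to a different tenant. A minimal sketch for checking the credentials programmatically, assuming a Prefect 1.x Client with an api_key argument (the key below is a placeholder):

from prefect import Client

client = Client(api_key="<YOUR_API_KEY>")  # placeholder key
# If this raises the same AuthorizationError, the key itself is the problem.
print(client.graphql("query { project { id name } }"))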
Joshua S
01/10/2022, 3:05 PM
Alvaro Durán Tovar
01/10/2022, 3:18 PM
RunNamespacedJob
no success so far
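For context, a minimal RunNamespacedJob sketch in Prefect 1.x; the job body below is a hypothetical example, not the actual spec from this thread:

from prefect import Flow
from prefect.tasks.kubernetes import RunNamespacedJob

# Hypothetical Kubernetes Job manifest, for illustration only.
job_body = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"name": "example-job"},
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {"name": "example", "image": "busybox", "command": ["echo", "hello"]}
                ],
                "restartPolicy": "Never",
            }
        },
    },
}

run_job = RunNamespacedJob(body=job_body, namespace="default")

with Flow("k8s-job-flow") as flow:
    run_job()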
01/10/2022, 3:25 PMimport prefect
from prefect import Flow, task
@task
def test_task():
return 'ok'
with Flow('test_flow') as flow:
test_task()
flow.register(project_name="other")
my CLI:
> prefect backend server
Backend switched to server
> prefect server create-tenant --name default
Tenant created with ID: 6bf4ef79-ddcb-4ea3-8ea2-4dcab05a375a
> prefect agent local start
[2022-01-10 15:12:15,508] INFO - agent | Starting LocalAgent with labels ['72f026a71863']
...
> python /home/testflow.py
Flow URL: http://localhost:8080/default/flow/bf9d5401-9034-4ec0-8e43-9149b0718d23
└── ID: 5e6af0a7-d5af-47bf-831b-b0a4fb4d3401
└── Project: other
└── Labels: ['72f026a71863']
when I open the URL directly, I can see the flow in the UI. I can run it via “quick run” and it runs.
however, the flow is not listed in my project, nor is the run visible under the flow (when accessed via the URL, I see no activity and no run history).
on the main dashboard, the run activity is shown, but it seems to have gone astray, since it is not assigned to the flow.
the “flows” tab shows “you have no flows in this project”
SETUP:
I created docker-compose.yaml via prefect server config and changed the hasura image to hasura/graphql-engine:latest, because the default hasura image does not work properly on my Mac M1 Pro.
I added another service from which I try to register and run flows
client:
  image: python:3.8.12-slim
  command: bash -c "apt-get update -y && apt-get install gcc -y && pip install prefect[dev] && prefect backend server && tail -f /dev/null"
  volumes:
    - ./client:/home # here I have my testflow.py
  networks:
    prefect-server: null
  environment:
    - PREFECT__SERVER__HOST=http://apollo
EDIT: added info on my setup
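One way to check where the registered flow actually landed is to query the server's GraphQL API directly. A sketch, assuming the standard flow/project schema of Prefect Server 1.x and a reachable apollo endpoint:

from prefect import Client

# Assumes `prefect backend server` is set in this environment.
client = Client()
result = client.graphql(
    """
    query {
      flow {
        id
        name
        project { id name }
      }
    }
    """
)
print(result)  # lists every flow with the project it was registered under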
Amichai Ben Ami
01/10/2022, 3:45 PM
storage = flows_storage(registry_url, image_name, tag)
storage = Docker(
    dockerfile="Dockerfile",
    image_tag=tag,
    registry_url=registry_url,
    image_name=image_name,
)
storage.build(push=False)
traceback attached in the thread.
any idea?
maybe timeout?
Any way to increase the timeout?
Thanks
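If the failure is a client-side timeout during the image build, one way to test that hypothesis is to run the same build through docker-py directly with a generous timeout. A sketch; registry_url, image_name, and tag mirror the variables above, and the values are placeholders:

import docker

registry_url, image_name, tag = "registry.example.com", "flows", "dev"  # placeholders

# docker-py's APIClient accepts an explicit HTTP timeout in seconds.
client = docker.APIClient(timeout=600)
for line in client.build(
    path=".",
    dockerfile="Dockerfile",
    tag=f"{registry_url}/{image_name}:{tag}",
    decode=True,
):
    print(line)  # streams build output, so slow steps are visible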
E Li
01/10/2022, 4:49 PM
Danny Vilela
01/10/2022, 5:36 PM
IntervalSchedule
for daily runs, if that helps.
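For reference, a minimal daily IntervalSchedule in Prefect 1.x looks like this (the start date is a placeholder that anchors the time of day each run fires):

from datetime import timedelta

import pendulum
from prefect import Flow
from prefect.schedules import IntervalSchedule

# Placeholder start date; runs fire every 24h from this anchor.
schedule = IntervalSchedule(
    start_date=pendulum.datetime(2022, 1, 10, 9, 0, tz="UTC"),
    interval=timedelta(days=1),
)

with Flow("daily-flow", schedule=schedule) as flow:
    ...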
Koby Kilimnik
01/10/2022, 5:39 PM
Koby Kilimnik
01/10/2022, 5:39 PM
Koby Kilimnik
01/10/2022, 5:40 PM
chelseatroy
01/10/2022, 6:24 PM
Daniel Komisar
01/10/2022, 7:18 PM
Ian Andres Etnyre Mercader
01/10/2022, 7:39 PM
Miroslav Rác
01/10/2022, 7:47 PM
Henrietta Salonen
01/10/2022, 9:08 PM
• nohup prefect agent docker start --label dev --env VAR1=<variable1> --env VAR2=<variable2> &
• Get the PIDs for the old agents with old environment variables: ps -ef | grep prefect
• Kill the old agents: kill <PID>
As this is not a very smooth approach, I’d like to hear how others have approached automating the Agent part when setting up CI processes. Any tips?
Aric Huang
01/10/2022, 11:30 PM
The zone 'projects/<project>/zones/<zone>' does not have enough resources available to fulfill the request
, and we've seen them take >10m for the node pool to finish scaling an instance up even when they're available. My understanding is that the Lazarus process will treat a flow that hasn't started running for more than 10m as a failure and trigger another flow run - however this is the behavior i've been seeing:
Flow A run -> Zone is out of resources -> Flow B run by Lazarus after 10m -> Flow C run by Lazarus after 10m -> After 30m, Lazarus marks the flow as failed ("A Lazarus process attempted to reschedule this run 3 times without success. Marking as failed.") -> Pods for flows A-C are still stuck in Pending state on Kubernetes and have to be manually deleted
Given this use case would you recommend disabling the Lazarus process altogether for these flows? The ideal behavior for us would be for the flow to wait until an instance can be scaled up, even if it takes a few hours. It would be nice if we could specify a time limit also.
Also, is it expected for there to be "zombie" Kubernetes pods/jobs left over in a case like this, and are there any recommended ways to deal with that? I'm not sure what would happen if resources suddenly became available after the Lazarus process stopped all the flows, but before we find them and manually clean them up - would they still run even though the flow has been failed in Prefect? Ideally once a flow is failed we'd like any pending pods/jobs for that flow to be deleted automatically, not sure if that's possible.
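On the zombie-pod question, a hedged cleanup sketch using the official kubernetes Python client; the prefect.io/flow_run_id label selector is an assumption about how the agent labels its pods, so verify with kubectl get pods --show-labels first:

from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() inside the cluster
v1 = client.CoreV1Api()

# Assumed label: the Kubernetes agent tags its pods with prefect.io/flow_run_id.
pods = v1.list_namespaced_pod(
    namespace="default",
    label_selector="prefect.io/flow_run_id",
    field_selector="status.phase=Pending",
)
for pod in pods.items:
    print(f"deleting {pod.metadata.name}")
    v1.delete_namespaced_pod(name=pod.metadata.name, namespace="default")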
Leon Kozlowski
01/10/2022, 11:46 PM
prefect register --project "<PROJECT_NAME>" --json flow.json --label dev --force
Collecting flows...
Processing 'flow.json':
Registering '<FLOW_NAME>'... Done
└── ID: <ID>
└── Version: 3
But when I run my flow, some of the logic that changed, specifically an output file naming convention, is not being used. I’m not sure if this has something to do with an image pull policy in my agent helm chart, or something that I am missing at registration time.
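If the agent is running a cached image, the pull policy is one thing to check. A sketch with Prefect 1.x's KubernetesRun; the image name is a placeholder, and image_pull_policy being accepted by KubernetesRun is an assumption to verify against your Prefect version:

from prefect import Flow
from prefect.run_configs import KubernetesRun

with Flow("my-flow") as flow:
    ...

flow.run_config = KubernetesRun(
    image="registry.example.com/flows:latest",  # placeholder
    image_pull_policy="Always",  # assumed argument; forces a fresh pull per run
)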
Kyle McChesney
01/10/2022, 11:47 PM
RegisterTaskDefinition
API call is being made with no parameters
Amogh Kulkarni
01/11/2022, 1:21 AM
Anh Nguyen
01/11/2022, 10:43 AM
Shivam Bhatia
01/11/2022, 10:44 AM
Failed to retrieve task state with error: ClientError([{'path': ['get_or_create_task_run_info'], 'message': 'Expected type UUID!, found ""; Could not parse UUID: ', 'extensions': {'code': 'INTERNAL_SERVER_ERROR', 'exception': {'message': 'Expected type UUID!, found ""; Could not parse UUID: ', 'locations': [{'line': 2, 'column': 101}], 'path': None}}}])
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/prefect/engine/cloud/task_runner.py", line 154, in initialize_run
    task_run_info = self.client.get_task_run_info(
  File "/usr/local/lib/python3.8/site-packages/prefect/client/client.py", line 1798, in get_task_run_info
    result = self.graphql(mutation)  # type: Any
  File "/usr/local/lib/python3.8/site-packages/prefect/client/client.py", line 569, in graphql
    raise ClientError(result["errors"])
prefect.exceptions.ClientError: [{'path': ['get_or_create_task_run_info'], 'message': 'Expected type UUID!, found ""; Could not parse UUID: ', 'extensions': {'code': 'INTERNAL_SERVER_ERROR', 'exception': {'message': 'Expected type UUID!, found ""; Could not parse UUID: ', 'locations': [{'line': 2, 'column': 101}], 'path': None}}}]
Aqib Fayyaz
01/11/2022, 11:02 AM
The command '/bin/sh -c pip install "prefect[all_extras]"' returned a non-zero code: 1
Anna Geller
01/11/2022, 11:30 AM
RUN pip install "prefect[gcp]"
Aqib Fayyaz
01/11/2022, 11:32 AM
RUN pip install "prefect[dev,templates,viz,kubernetes,google,github]"
I included the github extra as my flow is stored on GitHub.
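For reference, a minimal GitHub storage sketch in Prefect 1.x (the repo and path are placeholders, and the access token is read from a Prefect Secret):

from prefect import Flow
from prefect.storage import GitHub

with Flow("my-flow") as flow:
    ...

flow.storage = GitHub(
    repo="my-org/my-repo",    # placeholder
    path="flows/my_flow.py",  # placeholder
    access_token_secret="GITHUB_ACCESS_TOKEN",
)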
Kevin Kho
01/11/2022, 2:37 PM