prefect-community #prefect-community

Hello, Is Prefect a good choice for non-Python based workloads? Examples: generic workload management with classic SQL Server, Oracle, Kafka and Informatica based batch and streaming data processing jobs.

Frederick Thomas

05/10/2022, 8:22 PM

Hi all, We are currently using Prefect version 0.14.17 in both our development and production environments, can we safely migrate to 1.0.2? Thanks!!

Paco Ibañez

05/10/2022, 8:31 PM

Hello! V2.0 question: I am creating flow runs without a deployment from the API. I can see the flow run being created in the database but my agent will not pick it up. The moment I add a deployment id to the flow run in the db, the agent starts working on it. Is this expected behavior? Thanks!

Scott McCallen

05/10/2022, 8:34 PM

Hello! I have a question regarding executing flows using the Kubernetes Agent. I have created a Job template for the flow run container following the example(s) provided in KubernetesRun documentation. My understanding is that the Job template is then encoded in the flow metadata registered with the Prefect server when I run the

prefect register

command. There are a couple of scenarios where I want to change aspects of the flow run container based on Flow Parameters. For instance, I'd like the flow container to have additional Job annotations in some runs of the flow, but not all runs of the flow. I'm not sure how to do this dynamically when the flow is started. The only thing I've been able to figure out so far is that I need multiple registrations of the same flow - each registration having a different Kubernetes Job template. Is there a dynamic way to adjust/modify the Job template for a given registered flow?

Michelle Brochmann

05/10/2022, 8:48 PM

Questions about UI things in Prefect 2.0(b3): 1. Timeline (Image attached): a. I’m noticing I can’t increase the timestep size (i.e. shrink the flow) in the timeline graph. I have to scroll to see from one end to the other. Is there a way to do this? I’d like to be able to ~~visit~~ view the whole flow timeline on my screen. b. I can’t click on one of the green bars in the flow and get info about the task like I could in 1.2. Is there a way to do this? 2. Radar view: a. I get that 2.0 moves to a no-DAG approach but I’m finding this view a bit harder to look at. Perhaps I just need to get used to it but I’m wondering if there are plans to create something a bit more intuitive for people used to looking at DAG flow graphs. Is there some documentation I could look at that would help me understand this choice and justify that it is a better approach? 3. I find if I want to copy paste the URL for a particular view, e.g. http://127.0.0.1:4200/flow-run/9041b7c8-9439-4657-938e-e3c4fccc0bfc/timeline into another tab (or maybe if I wanted to share it with someone else), I’ll just see

{"detail":"Not Found"}

instead of the view I wanted to look at. It would be very nice if I could take advantage of a url to quickly get to the place I wanted - like if I wanted to remove the “timeline” part to just get to the view for the specific run. Thanks! 🙂

Andrew Lawlor

05/10/2022, 10:43 PM

im seeing flows fail with

Copy code

Failed to retrieve task state with error: ClientError([{'path': ['get_or_create_task_run_info'], 'message': 'Expected type UUID!, found ""; Could not parse UUID: ', 'extensions': {'code': 'INTERNAL_SERVER_ERROR', 'exception': {'message': 'Expected type UUID!, found ""; Could not parse UUID: ', 'locations': [{'line': 2, 'column': 101}], 'path': None}}}])
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/prefect/engine/cloud/task_runner.py", line 154, in initialize_run
    task_run_info = self.client.get_task_run_info(
  File "/usr/local/lib/python3.9/site-packages/prefect/client/client.py", line 1479, in get_task_run_info
    result = self.graphql(mutation)  # type: Any
  File "/usr/local/lib/python3.9/site-packages/prefect/client/client.py", line 473, in graphql
    raise ClientError(result["errors"])
prefect.exceptions.ClientError: [{'path': ['get_or_create_task_run_info'], 'message': 'Expected type UUID!, found ""; Could not parse UUID: ', 'extensions': {'code': 'INTERNAL_SERVER_ERROR', 'exception': {'message': 'Expected type UUID!, found ""; Could not parse UUID: ', 'locations': [{'line': 2, 'column': 101}], 'path': None}}}]

before running any tasks

✅ 1

Dekel R

05/11/2022, 7:50 AM

Hey all, 2 flows that are running everyday for the last couple of months returned this error yesterday:

Copy code

Exception raised while calling state handlers: ClientError([{'path': ['secret_value'], 'message': 'An unknown error occurred.', 'extensions': {'code': 'INTERNAL_SERVER_ERROR'}}])
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/prefect/client/secrets.py", line 137, in get
    value = secrets[self.name]
KeyError: 'SLACK_WEBHOOK_URL'

I get this error randomly once in a while - anyone have the same issue? I didn’t reregister or touched the 2 flows for at least 2 months now. Thanks

tas

05/11/2022, 8:10 AM

Hi team, I have just started exploring prefect 2.0 and loving it so far. I am still trying to get my head around the concepts but getting there 🙂. I had a question around launching a pyspark job on GCP serverless dataproc. I know that you need an agent to subscribe to the queue in order to pull the job but in this case do I spin up CE and start an agent process that then will launch a dataproc serverless job? Or do I need a custom task runner that does that for me? It doesn't make sense to me as again I have not got my head entirely around the concepts

🎉 1

Florian Guily

05/11/2022, 9:29 AM

Hey guys, i have a new use case and i wanted to know if prefect was well suited for this. i have a web app that will request some data processing to the backend via an internal api. To process this data, i want my internal api to trigger a flow run from the graphql api. This flow run would be dynamically build depending on the arguments it receives and it's task is to process the data given in input. I'm sure that prefect can do that. The thing i'm less sure is the way of retrieving those results. If i understood correctly, we can't get results of a flow from the graphql API. I have an idea for now but i'm completely open for other ideas: • make the flow write the results in a temp db. Those results would be associated to an id provided as a flow parameter. Questions: How to notify that the flow succeded ? I know there is a flow_run_view mutation in the graphql api, is looping on this call until there is a "failed" or "success" status ok ?

Thomas Opsomer

05/11/2022, 10:52 AM

Hello Prefect Community 🙂 I'm hitting this issue https://github.com/PrefectHQ/prefect/issues/5050, which sadly hasn't been answered yet 😕 - It's an error when uploading the flow to GCS with the option

stored_as_script=True

. Any idea how to overcome this ?

Klemen Strojan

05/11/2022, 11:34 AM

Hey folks! Kudos to Jean for this guide: https://www.prefect.io/blog/deploying-prefect-flows-with-github-actions/?utm_campaign=community I wonder, are you planning any follow ups or additional examples for more complex workflows? I am trying to automate registering a flow using a

KubernetesRun

and

Docker

storage, something people using Prefect in production might find interesting. If there are any other posts or documentation on this topic I would appreciate if you can point me there.

👋 2

💯 1

Jeremy Savage

05/11/2022, 12:26 PM

Hi guys, just a quick question. I see that tasks have a field called max retries - I have a task which I wish to add a certain number of max retries to. Can I add max_retries as below:

Copy code

@task(max_retries=200)
def some_task():
    if helpers.test_if_job_done() is False:
        raise RETRY(
            "Work not done yet, retrying in 20 seconds.",
            start_time=pendulum.now().add(seconds=20),
        )

TIA

Patrick Koch

05/11/2022, 1:56 PM

Dear #prefect-community! A few weeks before I've published a blog post here, https://www.patrickkoch.dev/posts/post_15/ in which I've run Prefect Flows as Kubernetes Jobs. For that, I've used a Windows Container Workload. I've started as newbie with regard to Prefect, currently I've had to try several other use cases - but with regard to those one mentioned in my blog post, I've recognized some behaviour which I didn't expect: After triggering the Prefect Flow, the Workload at the Kubernetes Cluster is deployed successfully, but sometimes I get following warning:

Jessica Smith

05/11/2022, 3:57 PM

Are Automations stable enough to use in Production? I just created my first one, an Automation to send an email when any flow enters a failed state. The Automation doesn't show up in the Automations tab on the dashboard or on any flows, it only shows up in the "teams/actions" page, where it can't be edited.

Jason

05/11/2022, 3:58 PM

Is it possible to map a task across a list of tuples, for example, containing (data_frame, string_name)?

Jessica Smith

05/11/2022, 4:18 PM

The Automations page states that you can set the subject on an EmailNotificationAction. I can't see a way to do this in the UI, am I missing something or is the documentation incorrect?

Binoy Shah

05/11/2022, 4:19 PM

Hi, We are starting our Data Workflows Journey and I am POC’ing various Options with Dagster and Prefect We have stable infrastructure with following support 1. Kubernetes 1.18 on AWS 2. Harbor Registry for Docker Images and Helm Charts 3. AWS S3 Storage 4. New Relic for Observability. 5. Jenkins as CI/CD Pipelines, all wired to deploy Docker and Helm charts I am comfortable building/deploying it via Helm and pushing user code via docker images I wanted to see how i can put up POC for Prefect 2.0 and what would be the best place to quick start on it.

Mateo Merlo

05/11/2022, 4:31 PM

Hello Community simple smile I have a flow that read CSV files from Google Cloud Storage and create a new table in BigQuery with this info. I created a project in Prefect Cloud and pass the credentials using a Secret variable GCP_CREDENTIALS, following the naming convention that Prefect provides so I don't have to pass this credentials manually or get them in the flow. This is working perfect. But now, I want to create another project in Prefect Cloud to have a Staging Environment and use the same flow (that I will register in Prefect Cloud with a Github Action to point the new project) to get the information from another Bucket in GCS and write in another dataset in BQ (in Google Provider will be another project, so I will need to use another Service Account Key). Is there a way to define Secrets by project in Prefect Cloud? is that is not the case, which is the best solution to approach this situation? Thanks!

Arthur Jacquemart

05/11/2022, 4:36 PM

Hi everyone ! Do you know if it is possible to update the value of a prefect parameter during a flow? I have a flow which is checking if a new file has been uploaded in a blob storage and i want to store the metadata of the previous file in prefect, and update these as soon as their is a new file available. I cannot find an example of it ! Thanks

Kendal Burkhart

05/11/2022, 6:28 PM

Hello. I have an issue I hope someone can shine some light on. I am using the Docker storage option for Prefect, and trying to set the base_image. The run environment is AWS ECS/FARGATE, and Python 3.8. The goal is to use a custom image stored in AWS ECR:

Copy code

storage = Docker(registry_url=os.getenv("REGISTRY_URL"),
  image_name=os.getenv("IMAGE_NAME"),
  base_image="<http://nnnnnnnnnnn.dkr.ecr.us-west-2.amazonaws.com/flow-base-image:latest|nnnnnnnnnnn.dkr.ecr.us-west-2.amazonaws.com/flow-base-image:latest>"
  )

and built from a very simple Dockerfile:

Copy code

FROM prefecthq/prefect:latest-python3.8
ENV PYTHONPATH=$PYTHONPATH:/
COPY ./utilities utilities

Currently, this setup fails. When the flow runs in ECS, it exits immediately with an error: Exit Code 1 Command ["/bin/sh","-c","prefect execute flow-run"] As part of my troubleshooting, I bypassed using the image in ECR and set the base image in the Docker storage for the flow:

Copy code

storage = Docker(registry_url=os.getenv("REGISTRY_URL"),
  image_name=os.getenv("IMAGE_NAME"),
  base_image="prefecthq/prefect:latest-python3.8"
  )

This too failed with the same error. I reviewed the logs for builds that did not set the base image, and saw this image being used by default: prefecthq/prefect:0.15.4-python3.8 I then used this image in my Docker storage:

Copy code

storage = Docker(registry_url=os.getenv("REGISTRY_URL"), 
  image_name=os.getenv("IMAGE_NAME"),
  base_image="prefecthq/prefect:0.15.4-python3.8"
  )

This flow runs successfully. Updating my Dockerfile and using that build from ECR also works. So…does anyone have any idea as to why using prefecthq/prefect:latest-python3.8 fails? I would prefer not to pin the version in my Dockerfile.

Chris Reuter

05/11/2022, 6:55 PM

Starting in 5 minutes - see you on PrefectLive https://prefect-community.slack.com/archives/C036FRC4KMW/p1652276065377929

👍 1

marvin 2

Jason

05/11/2022, 7:43 PM

If a Parameter is a boolean, can I use this to trigger optional flows, to allow UI control for options? Something like:

Copy code

save_s3 = Parameter(...)

if save_s3:
    load_s3(dataset)

Bob Colner

05/11/2022, 9:45 PM

So I’m migrating some 1.0 flows to 2.0. loving the new @flow API -loops and native control flow are very nice to have. I’m having a strange task PicklingError error with the dask task runner. The same task works on 1.0 and (this is strange) was working on 2.0 before I tried to setup a deployment. Any ideas on how to debug this task? (see thread)

discourse 1

Jessica Smith

05/12/2022, 3:07 AM

Can flow schedules have names? I have one flow that has 4 schedules (two different clocks and two sets of parameters) and there doesn't seem to be a way to differentiate them.

Cole Murray

05/12/2022, 4:54 AM

Hi All, Running into an issue with a task exiting due to a bad exit code and stays stuck in running (using Orion).

Copy code

21:24:17.733 | ERROR   | prefect.flow_runner.subprocess - Subprocess for flow run '3919d4eb-17ed-4847-bca9-577d8d31d702' exited with bad code: -6

I see this emitted in the agents log, but the task stays stuck in running. Thoughts?

Danilo Drobac

05/12/2022, 7:25 AM

Hi all Coming from an Airflow background and trying to make sense of how a full deployment looks for Prefect. In Airflow we have VM's that run the different services webserver and scheduler etc... In Prefect (2.0 I'm looking at), I have a Prefect Cloud account that runs the UI (the orion server), if I'm looking for deployment, do I need a VM that connects to the this via service-account (or API key?) and creates the work queues + agents? Additionally, on top of that, what is the recommended CI/CD development cycle for when somebody creates new flows + pushes them to a repo?

Jonathan Mathews

05/12/2022, 9:17 AM

Hi all! Sometimes I’m a bit confused how to set task args when using built-in dbt tasks (such as dbt), Is this the correct syntax for setting a tag on the task? (reason I ask is i can’t see it in prefect cloud when I click into the task):

Copy code

dbt_run = dbt(
        command=dbt_run_command,
        task_args={"name": "dbt run", "tags": ["dbt-limit-1"]},
        upstream_tasks=[dbt_deps],
        dbt_kwargs=snowflake_credentials
    )

Alexis He

05/12/2022, 9:23 AM

Hello, I am evaluating Prefect Orion for the purpose of running workflows of containers in a Kubernetes cluster; in particular, I intend to mount a volume for the containers to access (think k8s persistent volume). What are my options? • As far as I have seen, KubernetesFlowRunner does not allow me that level of customisation. • I think Prefect 1.0 allows this use case Thanks!

Felix Horvat

05/12/2022, 9:49 AM

i want to use PostgresExecute - how do i pass the password to this task? it only seems to be possible through the run method?

✅ 1

Oliver Mannion

05/12/2022, 11:33 AM

Hiya when using Prefect Cloud we've experienced more than once now a

No heartbeat detected

reported by the Zombie Killer, but we can't see the task state set to failed. Has anyone else experienced this?