prefect-community
  • n

    Nikita Samoylov

    08/05/2022, 11:05 AM
    Hello community :party-parrot:! Suppose we have 1 queue and 5 agents working with this queue. We can set a concurrency limit for the queue. Can we also set a concurrency limit for each agent, to be sure that every agent works on only 1 flow at a time? 🤷‍♂️
    👀 1
    m
    2 replies · 2 participants
  • m

    Milan Valadou

    08/05/2022, 12:47 PM
    Hi! Trying to create a deployment per the instructions on https://docs.prefect.io/tutorials/deployments/, I was a bit confused by the sentence “and the name of the entrypoint flow function, separated by a colon.“ I thought it was possible to specify the name of the flow (given inside the decorator) after the colon, whereas you actually need to specify the name of the function that contains the main flow. Perhaps the sentence should read something like “and the name of the function containing the main flow, separated by a colon.” 🙂
    k
    1 reply · 2 participants
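For context, the entrypoint's colon separates the file path from the function name; a stdlib-only sketch of that split (the path and function names here are illustrative):

```python
# Sketch: an entrypoint like "flows/etl.py:my_flow" names the *function*
# decorated with @flow, not the flow's display name.
def split_entrypoint(entrypoint: str) -> tuple:
    path, _, func_name = entrypoint.partition(":")
    return path, func_name

path, func = split_entrypoint("flows/etl.py:my_flow")
# path -> "flows/etl.py"; func -> "my_flow" (the function name)
```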
  • m

    Muddassir Shaikh

    08/05/2022, 1:49 PM
    Can someone help with this error?
    k
    3 replies · 2 participants
  • f

    Florian Guily

    08/05/2022, 1:55 PM
    Hey, since yesterday I've been trying to run an ETL flow with Prefect 1 on an EKS Fargate cluster. I've tried it twice, and it always exits with a k8s error 255 after about an hour of runtime. I don't really have any idea why it's happening, as I'm fairly new to k8s and AWS.
    👀 1
    m
    c
    5 replies · 3 participants
  • v

    Viet Nguyen

    08/05/2022, 2:04 PM
    Hi all, I'm trying to use
    prefect-email
    to send a dummy test email (I know Prefect has notification features, but I want to try
    prefect-email
    as well). I get this error every time; the password is hard-coded for the dummy test, so it can't be a wrong password 🤔
    smtplib.SMTPAuthenticationError: (535, b'5.7.8 Username and Password not accepted. Learn more at\n5.7.8  <https://support.google.com/mail/?p=BadCredentials> d6-20020a170903230600b0016efc27ca98sm3023696plh.169 - gsmtp'
    Thank you
    k
    a
    9 replies · 3 participants
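For context, a 535 "BadCredentials" from Gmail usually means the regular account password was used where Google expects an app password (required for SMTP when 2-step verification is on). A minimal stdlib sketch of composing the message, with placeholder addresses and password (the send itself is left commented out):

```python
import smtplib
from email.message import EmailMessage

def build_message(sender: str, to: str, subject: str, body: str) -> EmailMessage:
    # Compose a plain-text email message
    msg = EmailMessage()
    msg["From"], msg["To"], msg["Subject"] = sender, to, subject
    msg.set_content(body)
    return msg

msg = build_message("me@example.com", "you@example.com", "test", "hello")

# Sending (not executed here; addresses and app password are placeholders):
# with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
#     server.login("me@example.com", "sixteen-char-app-password")
#     server.send_message(msg)
```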
  • c

    Chu

    08/05/2022, 2:17 PM
    Hi, a simple question 🙂 is there a way to register flows one by one, in the correct order?
    👀 1
    b
    1 reply · 2 participants
  • e

    Evan Curtin

    08/05/2022, 2:22 PM
    Hey all, I want to implement a custom
    Result
    -like implementation for 2.0, but I can’t find anything in the docs. The closest thing I see is
    FileSystems
    , but I don’t see example usage of passing data between tasks using a custom persistence layer.
    👀 1
    b
    k
    +1
    9 replies · 4 participants
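One pattern that approximates a custom persistence layer without a custom Result class is to persist the payload yourself and pass only a reference between tasks. A stdlib-only sketch of the idea (all names are illustrative; no Prefect APIs involved):

```python
import json
import tempfile
from pathlib import Path

def produce(data: dict, directory: Path) -> Path:
    # "upstream task": write the payload and return only a reference
    target = directory / "result.json"
    target.write_text(json.dumps(data))
    return target

def consume(path: Path) -> dict:
    # "downstream task": receives the reference and loads the data itself
    return json.loads(path.read_text())

with tempfile.TemporaryDirectory() as tmp:
    ref = produce({"rows": 3}, Path(tmp))
    loaded = consume(ref)
# loaded == {"rows": 3}
```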
  • t

    Tony Yun

    08/05/2022, 3:35 PM
    Hi, what’s the best practice for storing a file in local storage and processing it in the same task? I tried to write a file and send it, but it only works locally. When I deployed it to run on the k8s cluster, it always reports a
    file not found
    error. So I can't store it in
    /tmp
    and process it later in the same task?
    k
    3 replies · 2 participants
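For context, a path like /tmp written in one process or container is not visible from another; passing the bytes themselves (or uploading to shared storage) avoids depending on the producer's local filesystem. A stdlib-only sketch of the idea (illustrative, not Prefect-specific):

```python
import tempfile

def read_payload(path: str) -> bytes:
    # read the file where it was written, once, on the producing side
    with open(path, "rb") as f:
        return f.read()

def process(payload: bytes) -> int:
    # downstream work sees the data regardless of where it runs
    return len(payload)

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello")
    name = f.name

size = process(read_payload(name))
# size == 5
```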
  • k

    Keith

    08/05/2022, 3:51 PM
    Hi, I'm trying to upload files generated by my extraction step to GCS. I found examples in
    prefect-gcp
    for doing this, but when I combine them with block information, it seems the info is not in the correct format.
    gcs_block = GCS.load("gcs-dev")
    
    @flow()
    def example_cloud_storage_upload_blob_from_file_flow():
        gcp_credentials = GcpCredentials(service_account_info=gcs_block.service_account_info)
        test_upload_file = "test_upload.txt"
        blob = cloud_storage_upload_blob_from_file(test_upload_file, gcs_block.bucket_path, "test_upload.txt", gcp_credentials)
        return blob
    a
    5 replies · 2 participants
  • s

    Seth Goodman

    08/05/2022, 4:31 PM
    Hi All - I have a question about the expected behavior of flows when using task mapping. When I use "apply_map" to map ~1500 tasks what I see in the UI is only a set of 8 "Constant[list][x]" tasks as in the screenshot below. I originally thought perhaps this was somehow tied to use of DaskExecutor but the number of "Constant" tasks is not the same as the number of dask workers I have running. In addition, each Constant task shows the same x/1500 task running, rather than a subset. Based on runtime, it seems plausible that each Constant task is processing all the data. Still doing testing to confirm this but it felt like I was doing something wrong that would be clear to more experienced users. I've included some simple code to represent my implementation below as well. Thanks for the help!
    from prefect import Flow, task, apply_map
    
    @task
    def actual_task(arg1, arg2, arg3):
        ...  # does stuff
    
    task_list = [
        (1, "a", "b"),
        (2, "c", "d"),
        (3, "e", "f"),
        (4, "g", "h"),
    ]
    
    def task_map(task):
        return actual_task(task[0], task[1], task[2])
    
    with Flow("my_flow") as flow:
        task_results = apply_map(task_map, task_list)
  • b

    Bruno Grande

    08/05/2022, 6:07 PM
    👋 Hello, everyone! I’m new to Prefect 2.0 and I’m trying to figure out the best way to tackle my (likely unconventional) data pipeline. Briefly, I have a pipeline for processing the files in a manifest, and the number of manifests will grow over time. Each manifest should be processed only once unless it’s updated. I’m wondering how to best handle the dynamic nature of my inputs (i.e. the file manifests) and limit the processing of each manifest to once per update. 🧵 I can elaborate a bit more in the thread.
    ✅ 1
    k
    7 replies · 2 participants
  • v

    Viet Nguyen

    08/05/2022, 6:15 PM
    So I have my NetCDF-to-Zarr pipeline orchestrated by Prefect working smoothly, from firing up a temporary Fargate cluster to shutting down the EC2 instance, but I have one question I'm wondering about.
    ✅ 1
    a
    8 replies · 2 participants
  • r

    Rajvir Jhawar

    08/05/2022, 6:17 PM
    @Anna Geller is there any update on this request? I took a look at Discourse and didn't see any topics related to it; maybe I missed them. I have a very similar use case.
    ✅ 1
    a
    2 replies · 2 participants
  • j

    John Kang

    08/05/2022, 7:25 PM
    Question: I'm trying to debug one of my functions that I've decorated with the task decorator, but I get the error below. I call the task with
    task.fn(function_to_call())
    but that doesn't work, as I get this error:
    AttributeError: 'function' object has no attribute 'fn'
    `RuntimeError: Tasks cannot be run outside of a flow. To call the underlying task function outside of a flow use
    task.fn()
    .`
    ✅ 1
    n
    5 replies · 2 participants
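For context, the AttributeError suggests function_to_call itself is not decorated with @task: .fn lives on the object the decorator returns, and the call should be my_task.fn(args). A toy sketch of the wrapping pattern (not Prefect's actual implementation):

```python
class Task:
    """Toy stand-in for a task wrapper: keeps the original callable on .fn."""
    def __init__(self, fn):
        self.fn = fn
    def __call__(self, *args, **kwargs):
        raise RuntimeError("Tasks cannot be run outside of a flow.")

def task(fn):
    return Task(fn)

@task
def double(x):
    return x * 2

result = double.fn(21)  # call the underlying function directly
# result == 42
# double(21) would raise RuntimeError, and a plain undecorated function
# has no .fn attribute -- hence the AttributeError above.
```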
  • a

    Andrew Richards

    08/05/2022, 7:37 PM
    Is there a way to perform retries with prefect-shell tasks? I'm using prefect version 2.0.2 and prefect-shell version 0.1.1. Supplying the
    retries
    parameter to the flow itself doesn't appear to work when I deliberately supply a bad shell command.
    ✅ 1
    a
    2 replies · 2 participants
  • j

    Javier Ochoa

    08/05/2022, 7:39 PM
    Hello, I have a problem. I am using Prefect with Python and a cloud environment. When I try to register workflows to Prefect Cloud with the AWS S3 method, they are "registered", but in Prefect the version does not change (it stays at version 1 even though I registered 3 times). This is causing issues with code sync: the agent has the newest code, but the flow runs a different version.
    flow.storage = S3(
       bucket=DEPLOYMENT_BUCKET, stored_as_script=False, add_default_labels=False
    )
    flow.register(
       PROJECT_NAME,
       add_default_labels=False,
       idempotency_key=flow.serialized_hash(),
    )
    ✅ 1
    a
    2 replies · 2 participants
  • b

    Bruno Grande

    08/05/2022, 8:29 PM
    Should there be a
    .submit
    after my selection in the attached screenshot? This comes up in the docs here. I thought you needed to use
    .submit()
    in order to obtain a future. Just wanted to check if this is a typo.
    m
    m
    5 replies · 3 participants
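For context, calling a task directly returns its result, while .submit() returns a future you resolve explicitly; the stdlib concurrent.futures analogy (purely illustrative, not Prefect code):

```python
from concurrent.futures import ThreadPoolExecutor

def add(x: int, y: int) -> int:
    return x + y

direct = add(1, 2)  # plain call: you get the value back

with ThreadPoolExecutor() as ex:
    future = ex.submit(add, 1, 2)  # submit: you get a future
    submitted = future.result()    # resolve it explicitly
# direct == submitted == 3
```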
  • c

    Corris Randall

    08/05/2022, 8:41 PM
    So, since 2.0 was released, I thought I’d play around with it a little more seriously… First question: can we write our own NotificationBlock implementations? Are there any instructions or examples? I made a test one called “my-email”, but I get the error “No class found for dispatch key ‘my-email’ in registry for type ‘Block’.” when it triggers. (I was able to add the block and add a notification using that block, but when it fires, that’s the error I get.) I register the block with prefect block register --file myemail.py, then add [“notify”] to the block_document row manually in the db.
    from typing import Optional
    from prefect.utilities.asyncutils import sync_compatible
    from prefect.blocks.notifications import NotificationBlock
    
    class MyEmail(NotificationBlock):
    
        _block_type_name = "My Email"
        _block_type_slug = "my-email"
        _block_schema_capabilities = ["notify"]
        
        @sync_compatible
        async def notify(self, body: str, subject: Optional[str] = None):
            print(f"In my email notify subject: {subject}\nbody: {body}")
    a
    j
    5 replies · 3 participants
  • k

    Kevin Grismore

    08/05/2022, 9:18 PM
    having trouble running my gcs-stored flows on kubernetes. I feel like it probably has something to do with how my project is structured:
    project
    ├── flows
    │   ├── flow1.py
    │   └── flow2.py
    └── util
        └── util.py
    if I do
    some/dir/project> prefect deployment build flows/flow1.py:flow_func -n my-flow -ib kubernetes-job/my-job -sb gcs/my-bucket -t k8s
    everything in src ends up in my bucket as expected, but when I run the flow I get:
    FileNotFoundError: [Errno 2] No such file or directory: '/opt/prefect/flows/flow1.py'
    a
    c
    +1
    9 replies · 4 participants
  • k

    Keith

    08/06/2022, 12:59 AM
    Have a general question about migrating from Prefect 1.0 to 2.0. In 1.0 there was a generic
    upstream_tasks
    parameter that you could pass to tasks so that each task knew to wait for the previous one to run. Through my reading of the documentation it seems like this is not necessary anymore b/c everything should run like it would in Python so it basically defaults to a sequential executor. Is this the correct logic? Obviously this story changes a bit when adding in the different
    Task Runners
    but just wanted to confirm that using default code blocks that tasks run in sequence and won't run the next task until the previous one is complete.
    ✅ 3
    n
    5 replies · 2 participants
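The mental model can be sketched without Prefect at all: when each call's result feeds the next, Python's evaluation order is the dependency order (stdlib-only, illustrative):

```python
log = []

def extract():
    log.append("extract")
    return [1, 2, 3]

def transform(rows):
    log.append("transform")
    return [r * 10 for r in rows]

def load(rows):
    log.append("load")
    return len(rows)

# Python evaluates these calls in order, and passing each result
# downstream is what creates the dependency -- mirroring the default
# sequential behavior described above.
count = load(transform(extract()))
# log == ["extract", "transform", "load"]; count == 3
```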
  • b

    Benoit Chabord

    08/06/2022, 7:17 AM
    Hey team, I am doing an RFP for a big company and I am going to use Prefect for the system integration. Are there any existing resources for this kind of document (I am already referencing the case study page): executive summary, list of clients, key features from a business point of view? I am writing my own from scratch, but if something already exists, that would be greatly appreciated.
    ✅ 1
    a
    1 reply · 2 participants
  • j

    Jan Domanski

    08/06/2022, 10:24 AM
    Hi there, I’m having some issues with flow deployments with an S3 block. My prefect agent picks up the flow run and starts it, but fails to retrieve the flow:
    10:21:24.290 | INFO    | prefect.agent - Submitting flow run 'cfc4f262-4f05-4685-882e-364192297107'
    10:21:24.474 | INFO    | prefect.infrastructure.process - Opening process 'blond-mammoth'...
    10:21:24.482 | INFO    | prefect.agent - Completed submission of flow run 'cfc4f262-4f05-4685-882e-364192297107'
    10:21:27.334 | ERROR   | Flow run 'blond-mammoth' - Flow could not be retrieved from deployment.
    Traceback (most recent call last):
      File "/opt/micromamba/envs/main/lib/python3.8/site-packages/prefect/engine.py", line 247, in retrieve_flow_then_begin_flow_run
        flow = await load_flow_from_flow_run(flow_run, client=client)
      File "/opt/micromamba/envs/main/lib/python3.8/site-packages/prefect/client.py", line 104, in with_injected_client
        return await fn(*args, **kwargs)
      File "/opt/micromamba/envs/main/lib/python3.8/site-packages/prefect/deployments.py", line 47, in load_flow_from_flow_run
        await storage_block.get_directory(from_path=None, local_path=".")
      File "/opt/micromamba/envs/main/lib/python3.8/site-packages/prefect/filesystems.py", line 373, in get_directory
        return await self.filesystem.get_directory(
      File "/opt/micromamba/envs/main/lib/python3.8/site-packages/prefect/filesystems.py", line 251, in get_directory
        return self.filesystem.get(from_path, local_path, recursive=True)
      File "/opt/micromamba/envs/main/lib/python3.8/site-packages/fsspec/spec.py", line 801, in get
        self.get_file(rpath, lpath, **kwargs)
      File "/opt/micromamba/envs/main/lib/python3.8/site-packages/fsspec/spec.py", line 769, in get_file
        outfile = open(lpath, "wb")
    FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp3er_ugnvprefect/S3-BUCKET-NAME/alpha/flow.py'
    10:21:27.727 | INFO    | prefect.infrastructure.process - Process 'blond-mammoth' exited cleanly.
    ... 
    $ aws s3 ls s3://S3-BUCKET-NAME/alpha/
    2022-08-06 10:20:49       6473 flow.py
    2022-08-06 10:20:49       3204 example_flow-manifest.json
    Created via
    # prefect deployment build ./flow.py:example_flow --name example-flow-alpha --tag alpha --storage-block s3/S3-BUCKET-NAME
    # prefect deployment apply example-flow-alpha.yaml
    I've had mixed luck reading and searching similar posts with this error message.
    ✅ 1
    a
    j
    11 replies · 3 participants
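For context, the last frame fails because open(lpath, "wb") cannot create missing parent directories; whoever writes the file must create them first. A stdlib sketch of the failure mode and its remedy (paths illustrative):

```python
import os
import tempfile

base = tempfile.mkdtemp()
target = os.path.join(base, "bucket", "alpha", "flow.py")

# open(target, "wb") at this point would raise FileNotFoundError:
# the "bucket/alpha" parent directories do not exist yet.
os.makedirs(os.path.dirname(target), exist_ok=True)

with open(target, "wb") as f:
    f.write(b"# flow code")
# the write succeeds once the parent directories exist
```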
  • r

    Rio McMahon

    08/06/2022, 11:53 PM
    Hi there - I am trying to implement a recursive flow pattern in Prefect 2.0; any tips on how to do this? I’ve seen some methods that leverage async in the code contest submissions, but I’m curious whether it’s possible without async. I’ve included my attempts in the comments.
    n
    6 replies · 2 participants
  • y

    Yardena Meymann

    08/07/2022, 7:09 AM
    Hi, I am using Prefect 1.2.1. How can I obtain the location of the result of the previous task (which uses GCSResult)? I want to pass the location of the data, not the data itself, to the next task.
    ✅ 1
    a
    3 replies · 2 participants
  • v

    Viet Nguyen

    08/07/2022, 1:57 PM
    Not sure if it's just me, but I find it very disorganized when subflows show up at the same level as main flows in the "Flows" section of the UI; it's like sub-folders listed at the same level as the root folder. My main flow creates multiple subflows, and every time the main flow runs, the UI gets ugly. It would be great if the UI displayed just the main flow, with an option to show its subflows. And when I delete a main flow, all its subflows should be deleted too, rather than having to delete the main flow and then the subflows one by one. Something like this...
    ✅ 1
    a
    m
    3 replies · 3 participants
  • h

    Hafsa Junaid

    08/07/2022, 8:48 PM
    How can we create a block from Python code for the Prefect 2.0 UI?
    a
    1 reply · 2 participants
  • r

    Rajvir Jhawar

    08/08/2022, 2:28 AM
    Is it possible to add a description to a flow? None of the API calls gives you the ability to add one, and even in the UI the flow page is essentially just a hyperlink to the deployment page. Are there any restrictions based on the style of docstrings used?
    👀 1
    b
    1 reply · 2 participants
  • f

    Felix Sonntag

    08/08/2022, 7:46 AM
    Hey, I was wondering how to deal with/set more complex parameter structures in Prefect Orion. E.g. when I have lists or nested models, I can’t set them in the UI at all.
    ✅ 1
    a
    m
    8 replies · 3 participants
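For context, nested parameters can generally be supplied as JSON matching the flow's signature; a stdlib sketch of that round trip (the parameter names are illustrative):

```python
import json

# A nested parameter structure serialized as JSON -- the shape list- and
# model-valued flow parameters generally take when passed via an API call.
params = {
    "regions": ["eu", "us"],
    "config": {"retries": 3, "thresholds": [0.1, 0.9]},
}
payload = json.dumps(params)
restored = json.loads(payload)
# restored == params, nesting intact
```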
  • v

    Vadym Dytyniak

    08/08/2022, 7:49 AM
    Hi. We moved from ECS to Kubernetes runs, and sometimes we see a non-descript flow failure. Is it possible to stream the container logs and see the reason in the cloud logs?
    m
    1 reply · 2 participants
  • j

    jaehoon

    08/08/2022, 9:01 AM
    Hello everybody. I was using the GitLab storage setting in version 1. However, it seems that version 2 does not support it. Do you know a way around this? If there is none, when will GitLab storage be added?
    ✅ 1
    a
    a
    +1
    14 replies · 4 participants
Powered by Linen
j

jaehoon

08/08/2022, 9:01 AM
Hello everybody. I was using the GitLab storage setting in version 1. However, it seems that version 2 does not support it. Do you know a way around this? If there is none, when will GitLab storage be added?
✅ 1
a

Anna Geller

08/08/2022, 9:21 AM
We want to integrate with GitLab to let you easily push your code to your repo and then trigger a CI/CD pipeline that deploys your flow from there. So while it's not yet decided whether we will support it as storage, we will certainly provide an integration with GitLab, e.g. via a CI/CD recipe.
🙌 1
a

Anton L.

08/08/2022, 3:50 PM
But what do you offer instead of GitLab storage, now and in the future?
a

Anna Geller

08/08/2022, 7:42 PM
The file system block docs list available storage options
a

Anton L.

08/08/2022, 9:14 PM
DockerStorage is also off the table, right?
a

Anna Geller

08/09/2022, 10:29 AM
As a concept, yes; but as a problem to be solved and as a feature, not at all. You should see a way of packaging both flow code and dependencies into a single image within 1-2 weeks.
🎉 2
b

Ben Hammond

08/31/2022, 2:24 AM
I saw that a GitHub storage block was introduced — particularly one that can be used for deployments. In the context of that, would that create precedent/space for the contribution of a GitLab block?
a

Anna Geller

08/31/2022, 5:01 AM
In theory yes, but there are no plans to add that at the moment. GitHub is mainly meant for incremental adoption to get started easily; for production we still recommend remote storage blocks or packaging code into a container image.
👍 1
b

Ben Hammond

08/31/2022, 6:28 AM
Incremental adoption makes sense. I have seen elsewhere where you’ve talked about the emphasis on production deployment using other storage methods (e.g., we are using SMB, since my organization doesn’t give us much access to cloud storage services, including GitHub). With regard to incremental adoption, I think GitLab makes sense in a similar way to GitHub, since it’s used by such a broad business user base (just less visibly, since so much of it is on-premise and behind firewalls). For that reason, and prior to seeing your response, I experimented and created a proof-of-concept GitLab storage block based on the GitHub one, and it works (including authentication to a private repo). That said, I certainly have no issue shelving the idea. I definitely appreciate you humoring my less-common use cases. EDIT: I also just noticed the newer issue templates introduced a little while back. I can move these kinds of feature and enhancement questions there from now on, since it seems like that’s their intention, rather than bothering you here.
a

Anna Geller

08/31/2022, 10:15 AM
Thanks for explaining more. How are you using Prefect? If you are running your agent e.g. on an on-prem VM, then you don't need storage at all; you can use the local file system on the VM. Similarly, if you don't want to use remote storage blocks, I think the Docker image UX shown here might be a really good option: https://medium.com/the-prefect-blog/prefect-2-3-0-adds-support-for-flows-defined-in-docker-images-and-github-repositories-79a8797a7371
I'm not discouraging a GitLab block, but I'd love to persuade you that there are some potentially much nicer and more robust ways to tackle the problem of "I want to run deployments without remote storage blocks".
🙌 1
b

Ben Hammond

08/31/2022, 1:01 PM
Thanks for the response! My thought was actually more at the contribution level. We are currently using Prefect 1 with Docker for our runtime environments. Most of our flows are pulled from an on-premise GitLab and run in shared images. A handful of complex flows use dedicated images where the flow is baked into the image itself (only because Prefect 1 Git[Lab] storage won’t allow easy use of submodules). I’m currently working on our move to Prefect 2, and for that we are continuing to use Docker as our runtime but are moving to SMB storage for deployments (like I said, I like what you’ve said elsewhere about version control/CI-CD tools remaining version control/CI-CD tools). We can’t use cloud storage, so it was definitely beneficial that I was encouraged to contribute the SMB block to Prefect (whether or not you remember, you engaged with me about this on a Slack account associated with my work’s organization; I’m trying to consolidate my Prefect community interaction away from multiple usernames 🙂). My thought on the GitLab block was mainly about the contribution itself, not for use as a primary storage option ourselves (though options are nice, and there are instances where it might be useful for dev/testing/interim storage during migration, etc.). I know there are many organizations that use on-premise GitLab, especially ones that strongly value their code and data remaining on their own infrastructure (i.e., not even in Azure/AWS, etc.), and I wonder if making GitLab available with the same intention as GitHub might make sense (even if text in the docs and/or Orion UI and/or terminal messages notes it is not intended for full production deployment). But, like I said, if it doesn’t fit the vision for Prefect’s path forward, that’s OK too.
Even if Prefect is good with the idea of a GitLab block for dev/testing/transition, would a more helpful path, connected to Prefect’s intended direction, be to contribute other non-cloud storage blocks (like maybe SFTP or SCP, etc.)?
:thank-you: 1
a

Anna Geller

08/31/2022, 1:13 PM
where the flow is baked into the image itself
I'd recommend checking the latest release: https://medium.com/the-prefect-blog/prefect-2-3-0-adds-support-for-flows-defined-in-docker-images-and-github-repositories-79a8797a7371
would a more helpful path connected to Prefect’s intended path forward be to contribute other non-cloud storage blocks (like maybe SFTP or SCP, etc)?
Yes, 100%, you are spot on here -- SFTP is a storage system we could write to and read from; GitHub and GitLab are really not storage systems, they are version control and engineering collaboration platforms, so SFTP would be a much nicer way of solving the problem for on-prem deployments. I'd definitely love to see a contribution for that if you'd like to submit a PR.
👍 1
b

Ben Hammond

08/31/2022, 1:27 PM
Sounds good. I’ll plan on doing that.
🙌 1
:thank-you: 1