prefect-community
  • Jeffery Newburn
    04/14/2021, 4:32 PM
    Tacking onto the last post: are there any creative ways to limit the number of tasks any given agent takes? Our agent is beefy, but it will just keep taking flows until it runs out of memory. We have limited flow concurrency, but task concurrency doesn't seem to fit right when we go with multiple agents. Personally, I would love to be able to configure the agent itself not to bite off more than it can chew.
    4 replies · 2 participants
  • Varun Joshi
    04/14/2021, 5:22 PM
    The new automation feature is really helpful! Kudos to the Prefect team on that! 🙌
    😍 4 🎉 5
    1 reply · 2 participants
  • Justin Chavez
    04/14/2021, 8:12 PM
    Hi! Are Dask executors the only way to achieve parallelization for mapped tasks? For example, I have a custom task called `run_command`, and inside it launches the command on a `RunNamespacedJob` to use Kubernetes. I have multiple commands that take a while to complete, so I would like multiple namespaced jobs to run at the same time. I tried using a mapping like:
    with Flow("example") as flow:
        run_command.map([cmd1, cmd2, ...])
    But Prefect is running each namespaced job in serial. Would switching to a Dask executor be the key? Or could I adjust the map call to achieve parallelization?
    5 replies · 3 participants
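    For reference, a minimal sketch of the usual fix (Prefect 0.14-style imports assumed; the task body here is a placeholder): mapped children only run in parallel under a parallel executor such as LocalDaskExecutor, while the default executor runs them one at a time.
    from prefect import Flow, task
    from prefect.executors import LocalDaskExecutor

    @task
    def run_command(cmd):
        # Placeholder body; imagine the RunNamespacedJob launch here.
        print(f"running {cmd}")

    with Flow("example") as flow:
        run_command.map(["cmd1", "cmd2", "cmd3"])

    # Without this line, mapped tasks execute serially.
    flow.executor = LocalDaskExecutor(scheduler="threads")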
  • Riley Hun
    04/14/2021, 10:51 PM
    Hello! For the Prefect Server helm chart, I'm trying to expose the UI through an nginx ingress controller, but it returns a 404 or 503 error. I can confirm that the UI was deployed successfully and that it works when using a standard external load balancer. I also don't think the issue is the nginx controller itself, because I have exposed other applications on it. Also note that I have the UI working in tandem with the nginx controller in a different environment (I think using a different version of the helm chart).
    prefect-ui-ingress.yaml
    13 replies · 4 participants
  • Ranu Goldan
    04/15/2021, 3:13 AM
    Hi everyone, my team is using Prefect Cloud and we've found that secrets are attached to the team, not the project. So to separate secrets between environments (dev/stg/prod), we want to create a new team for each env. But I can't find the button in the team settings. How do I create a new team?
    ➕ 2
    7 replies · 3 participants
  • Jeremy Tee
    04/15/2021, 6:58 AM
    Hi everybody, I am trying to retrieve results from my child flow in my parent flow. I am currently storing all my child flow results in S3, and I am finding it hard to retrieve their locations from the states returned by `client.get_flow_run_info("xxxxx")`. Is there another way for me to find where the task results are stored?
    👀 1
    2 replies · 3 participants
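    A rough sketch of one way to dig the locations out (an assumption, not confirmed by this thread: that the task-run states returned by get_flow_run_info carry their Result metadata in a private _result attribute, as 0.14-era states did):
    from prefect import Client

    client = Client()
    info = client.get_flow_run_info("xxxxx")  # child flow run id

    # When checkpointing was on, each task run's state holds a Result
    # whose `location` is the S3 key the output was written to.
    for task_run in info.task_runs:
        result = getattr(task_run.state, "_result", None)
        if result is not None and getattr(result, "location", None):
            print(task_run.task_slug, result.location)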
  • Lukas N.
    04/15/2021, 11:24 AM
    Hello 👋 We're using Prefect Server and running our flows with the KubernetesAgent. Sometimes a flow run ends up running twice in parallel. After a bit of investigation I found this: the first flow run fails its heartbeat, so the ZombieKiller retries the flow run (starting the parallel execution). But the first one is still running; it's not dead, it just didn't send a heartbeat because of a long blocking operation. Any ideas how to prevent this? I don't even know how the heartbeat system works.
    No heartbeat detected from the remote task; retrying the run.
    8 replies · 3 participants
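    One workaround sometimes suggested, sketched here with the caveat that the exact setting name is an assumption about this Prefect version: run the heartbeat in a thread instead of a subprocess, so long blocking calls don't starve it.
    from prefect.run_configs import UniversalRun

    # Assumption: PREFECT__CLOUD__HEARTBEAT_MODE accepts "thread" in
    # this release; a threaded heartbeat keeps beating through long
    # blocking operations.
    flow.run_config = UniversalRun(
        env={"PREFECT__CLOUD__HEARTBEAT_MODE": "thread"}
    )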
  • Hawkar Mahmod
    04/15/2021, 12:15 PM
    Hey everyone, I am getting the following error:
    TypeError: can't pickle generator objects
    on a task that returns a generator. Now, this task is not persisted using a `Result`; there is no Result or checkpointing enabled on the task. When I run locally, the flow works just fine. However, when I trigger it via the Prefect UI and use S3 storage, it seems to try to persist all tasks. This is what this line in the documentation refers to, I believe (see image). How can I get this task to not be persisted by default, if that is what is causing this?
    2 replies · 2 participants
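    For what it's worth, a minimal sketch of opting a single task out of checkpointing (checkpointing defaults on when running against a backend):
    from prefect import task

    # checkpoint=False keeps this task's return value away from any
    # Result serializer, so the generator is never pickled.
    @task(checkpoint=False)
    def yield_rows():
        return (row for row in range(10))  # a generator, not picklable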
  • Mickael Riani
    04/15/2021, 1:51 PM
    Hello everyone, I'm trying to find a way to distribute task execution across different servers (batch and front). I would like my tasks to run with priority on my front server, and if that one is not available, to run on the batch server. Do you know how I could do this?
    1 reply · 2 participants
  • Jérémy Trudel
    04/15/2021, 2:05 PM
    Hey everyone! I'm trying to log a parameter in a Prefect flow. After launching a quick run, I can't find any mention of my log in Cloud. It looks a bit like this:
    import prefect
    from prefect import task

    @task
    def extract_copy_history(cursor, schema_table):
        logger = prefect.context.get("logger")
        logger.info(f"Schema table name is {schema_table}.")
    Now when I do a quick run on Prefect Cloud, no mention of it appears in my logs, despite the view being set to "Showing logs for all log levels". I see the logs for the task itself (extract_copy_history) and all other tasks. Just not my custom log.
    17 replies · 2 participants
  • Satheesh K
    04/15/2021, 2:38 PM
    Hello everyone! What is the best way to access actual mapped task results? I want to get only part of the result, e.g.:
    intermediate_result = 0
    with Flow("flow") as flow:
        param1 = Parameter("list1")
        mapped_list = create_map(param1)
        results = task1.map(mapped_list)
        intermediate_result = results[1]
        results2 = task2(results[1])
    23 replies · 2 participants
  • Greg Roche
    04/15/2021, 4:07 PM
    Hi folks, a question about re-registering flows after the flow logic has changed. We have a local agent (running inside a Docker container) which executes flows that are stored in S3. Each flow is usually split across multiple Python files, usually with a `main.py` file that imports code from these other files and defines the actual flow logic. If `main.py` is updated, a simple re-registration of the flow seems to be enough to allow the agent to execute the updated flow, because the updated logic is stored on S3 and is then downloaded by the agent during the next execution. However, if one of the other files (which `main.py` imports) is changed, re-registration alone isn't enough, seemingly because only the content of `main.py` is stored on S3 at registration. Practically, this means that almost every time we make any change to any of our flows, we need to rebuild our Docker image with the updated logic, redeploy it, and replace the old agent with the new one before re-registering the flow. Is there some way for us to register a flow so that all of the flow's code, not just the code in the file that defines the flow, is stored in S3, so we don't need to constantly rebuild and redeploy the agent's image for almost every change? Or is there a cleaner approach that has worked for anybody here? Thanks in advance.
    5 replies · 2 participants
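    One pattern that gets at this (a sketch only; the registry, image, paths, and module names are made up for illustration): Docker storage can copy the helper modules into the flow's image and put them on the PYTHONPATH, so a single re-registration rebuilds everything the flow imports.
    from prefect.storage import Docker

    # helpers.py stands in for any local module that main.py imports.
    flow.storage = Docker(
        registry_url="registry.example.com",
        image_name="my-flow",
        files={"/repo/flows/helpers.py": "/modules/helpers.py"},
        env_vars={"PYTHONPATH": "$PYTHONPATH:/modules"},
    )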
  • Robin
    04/15/2021, 4:48 PM
    Dear Prefect people, we have some trouble running our dbt model with Prefect. Using dbt run, we are able to run the model locally; in Prefect, however, we only get the following error:
    ERROR - prefect.DbtShellTask | Command failed with exit code 1
    Any thoughts on what could be wrong, or how to get further information and debug the flow, are appreciated! 🙂
    60 replies · 3 participants
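    A debugging sketch, hedged: it assumes DbtShellTask inherits ShellTask's log_stderr and return_all options in this version, and the profiles_dir is a placeholder. Streaming dbt's stderr into the flow logs usually surfaces the real failure hiding behind the exit code.
    from prefect.tasks.dbt import DbtShellTask

    dbt_run = DbtShellTask(
        command="dbt run",
        profiles_dir=".",   # placeholder for wherever profiles.yml lives
        log_stderr=True,    # surface dbt's error output in the flow logs
        return_all=True,    # return every output line, not just the last
    )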
  • Joseph Loss
    04/15/2021, 7:17 PM
    Quick question: has anyone implemented .NET tasks using Prefect?
    4 replies · 2 participants
  • Carter Kwon
    04/15/2021, 7:28 PM
    Hello, I have a question about the registration and execution of flows. I have an ETL flow that looks something like this:
    <task functions... >

    with Flow("ETL Flow", schedule=schedule, storage=Docker(registry_url=os.getenv("REGISTRY_URL"), image_name=os.getenv("IMAGE_NAME")), run_config=ECSRun(task_role_arn=os.getenv("TASK_ROLE_ARN"), execution_role_arn=os.getenv("EXECUTION_ROLE_ARN"))) as flow:
        DAYS_AGO = 5
        TARGET_DATE = (datetime.now() - timedelta(days=DAYS_AGO)).strftime('%Y-%m-%d')

        <use TARGET_DATE to make API calls inside tasks... >
    We have a CI/CD process in place that registers our flows after they've been pushed to git. For this particular flow, TARGET_DATE should equal today's date minus 5 days, because the API needs a few days for the analytics to be available. I've noticed that TARGET_DATE actually ends up being the date of flow registration minus 5 days. Is there a way to have this code executed every time the flow is run, instead of once at registration, so TARGET_DATE changes every day?
    2 replies · 2 participants
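    One standard fix, sketched without the storage/run-config details: code in the with Flow(...) block runs once at registration, so the date has to be computed inside a task (or derived from prefect.context.scheduled_start_time) to be evaluated on every run.
    from datetime import datetime, timedelta
    from prefect import Flow, task

    @task
    def target_date(days_ago: int = 5) -> str:
        # Runs at flow-run time, not at registration time.
        return (datetime.now() - timedelta(days=days_ago)).strftime("%Y-%m-%d")

    with Flow("ETL Flow") as flow:
        TARGET_DATE = target_date(5)
        # ...downstream tasks receive TARGET_DATE as an input...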
  • Julio Venegas
    04/15/2021, 7:36 PM
    Hi community! I'm creating my own instance of a Task class that returns multiple values. I added the Tuple return-type annotation, but I'm still getting the error `TypeError: Task is not iterable. If your task returns multiple results, pass nout to the task decorator/constructor, or provide a Tuple return-type annotation to your task.` when I instantiate the Task with nout=2. Class in the thread. Any suggestions?
    26 replies · 2 participants
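    For comparison, a minimal class-based task that unpacks into two results (a sketch; the class and names are made up, and it assumes nout reaches the base Task constructor rather than being set anywhere else):
    from typing import Tuple
    from prefect import Flow, Task

    class SplitTask(Task):
        def run(self, x: int) -> Tuple[int, int]:
            return x // 2, x % 2

    # nout must reach Task.__init__ for the instance to be unpackable.
    split = SplitTask(nout=2)

    with Flow("split") as flow:
        quotient, remainder = split(10)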
  • Ryan Baker
    04/15/2021, 9:45 PM
    If I build my own Docker image for use with a Prefect flow in Prefect Cloud, my experience has been that if I run the flow, then update the Docker image, then run the flow again, it does not pull the new image; it caches the previous image and uses that. Is there a way I can clear the cache? Or am I forced to specify a new Docker image in the run configuration, such as with a git-hash tag on the image?
    9 replies · 2 participants
  • jack
    04/15/2021, 10:25 PM
    Hey all! We were testing out our local Prefect agent spun up on our EC2 instance, and we were able to run most flows except for one type: flows registered with the Docker storage type. I believe Docker flows aren't supported by local agents. Are there any good alternatives to Docker storage for containerized flow storage that could be used for more sophisticated flows? We would like to use the local agent for most things and don't want to move to a Fargate agent or something anytime soon before trying other flow storage methods.
    5 replies · 2 participants
  • Vincent
    04/16/2021, 1:44 AM
    Hi all, I was wondering if someone could help me identify why some of my tasks are pending. I have the following flow running on Prefect Cloud with a Dask backend. For some reason, the task scheduler has not started 2/4 of the tasks. Thanks for any advice.
    10 replies · 3 participants
  • Jeremy Tee
    04/16/2021, 4:57 AM
    Hi people, I am wondering how everyone organizes their code when defining a flow. Initially my intention was to split tasks and flows each into their own file; however, when I save my flow to S3 and run it from an agent, it is not able to find the location of the "task" file! Thanks in advance!
    1 reply · 2 participants
  • Matthew Alhonte
    04/16/2021, 5:10 AM
    Would this affect Prefect? It uses type annotations at runtime, right? https://github.com/samuelcolvin/pydantic/issues/2678
    👀 2
    2 replies · 3 participants
  • James Gibbard
    04/16/2021, 2:11 PM
    When registering a flow that is stored in an S3 bucket, is it possible to use a different AWS_PROFILE for deploying than the one used when the flow is executed by Cloud? My machine uses a "production" profile to access the AWS account, but inside AWS the profile is "default". Any ideas? Thanks.
    10 replies · 2 participants
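    A sketch of one approach (hedged: it relies on boto3 honoring the AWS_PROFILE environment variable, and the project name is a placeholder): S3 storage uploads the flow with the registering machine's credentials, so selecting the profile just before registering affects only the upload, while the runtime environment falls back to its own credential chain.
    import os

    os.environ["AWS_PROFILE"] = "production"  # used only for the upload
    flow.register(project_name="my-project")  # placeholder project name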
  • Joseph Loss
    04/16/2021, 3:39 PM
    Can someone please explain the changes implemented with service accounts? Do I auth login under my user 'joe' or under a user 'scheduler' that I will have deployed on multiple servers? Does each of these need a separate runner token?
    1 reply · 1 participant
  • Julio Venegas
    04/16/2021, 5:25 PM
    Hi community! I have a question about how and when environment variables are pulled, and whether I need to pass environment variables to an agent or not. Say I have the following Python script for a flow:
    import os
    from prefect import Flow, task

    @task
    def get_env():
        return os.environ.get("CURRENT_ENV")

    with Flow(name="get-env") as flow:
        env = get_env()

    flow.register(project_name="get_env")
    If I have “CURRENT_ENV” in my bash/zsh environment variables and run the flow with a LocalAgent, then it's not necessary to pass any environment variables when I execute
    prefect agent local start
    because the environment variable is already in the local system. But if I wanted to run the flow in a non-local environment, say in a Dask cluster on Kubernetes, then I would need to pass environment variables to
    prefect agent kubernetes start
    ?
    3 replies · 2 participants
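    For reference, a sketch of the agent side (an assumption about the 0.14-era CLI: its --env flag forwards variables to every flow run the agent launches):
    prefect agent kubernetes start --env CURRENT_ENV=production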
  • Peter Peter
    04/16/2021, 5:37 PM
    Hello, I'm trying to work with the Great Expectations task and am having issues. What version of Great Expectations does Prefect work with? I'm wondering if it is a version issue, since people were getting the same error between versions of the GE tutorials. The error I am getting is:
    great_expectations.exceptions.exceptions.DataContextError: No validation operator action_list_operator was found in your project. Please verify this in your great_expectations.yml
    6 replies · 4 participants
  • Marc Lipoff
    04/16/2021, 5:41 PM
    I'm running into a bit of a catch-22. I am trying to set up a CI process that registers the flows. Here is an example:
    import pandas as pd
    from prefect import Flow
    from prefect.storage.docker import Docker

    # ... task definitions

    with Flow('test_flow', storage=Docker(
                registry_url=ecr_registry_url,
                image_name=a_repo_name,
                python_dependencies=["pandas==1.2"])
    ) as flow:

        # ... all the steps
    I then try to execute
    prefect build -p path/to/file.py
    and it throws an error that pandas is not installed (which it isn't):
    ModuleNotFoundError: No module named 'pandas'
    Is there a way to register a flow without having to install the flow's dependencies first?
    15 replies · 3 participants
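    One workaround, sketched with placeholder registry values: keep heavyweight imports inside task bodies, so the machine doing the registering never needs them installed; only the Docker image built by the storage does.
    from prefect import Flow, task
    from prefect.storage.docker import Docker

    @task
    def transform():
        import pandas as pd  # imported at run time, inside the container
        return pd.DataFrame({"a": [1, 2]})

    with Flow(
        "test_flow",
        storage=Docker(
            registry_url="registry.example.com",  # placeholder
            image_name="test-flow",
            python_dependencies=["pandas==1.2"],
        ),
    ) as flow:
        transform()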
  • Sean Talia
    04/16/2021, 7:07 PM
    Hi all, I have a question about best practices for using secrets. I'm migrating a script to Prefect that has depended on a sensitive value stored as an environment variable. Initially I was thinking of just using a `Secret` to set this sensitive value as one of the run config's environment variables, e.g.
    env = { 'SECRET_KEY' : Secret("SECRET_VALUE").get() }
    but I'm wondering if this raises some kind of security issue when registering the flow to my Cloud instance. In this setup, would this `Secret` value be retrieved at registration time and then sent to Prefect Cloud as part of the flow's metadata? Or would Prefect know that this env variable should be brought into the run config container only at flow runtime, and it's perfectly safe to do something like this?
    6 replies · 2 participants
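    For contrast, the runtime-side pattern (a sketch reusing the same secret name; the task is made up): a PrefectSecret task resolves inside the flow run, so the value never appears in registration metadata.
    from prefect import Flow, task
    from prefect.tasks.secrets import PrefectSecret

    @task
    def connect(secret_value):
        # The secret arrives as a runtime task input.
        print("got a secret of length", len(secret_value))

    with Flow("uses-secret") as flow:
        connect(PrefectSecret("SECRET_VALUE"))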
  • Cab Maddux
    04/16/2021, 7:49 PM
    Hi! I have a flow running on preemptible nodes on GKE, which looks to have been preempted and subsequently caught by the Zombie Killer. My flow was then marked as failed, but I expected the Lazarus process to pick it up and restart the flow after 10 minutes. As seen in the screenshot, the Zombie Killer fails the flow at 13:48, and then nothing happens until I manually restart the flow via the Prefect Cloud UI at 14:45, about an hour later. I confirmed that Lazarus is enabled for this flow. Is there anything else I need to do to have Lazarus pick this up?
    17 replies · 3 participants
  • Tihomir Dimov
    04/16/2021, 8:11 PM
    Hi all, we have a flow which accepts a string array as input, and for each item three tasks (get, map, save) are executed. Until now we have been using
    flow.environment = LocalEnvironment(executor=LocalDaskExecutor(scheduler="threads", num_workers=num_workers))
    to achieve the following execution order: get['string1'], map['string1'], save['string1'] -> get['string2'], map['string2'], save['string2'] -> get['string3']... But we've hit some issues with the LocalDaskExecutor, so we want to use DaskExecutor instead, and we're struggling to configure it to achieve the same result. Currently we use
    flow.executor = DaskExecutor()
    and the tasks run like this: get['string1'], get['string2'], get['string3'] -> map['string1'], map['string2'], map['string3'] -> save['string1']..., which is not resource-effective. How can we configure the DaskExecutor to achieve the first execution order (depth-first rather than breadth-first)?
    👍 1
    24 replies · 2 participants
  • Adam Lewis
    04/16/2021, 10:38 PM
    Hi, I'm using the `DaskExecutor` with `dask-kubernetes` to spin up a Dask cluster when a flow starts, and I'm using it to process 5,000 files via a mapped task with a few final aggregation tasks. I sometimes see (via the Prefect UI) that the Dask cluster appears to spin down near the end of the run, before it's completely done, leaving a few tasks stuck pending with no workers to process them. Has anyone seen this before? If so, how did you solve it?
    4 replies · 2 participants
Kevin Kho
04/17/2021, 2:13 AM
Hi @Adam Lewis! This only happens sometimes but works other times? Is there anything in the logs that would give more details? Is it the aggregation tasks that get stuck in pending, or the mapped tasks?
No clue if this will help, but this might be relevant. The thread also talks about potentially upping resources.
Adam Lewis
04/17/2021, 2:28 PM
Thanks for pointing to that thread. That could be a similar problem, since things stopped near the end of the large mapped run. Things I could try: 1) upping the resources of the Kubernetes job associated with this flow (or at least looking at its memory usage to see how much memory it uses as the mapped tasks are gathered), 2) exposing the Dask dashboard and watching it near the end of the mapped tasks, 3) changing from threads to processes.
Kevin Kho
04/17/2021, 2:32 PM
Yes, those are all the thoughts I have for now. Decreasing the resources may also do something.