prefect-community #prefect-community

Hi Experts, I created a flow of flows with StartFlowRun. Dependencies have been set in the control flow, and I have set wait=True in each StartFlowRun(wait=True, **kwargs). Based on my understanding, the flow should keep going down to the dependent flows once the upstream flows have executed successfully. I set up the test so that flow3 only depends on flow2 and flow2 only depends on flow1. I was thinking that flow2 would start executing when flow1 succeeded, but it turns out that it did not get going until flow4 finished running. This is not what I was expecting. Can you see what I might have done wrong?

eli

04/01/2021, 11:04 PM

Hi all, I'm trying to run a long-running task that takes in a local DaskCluster, but it seems like something in Prefect arbitrarily kills off the worker... My flow uses the Dask ResourceManager and looks something like:

@task(checkpoint=False)
def long_running_dask_task(inputs: dict, client: Client) -> bool:
  futures: List[Future] = []

  # Submit work items until get_next() has nothing left to hand out
  while True:
    next_item = get_next(inputs)
    if not next_item:
      break
    f = client.submit(func, next_item)
    futures.append(f)

  # Block until every submitted future has finished
  client.gather(futures)
  return True

with Flow('local-dask-flow') as flow:
  with DaskCluster(...) as client:
    long_running_dask_task(param_1)
  flow.executor = LocalExecutor()

https://docs.prefect.io/core/idioms/resource-manager.html#example-creating-a-temporary-dask-cluster

Jonathan Chu

04/02/2021, 12:34 AM

hi guys, how do i control the image name that is used for docker storage? e.g. if i enter a flow name with underscores, it converts it to dashes to get a repo name https://docs.prefect.io/orchestration/flow_config/storage.html#docker

CA Lee

04/02/2021, 1:21 AM

Hello all, has anyone experienced issues with agent health? Long running agent processes have suddenly stopped polling Prefect Cloud

matta

04/02/2021, 1:47 AM

Anyone else had the problem of mapped tasks succeeding with every sub-task but staying in a `mapped` state?

Jonathan Chu

04/02/2021, 1:48 AM

the `RUN` tab with `Docker` doesn't seem to parse the JSON version of the Environment variables correctly; it seems to keep the wrapping quotes as part of the value

Jeremy Tee

04/02/2021, 7:47 AM

hi everybody, I am currently using `prefect cloud`, and my flow will invoke `aws lambda` and return the response to me. However, after registering my flow, whenever i try to run it, it throws `Unexpected error: TypeError("cannot serialize '_io.BufferedReader' object")`. Is there a workaround for this?

import json

import boto3
from prefect import Flow, task
from prefect.executors import LocalExecutor
from prefect.run_configs import LocalRun
from prefect.storage import S3


@task(name="invoke_lambda")
def invoke_lambda(function_name, table_path, etl_target_date):
    lambda_client = boto3.client("lambda")
    response = lambda_client.invoke(
        FunctionName=function_name,
        Payload=json.dumps({"table_path": table_path, "etl_target_date": etl_target_date}),
    )
    # The raw boto3 response is returned as the task result
    return response


with Flow(
    "test-flow",
    executor=LocalExecutor(),
    run_config=LocalRun(),
    storage=S3(
        bucket="random-bucket",
    ),
) as flow:
    x = invoke_lambda("test", "a/b/c", "2021/04/02")


flow.register(project_name="xxx", labels=["dev"])
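
One possible explanation, offered as an assumption and not a confirmed diagnosis: the dict returned by boto3's `invoke` carries its `Payload` as a streaming object, and a stream cannot be serialized when the task result is stored. A minimal sketch of a workaround is to read and decode the stream before returning. The helper name `extract_lambda_payload` is hypothetical, and the response here is simulated with `io.BytesIO` so the snippet runs without AWS access.

```python
import io
import json
import pickle


def extract_lambda_payload(response: dict) -> dict:
    """Return a copy of the response with the streamed Payload replaced by decoded JSON."""
    raw = response["Payload"].read()
    decoded = json.loads(raw.decode("utf-8")) if raw else None
    # Copy every other key unchanged and swap in the decoded payload
    result = {key: value for key, value in response.items() if key != "Payload"}
    result["Payload"] = decoded
    return result


# Simulated response (assumption: only the keys used here matter for the sketch)
fake_response = {"StatusCode": 200, "Payload": io.BytesIO(b'{"ok": true}')}
clean = extract_lambda_payload(fake_response)

# The cleaned result contains only plain Python types, so it can be pickled
serialized = pickle.dumps(clean)
```

In the task above, this would mean returning `extract_lambda_payload(response)` in place of `response`.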

Matthew Blau

04/02/2021, 1:34 PM

Hello all, I see that I can set up retry logic with `@task(max_retries=3)`, but how can I set up the retry logic if I am not setting up tasks with the functional API? I do not see anything in the docs that explains this. Thank you in advance!

Marwan Sarieddine

04/02/2021, 4:13 PM

Hi Folks, we are in the process of migrating to make use of a KubernetesRun config and a DaskExecutor, alongside our Kubernetes agent on EKS. We seem to be running into issues running our flows with a custom context from the prefect UI.

Nikola Milushev

04/02/2021, 4:34 PM

Hi all, we have a flow which is running indefinitely because a mapped task is stuck on "Starting task run...". The log shows just that, nothing more. The task itself is decorated with `@task(max_retries=6, retry_delay=timedelta(minutes=10), timeout=60)`; however, it seems the task run is stuck before it can trigger the retry. Might there be another reason this is happening other than the OOM Killer, as seen in a similar topic from 15.03?

Jay Shah

04/03/2021, 11:41 PM

Hi, we are using the SQL Server task SqlServerExecute to execute a truncate table query (and also a merge query), and we encountered this error. The documentation (https://docs.prefect.io/api/latest/tasks/sql_server.html#sqlserverexecute) suggests that the data field is optional. We are able to execute SqlServerExecuteMany and SqlServerFetch. Can someone help?

Unexpected error: TypeError('execute() takes no keyword arguments')
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/prefect/engine/runner.py", line 48, in inner
    new_state = method(self, state, *args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/prefect/engine/task_runner.py", line 865, in get_task_run_state
    value = prefect.utilities.executors.run_task_with_timeout(
  File "/usr/local/lib/python3.8/site-packages/prefect/utilities/executors.py", line 299, in run_task_with_timeout
    return task.run(*args, **kwargs)  # type: ignore
  File "/usr/local/lib/python3.8/site-packages/prefect/utilities/tasks.py", line 454, in method
    return run_method(self, *args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/prefect/tasks/sql_server/sql_server.py", line 90, in run
    executed = cursor.execute(query=query, vars=data)
TypeError: execute() takes no keyword arguments
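
The last frame of the traceback shows `cursor.execute` being called with keyword arguments, which the driver rejects. As a sketch only, with the standard library's `sqlite3` standing in for the SQL Server driver (an assumption made so the snippet runs anywhere), passing the query and data positionally avoids that error. The helper `execute_positional` is hypothetical and not part of Prefect.

```python
import sqlite3
from typing import Any, Optional, Sequence


def execute_positional(cursor: Any, query: str, data: Optional[Sequence[Any]] = None) -> Any:
    """Call cursor.execute with positional arguments only."""
    if data is None:
        # No parameters: pass the query alone
        return cursor.execute(query)
    return cursor.execute(query, data)


conn = sqlite3.connect(":memory:")
cur = conn.cursor()
execute_positional(cur, "CREATE TABLE t (x INTEGER)")
execute_positional(cur, "INSERT INTO t (x) VALUES (?)", (1,))
# DELETE is used here as a stand-in for TRUNCATE, which sqlite3 does not support
execute_positional(cur, "DELETE FROM t")
remaining = execute_positional(cur, "SELECT COUNT(*) FROM t").fetchone()[0]
conn.close()
```

Until the task itself is changed, the same positional call could be made from a small custom task that opens its own connection.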

tash lai

04/05/2021, 6:01 AM

Hi all. Can anyone explain how garbage collection works? For example, will the result of `produce` stay in memory until the flow finishes, or will it be removed as soon as `consume` finishes?

@task
def produce(url):
    return download_big_json(url)

@task
def consume(big_json):
    do_something(big_json)

with Flow('my_flow') as flow:
    urls = Parameter('urls')
    produced = produce.map(urls)
    consume.map(produced)

Varun Joshi

04/05/2021, 8:20 AM

Hey Prefecters, how do I upgrade from prefect version 0.14.9 to 0.14.10?

Jeremy Tee

04/05/2021, 10:00 AM

hi everybody, just wondering if it's possible to search for `task_run_id`, `flow_id` from the prefect UI?

Rob Fowler

04/05/2021, 12:16 PM

is there a way to initialise a custom result handler in a prefect agent run flow? I can, of course, initialise a result handler at build time but now I am all fancy with common containers for test and production I don't want the redis/azure result destination to be hard coded in the docker storage.

haf

04/05/2021, 5:11 PM

Hi there! Does anybody here work with Prefect together with Python notebooks and MLOps/Feature/Model stores?

Brett Naul

04/05/2021, 6:12 PM

curious if anyone has any thoughts on how to avoid this kind of looping behavior when the dask worker is repeatedly killed after running out of memory...is this what version locking is for? I can't remember exactly how that works and it doesn't seem to really be documented anywhere

Joseph Loss

04/05/2021, 6:54 PM

Hey guys, can someone please point me in the direction of importing user-created functions from other python files? We have a large library of common functions that are used in almost every script. I'm developing a use-case for using Prefect over VisualCron, but I'm running into some issues here.

Tomás Emilio Silva Ebensperger

04/05/2021, 9:43 PM

Any experience/advice creating flows dynamically in one script? I have several client configs in json files and one flow looping over those configs to execute the tasks according to each client. I was thinking of looping dynamically but creating many flows instead. Any thoughts?

Zach Hodowanec

04/05/2021, 9:54 PM

Hey team, is it possible to use a Deploy Key instead of a PAT for authentication when using GitHub Storage?

Willian Chan

04/05/2021, 9:59 PM

Hello everyone, I'm stuck on an error when running my flow with helper scripts from `GitLab`. The UI tells me: `Failed to load and execute Flow's environment: ModuleNotFoundError("No module named 'mail_client'")`. The main problem here is that the file mail_client.py is not present on the agent, and for me it is impractical to send each auxiliary script to the agent (there are going to be a lot of flows). The structure of the repository:

gitlab-repository/
├── flow.py
└── mail_client.py

Inside my flow.py it imports the mail_client:

from mail_client import MailClient
...
...

The configuration for `GitLab` storage:

flow.storage = GitLab(
    repo="XXXXX",
    host="XXXXX",
    path="flow.py",
    secrets=["GITLAB_ACCESS_TOKEN"]
)

I need the agent to be able to pull the entire repository, because many processes will be added to Prefect and there is no way to change the agent with each modification to a process. Does anyone have a solution for this? Thanks

Jonathan Chu

04/06/2021, 12:40 AM

how are flow configurations supposed to be used? https://docs.prefect.io/orchestration/agents/docker.html#flow-configuration these labels and env variables are presumably specific to each agent that i start up, so i can control, say, staging and prod. but this looks like something global that's checked in to the codebase along with the flow code definition

matta

04/06/2021, 1:51 AM

Hrm, so if I run the `DaskExecutor` in a Jupyter notebook and I'm using threads (`cluster_kwargs={"processes": False}`) then I see the logs in the notebook. If I take that off though, then the logs disappear. How do I have the logs go to the notebook again?

Wolfgang Steitz

04/06/2021, 9:36 AM

Hey! I have a task that is scheduled to run every 4 hours. However, sometimes the associated agent isn't running for, let's say, a day. In that case the pending runs obviously pile up. I'd like to avoid that, especially because there is no point in running that many runs the next day. I can imagine 3 ways to implement such behaviour; unfortunately I haven't found anything in the docs so far:
1. set a limit on pending runs of a given flow
2. time out a pending run after some time
3. add a task to the flow that checks the scheduled time and skips the actual task if above some threshold
I assume 1 and 2 are not available. 3 should be possible somehow. Any pointers on how to implement this?
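
Option 3 could be sketched as a plain staleness check. This is a minimal illustration under stated assumptions: the function name `is_stale` and the 4-hour threshold are hypothetical, and the Prefect wiring is described only in a comment because it is not confirmed in this thread.

```python
from datetime import datetime, timedelta, timezone


def is_stale(scheduled_start: datetime, now: datetime, max_delay: timedelta) -> bool:
    """Return True when the run starts later than max_delay after its scheduled time."""
    return now - scheduled_start > max_delay


# Assumption: inside a Prefect task, scheduled_start would come from the run
# context (e.g. the scheduled start time of the flow run), and a stale run would
# raise a SKIP signal so that downstream tasks are skipped as well.
scheduled = datetime(2021, 4, 6, 8, 0, tzinfo=timezone.utc)
on_time = is_stale(scheduled, scheduled + timedelta(minutes=5), timedelta(hours=4))
too_late = is_stale(scheduled, scheduled + timedelta(hours=20), timedelta(hours=4))
```

Keeping the comparison in a pure function makes the threshold easy to test without an agent or a scheduler.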

Levi Leal

04/06/2021, 11:04 AM

I'm creating an integration with Datadog and I need to add a custom handler to Prefect's flows. I need to add a handler that gets all logs from the run and spits them out as JSON. I've seen a lot of examples like the one below, but that's not what I need. I don't want to add a handler to each logger.

logger = prefect.context.get('logger')
logger.addHandler(log_handler)

I need something like this

log_handler = logging.StreamHandler()
log_handler.setFormatter(DatadogFormatter())
get_logger().addHandler(log_handler)

I add the handler to the 'root' logger and everything is logged the way I need. I've tried the latter and it works fine with `flow.run()`, but when I register the flow I can't get it to work with k8s. More details in the thread
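
For reference, the root-logger approach described above can be sketched with the standard library alone. `JsonFormatter` is a hypothetical stand-in for the `DatadogFormatter` mentioned in the message, and the field names are assumptions; nothing here is Prefect-specific or addresses the k8s behaviour.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object (stand-in for DatadogFormatter)."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps(
            {
                "timestamp": self.formatTime(record),
                "level": record.levelname,
                "logger": record.name,
                "message": record.getMessage(),
            }
        )


def install_json_handler(logger: logging.Logger) -> logging.Handler:
    """Attach a single JSON stream handler to the given (typically root) logger."""
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return handler


# Install on the root logger, then format a hand-built record to inspect the output
root_handler = install_json_handler(logging.getLogger())
sample = logging.LogRecord("prefect.demo", logging.INFO, "demo.py", 1, "run %s", ("ok",), None)
formatted = json.loads(root_handler.format(sample))
```

A handler on the root logger only sees records from child loggers that propagate, and only in the process where this installation code actually runs.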

haf

04/06/2021, 1:07 PM

Hi, I'm back at https://prefect-community.slack.com/archives/CL09KU1K7/p1615401726121200?thread_ts=1615327031.065400&cid=CL09KU1K7 trying to make it work; has job templating (being able to add a single annotation) started working?

Marwan Sarieddine

04/06/2021, 2:10 PM

Hi folks, we are getting failures for all of our flow runs after updating to prefect `v0.14.15`. Basically, the failure happens after the flow has completed running, at the state handler level. Please see the traceback in the thread.

Florian Kühnlenz

04/06/2021, 3:00 PM

Hi. I am having some trouble making `prefect register --module` work. My project looks like this:

flows
  + __init__.py
  + my_flow.py
  + shared_tasks
    + __init__.py
    + util.py

When I run `prefect register --project 'Prefect Testing' -m 'flows.my_flow'`, I get `No module named 'flows'`. What am I missing?
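
One plausible explanation, not confirmed in this thread: the directory that contains the `flows` package is not on Python's import path when the command runs. The sketch below reproduces the idea with the standard library only; the helper `import_from_project` and the throwaway project layout are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_from_project(project_root: Path, module_name: str):
    """Import a dotted module after making sure project_root is on sys.path."""
    root = str(project_root)
    if root not in sys.path:
        sys.path.insert(0, root)
    # Make the import system notice files created after interpreter start
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


# Build a throwaway project that mirrors the layout in the message
project = Path(tempfile.mkdtemp())
package = project / "flows"
package.mkdir()
(package / "__init__.py").write_text("")
(package / "my_flow.py").write_text("FLOW_NAME = 'my_flow'\n")

module = import_from_project(project, "flows.my_flow")
```

Outside of a script, setting `PYTHONPATH` to the project root or installing the project as a package has a similar effect on the import path.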

Andor Tóth

04/06/2021, 4:01 PM

Hello. I'm still test driving Prefect (v0.14.15), but my flow gets stuck and I get zombie processes. Any ideas? Here's the code without imports:

SQL_DIR = Path('sql')

@task
def list_query_names():
    return [f.name for f in SQL_DIR.glob('*.sql')]

@task(log_stdout=True, timeout=15, task_run_name='{name}-{date:%F_%T}', checkpoint=False)
def exec_query(name: str):
    sql = Path(SQL_DIR / name).read_text()
    print('Query name: %s' % name)
    engine = sqla.create_engine(DSN)
    rs = engine.execute(sql)
    return dict(keys=rs.keys(), rows=rs.fetchall())

@task
def save_results(rs, name):
    with (OUT_DIR / name).with_suffix('.txt').open('w') as f:
        csv_writer = csv.writer(f, delimiter="\t")
        csv_writer.writerow(rs['keys'])
        csv_writer.writerows(rs['rows'])

with Flow("Queries") as flow:
    query_names = list_query_names()
    results = exec_query.map(query_names)
    save_results.map(results, query_names)
    
flow.executor = LocalDaskExecutor(num_workers=2, schedule='processes')
flow.run()

Robin

04/06/2021, 4:13 PM

To all the adventurous apple M1 users out there (and in general everyone working with different OS):

How do you build your prefect flows to ensure that the docker images run on different/the desired architectures/OS?