#prefect-community

I'm having a problem creating a Prefect flow to process a large number of XML files (40k) without running out of RAM. Is this the right place to ask for help?
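One general pattern for this kind of workload is to pass file paths rather than parsed documents between steps. The files are then worked through in fixed-size batches; in Prefect 1.x, for example, a task could be mapped over a list of batches. Each file is stream-parsed with `xml.etree.ElementTree.iterparse`, and each batch returns only a small summary. This is not Prefect-specific, and it assumes what the processing looks like. The stdlib sketch below just counts `<record>` elements. `batched`, `count_records`, `process_batch` and the `record` tag are hypothetical placeholders for the real logic.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from itertools import islice
from typing import Iterable, Iterator, List


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    # Yield lists of at most `size` paths so only one batch is handled at a time
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def count_records(path: str, tag: str) -> int:
    # Stream-parse one file instead of loading it whole; clear each matching
    # element after counting it so its contents can be freed early
    count = 0
    for _event, elem in ET.iterparse(path, events=("end",)):
        if elem.tag == tag:
            count += 1
            elem.clear()
    return count


def process_batch(paths: List[str], tag: str) -> int:
    # Return only a small summary per batch, never the parsed documents
    return sum(count_records(p, tag) for p in paths)


# Usage: five tiny files holding 1..5 <record/> elements, processed two at a time
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i in range(5):
        path = os.path.join(tmp, f"file{i}.xml")
        with open(path, "w") as fh:
            fh.write("<root>" + "<record/>" * (i + 1) + "</root>")
        paths.append(path)
    totals = [process_batch(batch, "record") for batch in batched(paths, 2)]

print(totals)  # [3, 7, 5]
```

Keeping per-batch return values small matters because whatever each step returns is held by the orchestrator, whereas paths and counts are cheap.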

Greg Adams

01/14/2022, 9:38 PM

Hi again! Is there a storage/deployment pattern that allows me to include custom modules for my flow at registration time, rather than rebuilding the Docker image whenever I want to update them? I thought Git storage might include the extra Python files, but it’s not liking it (maybe I’m doing it all wrong?)

Josh

01/14/2022, 11:01 PM

I’m running into a mypy issue with Prefect Tasks. Mypy produces the error

error: <nothing> not callable

for the task call when I try to test it out:


from prefect import Flow, Task

class MyTask(Task):
    def run(self):
        # do something
        return True

if __name__ == "__main__":
    my_task = MyTask()
    with Flow("My Flow") as flow:
        my_task()
    flow.run()

Amber Papillon

01/16/2022, 3:04 AM

Hey guys, quick question. Has this been implemented yet? https://github.com/PrefectHQ/prefect/issues/2254

Philipp Eisen

01/16/2022, 11:54 AM

EDIT: This was because my Dask runners were not accessing the same Orion database. Hey! I was testing running Orion with a DaskCluster deployed in Kubernetes. I’m starting the flow locally and pointing it to the DaskCluster on localhost (using port-forwarding for the scheduler). With a local DaskCluster it works fine, but with this setup I’m always getting this error:


distributed.worker - WARNING - Compute Failed
Function:  orchestrate_task_run
args:      ()
kwargs:    {'task': <prefect.tasks.Task object at 0x7f02f0f56430>, 'task_run': TaskRun(id=UUID('2acf899f-67f5-4717-9665-c91f730f3719'), created=datetime.datetime(2022, 1, 16, 11, 45, 38, 995585, tzinfo=datetime.timezone.utc), updated=datetime.datetime(2022, 1, 16, 11, 45, 39, 19000, tzinfo=datetime.timezone.utc), name='get-product-b7ee3036-0', flow_run_id=UUID('a3e75090-2e7d-42f6-8dda-b00600f70b12'), task_key='b7ee3036fbe1354fe2fbf30215a316c4', dynamic_key='0', cache_key=None, cache_expiration=None, task_version=None, empirical_policy=TaskRunPolicy(max_retries=10, retry_delay_seconds=0.0), tags=[], state_id=UUID('36aa692f-175d-4bff-81ed-e57f2228cdfa'), task_inputs={}, state_type=StateType.PENDING, run_count=0, expected_start_time=datetime.datetime(2022, 1, 16, 11, 45, 38, 988955, tzinfo=datetime.timezone.utc), next_scheduled_start_time=None, start_time=None, end_time=None, total_run_time=datetime.timedelta(0), estimated_run_time=datetime.timedelta(0), estimated_start_time_delta=datetime.timedelta
Exception: "ValueError('Invalid task run: 2acf899f-67f5-4717-9665-c91f730f3719')"

Is there something obvious I’m missing?

Tao Bian

01/16/2022, 9:39 PM

Hi, I have a flow scheduled to run daily, and I try to get the timestamp inside the flow. Why does the exact same timestamp get written into the database every day?


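import datetime

from prefect import Flow, task

@task
def write_timestamp_into_database(timestamp):
    ...

# daily_schedule is defined elsewhere (not shown)
with Flow("sample-flow", daily_schedule) as flow:
    timestamp = str(datetime.datetime.now())
    write_timestamp_into_database(timestamp)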
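In Prefect 1.x, the body of a `with Flow(...)` block runs only once, when the flow object is built (for example at registration). So `datetime.datetime.now()` there is evaluated a single time, and its result is stored in the flow as a constant. The usual remedy is to compute the timestamp inside a task, e.g. a small `@task` that returns `str(datetime.datetime.now())` and is called inside the flow block. The stdlib-only sketch below mimics both behaviours without Prefect. `build_flow_constant` and `build_flow_deferred` are hypothetical names, not Prefect APIs.

```python
import datetime
import time
from typing import Callable, List


def build_flow_constant() -> Callable[[], str]:
    # Like code written directly in a `with Flow(...)` block:
    # now() is called once, while the "flow" is being built
    timestamp = str(datetime.datetime.now())

    def run() -> str:
        return timestamp  # every run reuses the build-time value

    return run


def build_flow_deferred() -> Callable[[], str]:
    # Like moving now() into a task: the call is deferred until run time
    def get_timestamp() -> str:
        return str(datetime.datetime.now())

    def run() -> str:
        return get_timestamp()  # evaluated fresh on every run

    return run


constant_flow = build_flow_constant()
deferred_flow = build_flow_deferred()

constant_runs: List[str] = []
deferred_runs: List[str] = []
for _ in range(3):  # three simulated scheduled runs
    constant_runs.append(constant_flow())
    deferred_runs.append(deferred_flow())
    time.sleep(0.05)  # keep runs apart so timestamps differ

print(len(set(constant_runs)))  # 1
print(len(set(deferred_runs)))  # 3
```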

Sultan Orazbayev

01/17/2022, 12:18 AM

Hello, if anyone is using prefect on a SLURM cluster, I am interested in connecting to learn about the experience.

Noam Gal

01/17/2022, 8:12 AM

Hi all, I'm a newbie to Prefect. I have created a flow that uses two `prefect.Parameter` tasks. The parameter types are just native `str` and `int`. The flow's logic uses some other tasks that consume the parameter tasks. Those tasks call some helper functions that let me reuse code and make it more readable. If I want my helper function to use one of the parameters (I just need the value, not the task itself), I need to make it a Prefect task itself. When calling it from another task, it then has to be called with `.run`, since inside the task it isn't in the context of a flow. For example:

import prefect
from prefect import Parameter, Flow, task

def calc_logic_func1() -> int:
    return ...

def calc_logic_func2() -> int:  # placeholder, analogous to calc_logic_func1
    return ...

@task
def shared_task(id: int, value: int) -> int:
    return ...

@task
def my_task1(id, description):
    val1 = calc_logic_func1()
    return shared_task.run(id, val1)

@task
def my_task2(id, description):
    val2 = calc_logic_func2()
    return shared_task.run(id, val2)

@task
def my_reduce_task(result1, result2):  # placeholder
    return ...

# The flow is built after the tasks are defined, so every name exists when it is used
with Flow("my flow") as my_flow:
    id = Parameter("id", required=True)  # int value
    description = Parameter("description", required=True)  # str value
    result1 = my_task1(id, description)
    result2 = my_task2(id, description)
    my_reduce_task(result1, result2)

In the example above, I want to use the helper function `shared_task` with the integer `id` value. But since `id` is a `prefect Parameter Task`, `shared_task` itself must be a task, and when it is called from another task (e.g. `my_task1`) it has to be called with `shared_task.run`. Well, this is how I understand it so far. Is there any other way to use it (not setting `shared_task` as a task, OR not calling it with `.run`, since `my_task1` is already called from the `my_flow` context)? If this is the right way to use it, are there any other effects on the flow run? (I guess `my_task1` will execute `shared_task` itself in the same agent.) Thanks!
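In Prefect 1.x, calling tasks inside `with Flow(...)` only records a dependency graph. When the flow runs, each task's `run` receives the resolved values of its upstream tasks, Parameters included. Inside `my_task1`, `id` is therefore already a plain `int`, and a helper can normally stay an ordinary Python function called directly, with no `@task` and no `.run`. Calling `shared_task.run(...)` simply executes that function inline in the same process, without creating a separate task run. The stdlib sketch below mimics that resolution step. `Param`, `Call`, `run`, `shared_helper` and `record_id` are hypothetical names, and this is not Prefect's implementation.

```python
from typing import Any, Callable, Dict


class Param:
    # Hypothetical placeholder created at build time, like a Parameter task
    def __init__(self, name: str) -> None:
        self.name = name


class Call:
    # Hypothetical record of "call fn with these args", like calling a task in a flow block
    def __init__(self, fn: Callable[..., Any], *args: Any) -> None:
        self.fn = fn
        self.args = args


def run(node: Any, params: Dict[str, Any]) -> Any:
    # Resolve upstream nodes first, then call the plain function with real values
    if isinstance(node, Param):
        return params[node.name]
    if isinstance(node, Call):
        values = [run(arg, params) for arg in node.args]
        return node.fn(*values)
    return node  # constants pass through unchanged


def shared_helper(record_id: int, value: int) -> int:
    # Ordinary function: no task wrapper and no `.run` needed
    return record_id * 100 + value


def my_task1(record_id: int, description: str) -> int:
    # By the time this runs, record_id is already an int, not a placeholder
    return shared_helper(record_id, len(description))


graph = Call(my_task1, Param("id"), Param("description"))
print(run(graph, {"id": 7, "description": "abc"}))  # 703
```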

Florian Kühnlenz

01/17/2022, 9:37 AM

Hi, we just had two flows become stuck without any apparent reason. All tasks had finished, but the flows remained in a Running state, blocking others. Any idea how to debug what was going on? Manually setting the state resolved the problem.

Tom Klein

01/17/2022, 11:49 AM

Hey! 🙋‍♂️ We’re trying to use this (excellent) example: https://github.com/anna-geller/packaging-prefect-flows/blob/master/flows_task_library/s3_kubernetes_run_RunNamespacedJob_and_get_logs.py. We’re missing some permissions on our end for K8s operations. However, I noticed that when I ran this flow, even though the first step (delete K8s job) failed, it went on to perform the next steps (e.g. create job, which it does have permissions for). Am I missing something about how this should work? Shouldn’t a failed task halt the entire flow (by default, without explicitly playing with triggers)?
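In Prefect 1.x, a task failure does not halt the whole flow by default. A task's default trigger (`all_successful`) fails it with a `TriggerFailed` state only when one of its own upstream tasks did not succeed. Tasks that are not downstream of the failed one still run. So if the create-job step isn't wired downstream of the delete step (for example via `upstream_tasks=[...]`), it will presumably run regardless. The toy runner below illustrates those semantics; it needs Python 3.9+ for `graphlib`. `run_flow`, `delete_job` and `create_job` are hypothetical, and this is not Prefect's engine.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set


def run_flow(tasks: Dict[str, Callable[[], None]], deps: Dict[str, Set[str]]) -> Dict[str, str]:
    # Toy runner mimicking an "all upstream tasks succeeded" trigger:
    # a task is skipped only if one of ITS OWN upstream tasks did not succeed
    graph = {name: deps.get(name, set()) for name in tasks}
    states: Dict[str, str] = {}
    for name in TopologicalSorter(graph).static_order():
        if any(states[up] != "Success" for up in graph[name]):
            states[name] = "TriggerFailed"
            continue
        try:
            tasks[name]()
            states[name] = "Success"
        except Exception:
            states[name] = "Failed"
    return states


def delete_job() -> None:
    raise PermissionError("forbidden")  # simulates the missing K8s permission


def create_job() -> None:
    pass  # simulates a step that has the permissions it needs


tasks = {"delete_job": delete_job, "create_job": create_job}

# No declared dependency: the failure does not stop the independent task
independent = run_flow(tasks, {})
# create_job declared downstream of delete_job: it is not attempted
dependent = run_flow(tasks, {"create_job": {"delete_job"}})

print(independent["create_job"])  # Success
print(dependent["create_job"])  # TriggerFailed
```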

Marwan Sarieddine

01/17/2022, 1:53 PM

Hi folks, a question about the Lazarus process. Why would Lazarus try to reschedule a flow run after it has reached a successful state?

Bruno Murino

01/17/2022, 4:29 PM

Hi everyone — I’m trying to pass tags to the ECS Run task, but it doesn’t look like the tags are being propagated. Is this the right way to pass tags to the ECS tasks that the Prefect Agent will create?

Miguel Angel

01/17/2022, 6:00 PM

Hello everyone! Has anyone worked with Dask `futures` within a Prefect flow context? I basically want to perform some future computations to parallelize Parquet reading and dataframe concatenation. The following snippet shows an MWE using Dask futures and a client.


import dask.dataframe as dd
from dask.distributed import Client
from s3fs import S3FileSystem

s3 = S3FileSystem()
client = Client()
folder_list = [
    "file1",
    "file2",
    "file3",
    "file4",
    "file5",
    "file6",
    "file7",
    "file8",
]
file_list = [f"s3://my-bucket/parquet/{folder}/*.parquet" for folder in folder_list]
dataframe_list = client.map(dd.read_parquet, file_list, gather_statistics=False)

dataframe = client.submit(dd.concat, dataframe_list)

mean_value = client.submit(lambda x: x["some_data_column"].mean(), dataframe)

mean_compute = client.submit(lambda x: x.compute(), mean_value)

print(mean_compute.result())

Andreas Eisenbarth

01/17/2022, 7:27 PM

Hello! I have encountered some very weird behavior and have run out of ideas about what could cause it. We do batch processing and use `create_flow_run` with `map` to create multiple flows, each with a different dict of parameters. On one server, all created flows receive the same `flow_run_id`, which means they overwrite each other's logs and we only see one in the Prefect UI. (Locally I cannot reproduce it, and every child flow gets a different flow run ID. This server is running in Docker, and in that setup `create_flow_run` was working correctly previously.) Does anyone have ideas? (Example code attached)

Matt Alhonte

01/17/2022, 7:38 PM

This rules so hard. Wanna find a way to include it in my Prefect pipelines. https://github.com/stepchowfun/typical

Samay Kapadia

01/17/2022, 10:46 PM

I’m running into the weirdest error while trying to make Prefect Cloud work with my Kubernetes cluster. The error says `No module named '/Users/sa/'`. Why does it want my home directory to be a module? More details inside

Yusuf Khan

01/17/2022, 11:03 PM

I have a task failing with the following error: Unexpected error: ValueError('ctypes objects containing pointers cannot be pickled'). Before I made this a Prefect flow, the script was executing fine. It's a small script running on a Raspberry Pi using the PiCamera module. Other, non-dependent tasks are working fine. Any thoughts from anyone? Googling this didn't yield much.
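The message comes from Python's `pickle`: ctypes objects that hold pointers refuse to be pickled. A plain script never pickles anything, but a Prefect 1.x flow may serialize task return values. This happens, for instance, when results are checkpointed (cloudpickled by default) or when a Dask-based executor ships values between processes. So a task that returns the camera object, or anything wrapping ctypes pointers, can plausibly hit this error. A common workaround is to keep the camera object inside one task and return only plain data such as bytes, numbers or a file path. The stdlib sketch below reproduces the error with a bare ctypes pointer; it is an illustration, not a diagnosis of this specific flow.

```python
import ctypes
import pickle

value = ctypes.c_int(5)
ptr = ctypes.pointer(value)  # a ctypes object that holds a pointer

try:
    pickle.dumps(ptr)  # what serializing a task result would attempt
    pointer_pickled = True
except ValueError as exc:
    pointer_pickled = False
    print(f"ValueError: {exc}")

# Extracting a plain Python value before returning it makes the data picklable
plain = ptr.contents.value
restored = pickle.loads(pickle.dumps(plain))
print(restored)  # 5
```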


Son Nguyen

01/18/2022, 9:30 AM

Hi, I’m launching a new Prefect server with `prefect server start` and everything started correctly. But in the UI, when I click on a flow, I’m not redirected to the flow detail page. It looks like the following Docker image versions introduced a new bug:


prefecthq/apollo              core-0.15.12   d8519b0544d0   5 days ago     324MB
prefecthq/server              core-0.15.12   d828f40dbf19   5 days ago     403MB
prefecthq/ui                  core-0.15.12   5edd4fee96ed   3 weeks ago    225MB

because it worked fine with these versions:


prefecthq/ui                  core-0.15.11   6fac027b4605   4 weeks ago     225MB
prefecthq/server              core-0.15.11   f6280189d6a5   6 weeks ago     402MB
prefecthq/apollo              core-0.15.11   d1b07b3c9a57   6 weeks ago     324MB

Akharin Sukcharoen

01/18/2022, 9:50 AM

How can I fix flow runs being scheduled twice? It overloads my server. Thank you in advance.

Emma Rizzi

01/18/2022, 9:56 AM

Hi! Do you have any idea what is causing this error?

Failed to load and execute Flow's environment: FlowStorageError("An error occurred while unpickling the flow:\n  TypeError('an integer is required (got type bytes)')\nThis may be due to one of the following version mismatches between the flow build and execution environments:\n  - prefect: (flow built with '0.15.10', currently running with '0.15.12')\n  - python: (flow built with '3.7.11', currently running with '3.9.9')")

I searched this Slack for insights. I use Prefect Cloud with a Docker agent on a VM, and I upgraded Prefect to 0.15.12 on both the agent and my development machine.

Malthe Karbo

01/18/2022, 11:03 AM

Hi everyone, I am having some trouble using the DaskExecutor with Fargate mode. I get the following error (after successfully running all flows):


RuntimeError: IOLoop is closed

Flow example and pinned versions are in the thread.

Aaron Pickering

01/18/2022, 11:45 AM

Hi everyone, I'm trying to use "SnowflakeQueriesFromFile" in a task and I'm getting a strange error. Not sure how to start debugging this, any ideas? Could it be something to do with the file path?


"Failed to load and execute Flow's environment: FlowStorageError('An error occurred while unpickling the flow:\n NameError("name \'err\' is not defined")')"

The task itself is straightforward, it looks like this:


snowsql_obj = SnowflakeQueriesFromFile(account=SNOWFLAKE_ACCOUNT, user=SNOWFLAKE_USER, password=SNOWFLAKE_PWD, file_path="../../sql/amplitude_raw.sql")
snowsql_obj.run()

Samay Kapadia

01/18/2022, 3:20 PM

Hey all. Where can I find a Prefect configuration reference page? Googling “prefect configuration reference” takes me to the core concepts page, which doesn’t tell me which configuration settings I can actually change.


Konstantin

01/18/2022, 3:39 PM

Hey all, how do I set the order in which tasks run? For example, one of my tasks deletes data and a second task inserts data. I need to run the delete first and insert the data afterwards.
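In Prefect 1.x, two tasks that pass no data to each other have no dependency, so they may run in any order (or concurrently with a parallel executor). The ordering can be declared explicitly inside the flow block by passing `upstream_tasks`, e.g. `insert = insert_data(upstream_tasks=[delete])`, where `delete` is the result of calling the delete task. The stdlib sketch below (Python 3.9+ for `graphlib`) shows the same idea: declaring "insert depends on delete" forces the order. `delete_data`, `insert_data` and the in-memory `table` are hypothetical stand-ins for the real tasks.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

table: List[str] = ["old-row"]  # stand-in for the real database table


def delete_data() -> None:
    table.clear()


def insert_data() -> None:
    table.extend(["new-row-1", "new-row-2"])


tasks: Dict[str, Callable[[], None]] = {"delete_data": delete_data, "insert_data": insert_data}
# insert_data lists delete_data as an upstream dependency, which is the same
# relationship Prefect 1.x's `upstream_tasks=[...]` expresses
dependencies: Dict[str, Set[str]] = {"insert_data": {"delete_data"}}

order = list(TopologicalSorter(dependencies).static_order())
for name in order:
    tasks[name]()

print(order)  # ['delete_data', 'insert_data']
print(table)  # ['new-row-1', 'new-row-2']
```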

Johan Wåhlin

01/18/2022, 3:46 PM

Hi all, I'm on a self-hosted Prefect Server 15.11 running on AKS with the KubernetesRun() config, trying to create a Dask cluster to use with the DaskExecutor. My flow works fine with a local cluster. However, as soon as I try to use an external cluster (using KubeCluster or Coiled), the flow stops with the message "terminal tasks are incomplete" (see image). An hour or so later, the entire process failed with a connection failure to my GraphQL address. Does this sound familiar to anyone?

Samay Kapadia

01/18/2022, 3:51 PM

Why would I get `The secret KUBERNETES_API_KEY was not found` if I’m running the Prefect agent inside the cluster? According to this doc, it will attempt an in-cluster connection, but my hello world task keeps failing 😭

Jason Motley

01/18/2022, 3:55 PM

What's the best way to replicate the `max_retries` feature for an entire flow? I.e. if the flow fails for some reason, retry it 5 minutes later.
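In Prefect 1.x, `max_retries` and `retry_delay` are task-level settings. I'm not aware of a built-in equivalent on `Flow` itself, so this is worth double-checking in the docs. One workaround is to wrap the whole flow in a generic retry loop, sketched below with the stdlib. `run_with_retries` and `flaky_flow` are hypothetical names. For a local run, `run_once` could be something like `lambda: not flow.run().is_failed()`, since `flow.run()` returns the final `State`. Flows started by an agent would need a different mechanism that triggers a new flow run.

```python
import time
from typing import Callable, List


def run_with_retries(run_once: Callable[[], bool], max_retries: int, retry_delay_s: float) -> bool:
    # Hypothetical flow-level retry loop: `run_once` runs the whole flow and
    # returns True on success; a failure is retried after a fixed delay
    for attempt in range(max_retries + 1):
        if run_once():
            return True
        if attempt < max_retries:
            time.sleep(retry_delay_s)  # e.g. 300 seconds for "retry 5 minutes later"
    return False


# Usage with a stand-in flow that fails twice before succeeding
outcomes = iter([False, False, True])
attempts: List[int] = []


def flaky_flow() -> bool:
    attempts.append(1)
    return next(outcomes)


succeeded = run_with_retries(flaky_flow, max_retries=5, retry_delay_s=0.01)
print(succeeded, len(attempts))  # True 3
```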

Muddassir Shaikh

01/18/2022, 4:04 PM

I have a task function that takes a tuple as input. The task run name for this task should be derived from that tuple by another function named `task_name_from_tup`. Example:

from datetime import timedelta

from prefect import task

@task(task_run_name="{task_name_from_tup(details)}", max_retries=3, retry_delay=timedelta(minutes=1))
def processing(details):
    pass  # some code
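In Prefect 1.x, a string `task_run_name` is a template that gets filled in with the task's inputs via Python's `str.format`. That can look up names, attributes and indexes, but it never calls functions, so `{task_name_from_tup(details)}` is treated as one (missing) key. Recent 0.15.x releases also accept a callable for `task_run_name`, called with the task inputs as keyword arguments, e.g. `task_run_name=lambda **kwargs: task_name_from_tup(kwargs["details"])`; check this against the docs for your version. The stdlib sketch below shows the difference. `render_template` and `render_callable` are hypothetical, as is the body of `task_name_from_tup`.

```python
from typing import Any, Callable, Dict, Tuple


def task_name_from_tup(details: Tuple[str, int]) -> str:
    # Hypothetical naming helper standing in for the real one
    return f"{details[0]}-{details[1]}"


def render_template(template: str, inputs: Dict[str, Any]) -> str:
    # A string template is filled with str.format, which never calls functions
    return template.format(**inputs)


def render_callable(name_fn: Callable[..., str], inputs: Dict[str, Any]) -> str:
    # A callable receives the inputs as keyword arguments and can run any logic
    return name_fn(**inputs)


inputs = {"details": ("customer", 42)}

try:
    render_template("{task_name_from_tup(details)}", inputs)
except KeyError as exc:
    print("template failed:", repr(exc))

# Indexing is allowed inside format fields, so this works without a function call
print(render_template("{details[0]}-{details[1]}", inputs))  # customer-42

print(render_callable(lambda **kw: task_name_from_tup(kw["details"]), inputs))  # customer-42
```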

Yusuf Khan

01/18/2022, 5:40 PM

I haven't dug into the API reference docs, but just looking at the high-level docs for Flow-of-Flows, is it possible to have a flow-of-flows where FlowA runs on a schedule, and FlowB only runs after FlowA, whenever that happens to be? Edit: these are in two different environments with two different agents, so I can't make them one flow.


Frank Oplinger

01/18/2022, 10:05 PM

Is it possible for a Prefect flow to spin up a dynamic number of other flows based on the output of a task? I see some examples in these docs of spinning up a static number of flows, but I would love to have the number of child flows vary from run to run.