prefect-community

    Vince

    12/20/2021, 9:35 PM
    Hi, is there a way to iteratively run a flow for batch processing over a large dataset? Here’s the flow I’m trying to achieve:
    1. Start flow with 3 parameters: start timestamp (ts_start), batch size (n), end timestamp (ts_end)
    2. Task extracts batch of n documents starting from ts_start
    3. Cache nth document’s timestamp (ts_last)
    4. Go back to (1) passing in ts_last as ts_start if ts_last < ts_end
    In other words, I want to iteratively process batches until we get a document with timestamp ts_end
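    One rough sketch of a way to express this loop with the (non-Orion) Prefect API: have the flow process a single batch, then conditionally kick off a new run of itself with ts_last as the next ts_start. The flow name, project name, and the extract/condition task bodies below are placeholders, not anything from the question.
    from prefect import Flow, Parameter, task, case
    from prefect.tasks.prefect import StartFlowRun

    @task
    def extract_batch(ts_start, n):
        # placeholder: pull n documents starting at ts_start and
        # return the timestamp of the last document processed
        ts_last = ...
        return ts_last

    @task
    def not_done(ts_last, ts_end):
        return ts_last < ts_end

    with Flow("batch-flow") as flow:
        ts_start = Parameter("ts_start")
        n = Parameter("n", default=1000)
        ts_end = Parameter("ts_end")

        ts_last = extract_batch(ts_start, n)
        # only schedule the next iteration while ts_last < ts_end
        with case(not_done(ts_last, ts_end), True):
            StartFlowRun(flow_name="batch-flow", project_name="my-project")(
                parameters={"ts_start": ts_last, "n": n, "ts_end": ts_end}
            )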

    Ben Muller

    12/20/2021, 11:00 PM
    Is there currently an issue with the prefect dockerhub image? I am getting an issue pulling it
    Head https://registry-1.docker.io/v2/prefecthq/prefect/manifests/0.15.5-python3.8: received unexpected HTTP status: 503 Service Unavailable

    Yusuf Khan

    12/21/2021, 2:21 AM
    I have a use case where I have some equipment in a lab and connected to a desktop. I wanted to use Prefect to schedule a shell task to run on the machine that backs up its data to blob storage and then other downstream tasks are fired off. I walked through and did examples from a decent chunk of the documentation yesterday. I'm a little bit unclear on best practices for agents. Are they supposed to be on all the time? So in the given example the desktop will sometimes be on, sometimes be off. Do I just need to set the machine to start the agent on bootup or can that behavior be controlled from within prefect? Additionally, for best practice, if I have several of these machines like this would it be better to put an agent on each machine, or to setup one 'staging' cloud vm, and then run the agent there and execute tasks that live on the remote machines, but via ssh commands from the cloud vm?

    Yehor Sikachov

    12/21/2021, 11:41 AM
    Hi folks. As part of working on https://github.com/PrefectHQ/prefect/issues/5171 I created draft PR https://github.com/PrefectHQ/prefect/pull/5258 Please take a look at the concept and if you agree I will add tests and proper formatting

    Liri Rozenthal

    12/21/2021, 2:56 PM
    Hello there! 🙂 I'd appreciate any help, as I'm truly stuck. I've locally developed a flow in a Jupyter Notebook (Python). Agent: Vertex; Storage: GCP; run_config: VertexRun; Executor: LocalExecutor. In VertexRun I've defined an image, which I created locally from my CMD (based on a simple Dockerfile, built on prefect:latest plus the pip installs that are relevant for the specific flow). I've pushed this image to my repository on Docker Hub. Following what I've read here, I understood that in my script I should log in to my Docker account before activating the agent, and so I've done. One of the packages I've set to install in my image is "pandas". Yet, when trying to run the flow through the UI I get this error: "Failed to load and execute Flow's environment: FlowStorageError('An error occurred while unpickling the flow:\n ModuleNotFoundError("No module named \'pandas\'")\nThis may be due to one of the following version mismatches between the flow build and execution environments:\n - cloudpickle: (flow built with \'1.3.0\', currently running with \'2.0.0\')\n - prefect: (flow built with \'0.15.9\', currently running with \'0.15.10\')\n - python: (flow built with \'3.7.6\', currently running with \'3.7.12\')\nThis also may be due to a missing Python module in your current environment. Please ensure you have all required flow dependencies installed.')" What am I missing? Thanks in advance! 🍪

    John-Craig Borman

    12/21/2021, 4:03 PM
    Hi all, for testing purposes is there any way to configure `.map`ped tasks to run sequentially instead of in parallel?
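    For later readers, a minimal sketch (assuming the non-Orion Prefect API): mapped children only run in parallel under a Dask-based executor, so running the flow under the plain LocalExecutor makes them execute one at a time. `my_flow` is a placeholder.
    from prefect.executors import LocalExecutor

    # force sequential execution of mapped children for the test run
    my_flow.executor = LocalExecutor()
    state = my_flow.run()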

    Jason Motley

    12/21/2021, 4:27 PM
    If a scheduled flow did not run and the errors I see say a lazarus process was attempted, what does that mean?

    Alejandro Sanchez Losa

    12/21/2021, 5:27 PM
    Hi everybody!!! Is there any way to separate @task definitions into different files when using GitHub storage? When I try that (a new folder, importing this file containing a def with the @task decorator and an import of the prefect task …) the output tells me: ModuleNotFoundError: No module named ‘tasks’

    Max Watermolen

    12/21/2021, 5:35 PM
    Howdy, running into some Django weirdness, anyone seen this?
    Failed to load and execute Flow's environment: FlowStorageError('An error occurred while unpickling the flow:\n  AppRegistryNotReady("Apps aren\'t loaded yet.")')
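    A guess at the usual fix for `AppRegistryNotReady` when using the ORM outside manage.py: make sure `django.setup()` runs (with `DJANGO_SETTINGS_MODULE` set) before any model import, e.g. inside the task rather than at module import time, so that unpickling the flow never touches Django models. `myproject.settings` and `myapp.models` are placeholders.
    import os

    import django
    from prefect import task

    @task
    def load_records():
        os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings")
        django.setup()  # populate the app registry before touching models
        from myapp.models import Record  # deferred import, only after setup()
        return list(Record.objects.values())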

    Trevor Campbell

    12/21/2021, 5:41 PM
    Hi folks! Quick Q: what would be the idiomatic way in Prefect Orion to create "make-like" tasks? Say, e.g., I have a task that uses a file `a.log` as an input, and produces another file `b.log` as an output, and I want the task to run only when the timestamp of `a.log` is newer than `b.log` (in addition to the usual waiting for predecessor Prefect tasks). I could of course just do this manually inside the task, but was wondering if there was a better way to go about it
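    For what it's worth, the "manual" version mentioned above is only a few lines with the Orion-style decorators; a sketch, where the task body is a placeholder:
    import os

    from prefect import flow, task

    @task
    def rebuild(src: str, dst: str):
        ...  # placeholder: expensive work that reads src and writes dst

    @flow
    def make_like(src: str = "a.log", dst: str = "b.log"):
        # only run the task when the input is newer than the (possibly missing) output
        if not os.path.exists(dst) or os.path.getmtime(src) > os.path.getmtime(dst):
            rebuild(src, dst)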

    Vipul

    12/21/2021, 6:45 PM
    Hi, quick question on Orion. We are on Prefect Server and doing an initial PoC of Orion, and would like to know if there is a way to limit the number of flows or tasks to one based on a parameter or context provided during the run. Our flows are very compute intensive and run overnight, and we want to avoid multiple flows or tasks running at the same time if they have the same parameter or context as input.

    Jason Motley

    12/21/2021, 7:12 PM
    Is this the correct way to specify the use of multiple additional packages?
    flow.run_config = ECSRun(
    env={"EXTRA_PIP_PACKAGES": "requests" "numpy"})

    Jason Motley

    12/21/2021, 10:12 PM
    Quick question that I may have asked before, if I need to extract a series of rows/columns from a data warehouse and then "push" them into an SFTP server, is there a good way in Prefect to do that?
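    One possible shape for this, sketched with an outside library rather than a Prefect integration: query the warehouse with whatever DB task/library you already use, then a small task that uploads the serialized rows over SFTP via paramiko. Host, credentials, and the payload below are placeholders.
    import io

    import paramiko
    from prefect import task

    @task
    def upload_to_sftp(payload: bytes, host: str, username: str, password: str, remote_path: str):
        transport = paramiko.Transport((host, 22))
        transport.connect(username=username, password=password)
        sftp = paramiko.SFTPClient.from_transport(transport)
        try:
            # stream the already-serialized rows (e.g. CSV bytes) to the server
            sftp.putfo(io.BytesIO(payload), remote_path)
        finally:
            sftp.close()
            transport.close()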

    Daniel Komisar

    12/21/2021, 10:13 PM
    Hello everyone, I’m trying to see if it’s possible to query for flow runs with a specific parameter value. I’ve been able to query for runs where the parameter has a certain key using `_has_key`. I’ve tried using `_contains` with no luck, although I’m not sure if that’s the right one either, or if this is even possible. Thanks!

    Danny Vilela

    12/21/2021, 11:10 PM
    Hi all! A co-worker is trying to schedule a Flow to run on the 2nd of every month, at 8:00 AM PT. I pointed him to the `IntervalSchedule` with `pendulum` (since that’s what I’ve used for daily/weekly tasks) but he noticed that the results don’t quite line up with what he was expecting:
    import pendulum
    from prefect.schedules.schedules import IntervalSchedule
    from prefect.schedules.clocks import CronClock
    
    # Set our start date.
    next_start_date: pendulum.DateTime = (
        pendulum.now(tz="America/Los_Angeles")
        .start_of(unit="month")
        .set(day=2, hour=8, minute=0, second=0)
    )
    
    # Set our monthly interval.
    monthly: pendulum.Duration = pendulum.duration(months=1)
    
    # Inspect the next few clock emissions.
    schedule: IntervalSchedule = IntervalSchedule(start_date=next_start_date, interval=monthly)
    print(schedule.next(n=3))
    # [
    #   DateTime(2022, 1, 1, 8, 0, 0, tzinfo=Timezone('America/Los_Angeles')), 
    #   DateTime(2022, 1, 31, 8, 0, 0, tzinfo=Timezone('America/Los_Angeles')), 
    #   DateTime(2022, 3, 2, 8, 0, 0, tzinfo=Timezone('America/Los_Angeles'))
    # ]
    Why does the `IntervalSchedule` not fire on 2022-01-02, 2022-02-02, 2022-03-02, etc.? It appears to just be incrementing by 30 days, but that’s not quite what I’d expect. Is this a `pendulum` thing? (Edit: it’s maybe worth noting that in the example above, just doing `next_start_date + monthly` does give you the correct `DateTime(2022, 1, 2, 8, 0, 0, tzinfo=Timezone('America/Los_Angeles'))`. So I think it may actually be a Prefect thing?)
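    (For "the 2nd of every month at 8 AM" specifically, a cron clock sidesteps the fixed-interval behaviour entirely; a sketch using the `CronClock` already imported above, with a timezone-aware start date so the cron is evaluated in that zone:)
    import pendulum
    from prefect.schedules import Schedule
    from prefect.schedules.clocks import CronClock

    # 08:00 on the 2nd of every month, America/Los_Angeles
    schedule = Schedule(
        clocks=[CronClock("0 8 2 * *", start_date=pendulum.now(tz="America/Los_Angeles"))]
    )
    print(schedule.next(n=3))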

    Brian S

    12/22/2021, 1:41 AM
    Hi All, hope all is well. I'm here for another (likely noob) question. I've been working with a Prefect parameter that contained a json object. This parameter used to pass a dict but all of a sudden it's coming in as a tuple. Is there a reason this would happen? JSON seems to be valid.

    Ryan Sattler

    12/22/2021, 5:32 AM
    Is there a way to increase the polling frequency of Agents? The (apparent) default of 10s feels a little slow to pick up jobs at times.

    rilshok

    12/22/2021, 8:27 AM
    :thor: Hello everyone, I'm taking my first steps with :prefect:. Everything will work out!

    Alfredo Prada Giorgi

    12/22/2021, 8:56 AM
    👋

    rilshok

    12/22/2021, 8:56 AM
    To get to know the community better, I'll use a small example to tell you what kind of jungle a newbie can get into. Prefect has a native way of filtering result items, but it only works at the level of one list. :piggy: Like a fool, I figured I just needed a super flexible filter to keep as many argument lists as I want in sync. So, let's say we have a goal: to process the list of arguments of one task depending on how another task handled them.
    from typing import Any, List
    from pathlib import Path

    from prefect import task, Flow

    @task
    def get_paths() -> List[Path]:
        ...

    @task
    def dosmth(path: Path) -> Any:
        ...

    @task
    def finalize(path: Path, smth: Any):
        ...

    with Flow('fucking-prefect') as flow:
        paths = get_paths()
        smth = dosmth.map(paths)  # maybe there are exceptions
        # TODO: need to synchronize paths and smth
        finalize.map(paths, smth)
    :dusty_stick: I implemented my filter like `prefect.tasks.control_flow.FilterTask`:
    from typing import List, Any, Tuple, Union
    
    from prefect import Task
    from prefect.triggers import all_finished
    
    
    class CrossSkip(Task):
        def __init__(self, *skip, **kwargs) -> None:
            kwargs.setdefault("skip_on_upstream_skip", False)
            kwargs.setdefault("trigger", all_finished)
            self._types = tuple([s for s in skip if isinstance(s, type)])
            self._values = [s for s in skip if not isinstance(s, type)]
            if not skip:
                self._types = (type(None), )
            super().__init__(**kwargs)
    
        def _filter(self, value) -> bool:
            return not isinstance(value, self._types) and not any([value == v for v in self._values])
    
        def run(self, *task_results: List[Any]) -> Union[List[Any], Tuple[List[Any], ...]]:
            """Task run method."""
            assert task_results
            assert len({*map(len, task_results)}) == 1
            if len(task_results) == 1:
                return [r for r in task_results[0] if self._filter(r)]
            return tuple([*map(list, zip(*[
                r for r in zip(*task_results)
                if all([self._filter(v) for v in r])
            ]))])
    The flow should have turned into something like this, and everything would have worked fine:
    with Flow('best-prefect-flow') as flow:
        paths = get_paths()
        smth = dosmth.map(paths)  # maybe there are exceptions
        # >>>
        paths, smth = CrossSkip(Exception, None)(paths, smth)
        # <<<
        finalize.map(paths, smth)
    BUT
    ValueError: Tasks with variable positional arguments (*args) are not supported, because all Prefect arguments are stored as keywords. As a workaround, consider modifying the run() method to accept **kwargs and feeding the values to *args.
    In general, I know how to use python magic to solve this problem, but I refuse to conjure further 🙂 TLDR: Prefect's tasks can't unpack arguments:
    @task
    def todosmth(*arg) -> Any:
        ...

    Martim Lobao

    12/22/2021, 10:23 AM
    not to rant, but Prefect’s web app is by far the most frustrating part about using prefect, and is one of the most frustrating tools i’ve ever worked with. trying to restart a flow and the restart pop-up just hangs indefinitely. no error anywhere, including in the console. I’ve tried going through incognito and using a different browser but i get the same thing. a basic REST API that just worked reliably would be such a better alternative to a GUI that doesn’t work half the time.

    Paul Gierz

    12/22/2021, 10:59 AM
    I hope this is the right channel to ask for general help. I was curious why this happens:
    lat_size = Parameter("Latitude Size (e.g 1 for a 1x1 degree grid)", default=1.0)
    lon_size = Parameter("Longitude Size (e.g 1 for a 1x1 degree grid)", default=1.0)
    lats = np.arange(-90, 90, lat_size)
    lons = np.arange(-180, 180, lon_size)
    but then:
    $ prefect register --project tutorial -p simulation_workflows/workflows
    Collecting flows...
    osgeo is not installed, conversion to Geo formats like Geotiff (fesom2GeoFormat) will not work.
    Error loading 'simulation_workflows/workflows/fesom_2d_variable.py':
      Traceback (most recent call last):
        File "/Users/pgierz/.local/opt/miniconda3/envs/scicomp_esm_sim_prefect_workflows/lib/python3.9/site-packages/prefect/cli/build_register.py", line 134, in load_flows_from_script
        namespace = runpy.run_path(abs_path, run_name="<flow>")
        File "/Users/pgierz/.local/opt/miniconda3/envs/scicomp_esm_sim_prefect_workflows/lib/python3.9/runpy.py", line 268, in run_path
        return _run_module_code(code, init_globals, run_name,
        File "/Users/pgierz/.local/opt/miniconda3/envs/scicomp_esm_sim_prefect_workflows/lib/python3.9/runpy.py", line 97, in _run_module_code
        _run_code(code, mod_globals, init_globals,
        File "/Users/pgierz/.local/opt/miniconda3/envs/scicomp_esm_sim_prefect_workflows/lib/python3.9/runpy.py", line 87, in _run_code
        exec(code, run_globals)
        File "/Users/pgierz/Documents/SciComp/Projects/Workflows/simulation_workflows/simulation_workflows/workflows/fesom_2d_variable.py", line 27, in <module>
        lons = np.arange(-180, 180, lon_size.value)
      AttributeError: 'Parameter' object has no attribute 'value'
    Is the stupid solution just to make a mini task instead of directly using numpy?
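    (A `Parameter` is itself a task, so it has no value at registration time; moving the numpy call into a small task that receives the resolved values at run time is the usual pattern. A sketch, with illustrative flow and parameter names:)
    import numpy as np
    from prefect import Flow, Parameter, task

    @task
    def make_grid(lat_size: float, lon_size: float):
        # lat_size/lon_size are the resolved parameter values here, not Parameter objects
        lats = np.arange(-90, 90, lat_size)
        lons = np.arange(-180, 180, lon_size)
        return lats, lons

    with Flow("fesom-2d-variable") as flow:
        lat_size = Parameter("lat_size", default=1.0)
        lon_size = Parameter("lon_size", default=1.0)
        grid = make_grid(lat_size, lon_size)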

    Tom Klein

    12/22/2021, 11:51 AM
    Hey again - yesterday I presented the results of my Prefect PoC to my team, and my team lead said they think we should wrap all DS code in Docker containers and use those as "blackbox" steps instead of directly invoking Python code from the flow -- am I right in my understanding that if we do that, we lose some of the advantages of Prefect, like being able to easily map the output of Docker runs to the input of the next tasks, or caching/persistence of results, etc., and we'll need to do all these things manually ourselves?

    Eduardo Fernández León

    12/22/2021, 12:09 PM
    Hi all! At my company we are using the Prefect platform integrated with Google Cloud Kubernetes, and now we are testing Prefect Cloud using Kubernetes as agents/workers. Is there any doc/tutorial on how to do that? I found this article and it talks about a `runner token` that I wasn't able to find in the Cloud UI. Thanks in advance.

    Robert Kowalski

    12/22/2021, 2:46 PM
    Hi, I have a problem with a flow: sometimes the flow executes correctly without any errors in the logs, but the same flow execution on the next day/time never ends. I stop this flow after e.g. 2 days of execution time (correct execution time ~3h), and if I rerun the same flow once again, every task in the flow finishes correctly. I use a docker agent and the gitlab registry. In the agent logs I found this error:
    Traceback (most recent call last):
      File "/usr/local/lib/python3.9/site-packages/prefect/engine/cloud/flow_runner.py", line 188, in interrupt_if_cancelling
        flow_run_info = self.client.get_flow_run_info(flow_run_id)
      File "/usr/local/lib/python3.9/site-packages/prefect/client/client.py", line 1564, in get_flow_run_info
        raise ClientError('Flow run ID not found: "{}"'.format(flow_run_id))
    prefect.exceptions.ClientError: Flow run ID not found: "0695cb92-7995-43b1-abf7-6500eb7e9fc0"
    The flow freezes on one task; this task inserts data into influxdb. I have two instances of this task with two different database configs, and these two tasks execute at the same time. One instance executes correctly; the second task never ends. Does anybody have an idea what might be causing this log error, or why the task is not ending?

    Philip MacMenamin

    12/22/2021, 3:19 PM
    Hi, is there a standard way to test a Task raises an Exception?
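    A common pattern (plain pytest, not a Prefect-specific testing API): call the task's `.run()` method directly, which executes the underlying function outside any flow, and assert with `pytest.raises`. The `divide` task is just an illustrative example.
    import pytest
    from prefect import task

    @task
    def divide(a, b):
        return a / b

    def test_divide_raises():
        # .run() executes the task function directly, outside the flow engine
        with pytest.raises(ZeroDivisionError):
            divide.run(1, 0)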

    Dimosthenis Schizas

    12/22/2021, 3:31 PM
    Hello dear community. I'm trying to deploy a k8s agent inside a k8s cluster and I'm struggling with the manifest. First, the manifest asks for `PREFECT__CLOUD__AGENT__AUTH_TOKEN`, which is deprecated if I understand correctly. Is it correct to assume that I can remove that arg and keep only `PREFECT__CLOUD__API_KEY`?

    Paul Gierz

    12/22/2021, 4:34 PM
    Hello, small and likely stupid question. I have something like this:
    import os
    import pathlib
    import re

    import prefect
    from prefect import task

    @task
    def get_n_newest_files_for_pattern(pattern: str, path: str, n: int) -> list:
        """
        Task to get the n newest files for a given pattern.
        """
        logger = prefect.context.get("logger")
        logger.info(f"Getting {n} newest files in {path} for pattern {pattern}")
        path_files = os.listdir(path)
        files_with_path = [os.path.join(path, f) for f in path_files]
        files = [pathlib.Path(f) for f in files_with_path if re.search(pattern, f)]
        logger.info(f"Found {len(files)} files for pattern {pattern}")
        logger.debug(f"Files: {files}")
        logger.info("Sorting files by modification time")
        files.sort(key=lambda x: x.stat().st_mtime, reverse=True)
        logger.info(f"Returning the {n} newest files")
        return files[:n]
    I am getting file not found errors with:
    FileNotFoundError: [Errno 2] No such file or directory: '<Parameter: Path to the top level of the experiment tree>/outdata/fesom'
    I thought that once it was loaded, any `Parameter` would behave as whatever type it is supposed to be? It is defined like this:
    path = Parameter(name="Path to the top level of the experiment tree")
    Earlier on I do an f-string conversion:
    outdata_path = f"{path}/outdata/fesom"
    Not having f-strings would be possible, but a bit annoying
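    (This is the same root cause as the register-time numpy question above: a `Parameter` is a task, so the f-string evaluated at flow-build time formats the Parameter object's repr rather than its run-time value. One workaround sketch is to build the path inside a task:)
    from prefect import task

    @task
    def build_outdata_path(path: str) -> str:
        # `path` is the resolved Parameter value here, not the Parameter object
        return f"{path}/outdata/fesom"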

    Pedro Machado

    12/22/2021, 7:46 PM
    Hi I am having trouble registering a flow that used to work and can't figure out what is going on. I am using a different laptop today. Can you tell what may be going on from the output in the thread?

    Leon Kozlowski

    12/22/2021, 7:56 PM
    If I want to upgrade my agent prefect version to use some new features for a new flow (by bumping docker image tag version) will flows already deployed running with an older version of prefect continue to run? Storage: Docker Agent: Kubernetes (EKS)

Michael Adkins

12/22/2021, 8:02 PM
Generally, yes. We try not to break compatibility when the agent is on a newer version than the flow runs.

Leon Kozlowski

12/22/2021, 9:34 PM
Ok thanks @Michael Adkins I’m looking to go from 0.14.19 -> 0.14.22 to use the KV store

Michael Adkins

12/22/2021, 10:13 PM
Should be fine 🙂