prefect-community #prefect-community

dict. The documentation is not more than a hello-world. I'd like to use Prefect's Context to pass some (configuration) constants to task but that's not possible.

#!/usr/bin/env python
# coding: utf8

import prefect
from prefect import task, Flow
from prefect.environments.storage.local import Local


@task
def print_context():
    prefect.context.get("logger").info("get-val is '%s'", prefect.context.get("val"))
    prefect.context.get("logger").info("dot-val is '%s'", prefect.context.val)


with Flow("contexttest", storage=Local(directory="/flows/.prefect/flows")) as flow:
    with prefect.context(val="SPAM"):
        print_context()

if __name__ == "__main__":
    flow.register()

Will print 'None' and in the second line throws an exception. Thus, the context's val is only valid in the with block. But what's the purpose of Context if not passing some simple constants around? I can also write

with Flow("contexttest", storage=Local(directory="/flows/.prefect/flows")) as flow:
    prefect.context.val="SPAM"
    print_context()

with the same result: not available in task.

Luis Muniz

07/21/2020, 10:04 AM

Hi guys, we are trying to iterate a cursor of a long running query, chunk the results, and dynamically spawn a task that will handle each chunk. The examples I can find now in the documentation use a declarative approach, either by defining the task graph inside the Flow DSL and using the @task decorator, or using an explicit Task instance, but here too inside the Flow itself. What we would like, inside a Task that is handling a scrollable result set, is to be able to dynamically spawn an undefined number of tasks, because the total size of the result set is unknown. I hope I have been able to frame our use case properly. Here is a bit of pseudo-code to illustrate it a little:

@task
def get_data(chunk_size):
    # fetch connection
    curr = connection.cursor()
    curr.execute(sql.SQL("select id from big_table"))
    collection = curr.fetchmany(chunk_size)
    while collection:
        # spawn task(collection) <----- *Here spawn a task*
        collection = curr.fetchmany(chunk_size)
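
The chunking loop itself can be sketched in plain Python, independent of how each chunk is later handed to a task. This is only an illustration: iter_chunks and fake_cursor are hypothetical names, and a real DB-API cursor would call fetchmany(chunk_size) where this sketch uses islice. It does not show any Prefect mechanism for spawning tasks.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_chunks(rows: Iterable[T], chunk_size: int) -> Iterator[List[T]]:
    """Yield lists of at most chunk_size items until rows is exhausted."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


# Hypothetical stand-in for a cursor: any iterable of unknown length.
fake_cursor = (row_id for row_id in range(7))
chunks = list(iter_chunks(fake_cursor, 3))
```

Because the helper is a generator, only one chunk is held in memory at a time, which fits a result set whose total size is unknown.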

Ben Davison

07/21/2020, 12:36 PM

Question about Parameters:

Preston Marshall

07/21/2020, 1:40 PM

For fargate: It seems like there is a 2 minute stop timeout, is this accurate? So if you have a task that runs for longer than 2 minutes it is killed? How does that work?

👀 1

Sven Teresniak

07/21/2020, 3:52 PM

I've been working with Prefect Server for a week now. It's fun! I like it! Prefect is my perfect connection for handling dataflows between presto, spark, postgres, s3, etc. The setup as "all Prefect components as containers in one k8s-pod" took me a day. 😞 I will soon handle about 1TB of fresh data every day for a variety of specialized services. Thanks.

💯 1

🚀 4

Florian L

07/21/2020, 3:59 PM

Hello, I just implemented the new LocalEnvironment with a LocalDaskExecutor, but as a result I've got a new problem. My context modifications are no longer shared between the different tasks/functions of my flows. My understanding is that they are each in their own thread and not communicating with each other. Is there a solution to that problem, aside from no longer using Dask? PS: I'm not expecting context to be shared instantly between tasks executed in parallel. I'm setting context at the beginning of my flow execution, and towards the end, several steps later.
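
The behaviour described here matches how thread-local state works in Python in general. The sketch below uses only threading.local from the standard library. That Prefect's context behaves the same way under a threaded executor is an assumption taken from the description above, not something confirmed here.

```python
import threading

state = threading.local()
state.val = "SPAM"  # set in the main thread only

seen_in_worker = []


def worker():
    # A new thread starts with an empty thread-local namespace,
    # so the attribute set in the main thread is not visible here.
    seen_in_worker.append(getattr(state, "val", None))


t = threading.Thread(target=worker)
t.start()
t.join()

seen_in_main = state.val
```

Under that assumption, values that several tasks need are easier to reason about when passed explicitly as task inputs and outputs.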

Chris Goddard

07/21/2020, 5:40 PM

hi there! I'm working on deploying prefect on a server. in the short term I'm just wanting to run a flow as a script rather than run prefect server. that works fine locally but for some reason on the server I keep getting errors because it's expecting an API key to prefect cloud. is there a configuration setting I'm missing?

Pedro Machado

07/21/2020, 5:43 PM

Hi everyone. I have a question about creating deterministic flow run names. I am working on a flow that will have a schedule with up to 18 clocks. If you are wondering why that many, here is some background. This flow pulls data from a reporting API that is organized around data bundles and different granularities (daily, weekly, monthly). Not all combinations are valid but I have identified 18 valid ones so far. Since each combination is available at different times with different frequencies, we need different schedules for each combination of parameters. The flow structure/logic is the same for all the combinations so it makes sense to have a single flow. Is there a way to define flow run names based on a combination of parameters and scheduled_start_time? More generally it would be great to be able to use context + parameters to define the flow run name.
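
The naming rule itself can be sketched as a pure function, so that the same inputs always give the same name. The function build_run_name and the parameter names bundle and granularity are hypothetical, chosen to mirror the description above. How such a name would be attached to a flow run in Prefect is not shown.

```python
from datetime import datetime, timezone


def build_run_name(bundle: str, granularity: str, scheduled_start_time: datetime) -> str:
    """Build a run name that depends only on the parameters and the schedule time."""
    # Normalise to UTC so the same instant always formats identically.
    stamp = scheduled_start_time.astimezone(timezone.utc).strftime("%Y%m%dT%H%MZ")
    return f"{bundle}-{granularity}-{stamp}"


example_name = build_run_name(
    "sales", "weekly", datetime(2020, 7, 21, 17, 43, tzinfo=timezone.utc)
)
```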

Matt Wong-Kemp

07/21/2020, 7:24 PM

What's the 'proper' way to do async inside a prefect task? I'm dual-testing in Jupyter, so I'd like to keep the async bit working nicely, and as far as I can tell Prefect isn't running an event loop. At the minute I've got something that looks like this:

async def do_thing_async_impl(a, b, c):
    await ...
    ...

@task
def do_thing(a, b, c):
    return asyncio.get_event_loop().run_until_complete(do_thing_async_impl(a, b, c))

but I'm getting an error:

Future <Future pending cb=[BaseSelectorEventLoop._sock_connect_done(9)()]> attached to a different loop')

Before I go digging into event loop fun, is there an easier way to do an async task?
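
One common pattern for calling a coroutine from synchronous code is to run it on a fresh event loop that is created and closed inside the call. The sketch below uses only the standard library. The helper run_in_fresh_loop is a hypothetical name and the coroutine body is a placeholder. Whether this removes the error above depends on where the underlying connections are created, since objects bound to another loop would still fail. In a thread that already has a running loop, such as a Jupyter cell, run_until_complete raises a RuntimeError instead.

```python
import asyncio


async def do_thing_async_impl(a, b, c):
    # Placeholder for the real awaitable work.
    await asyncio.sleep(0)
    return a + b + c


def run_in_fresh_loop(coro):
    """Run a coroutine to completion on a new event loop, then close the loop."""
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(coro)
    finally:
        loop.close()


def do_thing(a, b, c):
    # In a flow this function would carry the @task decorator.
    return run_in_fresh_loop(do_thing_async_impl(a, b, c))


result = do_thing(1, 2, 3)
```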

karteekaddanki

07/21/2020, 10:27 PM

Hey guys, I am trying to use result targets to cache some of my results as suggested in https://docs.prefect.io/core/idioms/targets.html. However, a lot of my heavyweight tasks are run in C++ and are invoked via Python. How can I use targets effectively in this case? I've tried to return None but, as expected, this causes my task to always run. I've worked around this issue by adding a layer of indirection: the task returns a path to a file that contains the filename of the actual file generated by my C++ program (similar to empty targets in make, the presence of this file indicates that the task has run, and its contents point to the location of the task output). It would be nice if I could avoid this indirection and directly return a Result object that doesn't necessarily correspond to a serialized Python object. All downstream processes that consume these results treat them as locations. In other words, I am looking for behavior identical to Luigi. Thanks in advance.
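
The marker-file workaround described above can be sketched in plain Python. All names here (run_with_marker, fake_cpp_job, job.done) are hypothetical, and the external program is replaced by a stand-in function. It shows the indirection only, not a Prefect Result implementation.

```python
import tempfile
from pathlib import Path
from typing import Callable


def run_with_marker(marker: Path, produce_output: Callable[[], Path]) -> Path:
    """Return the recorded output path if the marker exists; otherwise run and record it."""
    if marker.exists():
        return Path(marker.read_text().strip())
    output = produce_output()
    marker.write_text(str(output))
    return output


calls = []

with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)

    def fake_cpp_job() -> Path:
        # Stand-in for the external program that writes its own output file.
        calls.append(1)
        out = workdir / "result.bin"
        out.write_bytes(b"\x00")
        return out

    marker_path = workdir / "job.done"
    first = run_with_marker(marker_path, fake_cpp_job)
    second = run_with_marker(marker_path, fake_cpp_job)
    same_path = first == second
    run_count = len(calls)
```

The second call finds the marker and skips the job, which is the same effect as an empty target in make.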

Thomas La Piana

07/22/2020, 5:29 AM

I have my graphql URL behind traefik and am using auth with it. I'd like to be able to register a flow by just passing the auth headers to the register() method the same way that I can pass it to the graphql queries. Should I make a PR for this or is there another option to register the flow? Is the serialize + graphql option I've seen the correct solution here?

Adrien Boutreau

07/22/2020, 8:56 AM

Hello! We installed Prefect Core on an EC2 instance, and our customer is really happy with the product and its graphic design, congrats! The only issue left to fix is the Prefect agent: I started it in the background (prefect agent start &) but it seems to disappear at some point and I don't understand why. Any idea how I should run it?

Matias Godoy

07/22/2020, 9:54 AM

Hello! I don't know if this has been reported before, but I found that the Parameters panel in the run section is broken (see attached image). Maybe it's because we're using really long parameters (a JWT token)

Iain Dillingham

07/22/2020, 10:09 AM

Hi community. I'm trying to get Prefect server (v0.12.5) working locally but GraphQL can't connect to Postgres. There are a couple of other questions in this channel that describe a similar issue, but both relate to config.toml. I haven't written config.toml and I haven't set any PREFECT__ environment variables, so I'm using defaults.

Lance Haig

07/22/2020, 10:20 AM

Hi, I was looking at the project on github and I noticed that in this pull request https://github.com/PrefectHQ/prefect/pull/2492 The Nomad agent was removed. I am just curious why this is the case?

Marwan Sarieddine

07/22/2020, 1:28 PM

Hi folks, is it possible to change a flow’s execution environment after a flow is registered ?

Sven Teresniak

07/22/2020, 2:00 PM

Can I call a task from within another task? Is this allowed? Bad style? No problem at all? Any caveats? I suppose it's problematic because this could take the A out of DAG…

Richard Hughes

07/22/2020, 2:48 PM

Hi, I was wondering how does scheduling the same task to run with different parameters at the same time work? It seems to not work as I imagined it should. Maybe it seems there is a limitation to only have one task at a specific time of day. Does anyone have any insight on this setup?

Matt Wong-Kemp

07/22/2020, 3:12 PM

Is there a way to set default context values for a flow? I'm wanting to put the endpoint to hit for API services into the context, and at the minute I'm entering this by hand every time I want to run from the UI. I guess ideally any prefect context values set when registering the flow would set them as defaults.

Michael C. Grant

07/22/2020, 3:49 PM

Hey folks, I'm experimenting with using a Dask Gateway cluster to handle workloads. Currently we're using a local Dask execution environment with success so we're good there. Does anyone happen to have a custom Dask Gateway worker with prefect preinstalled they'd be willing to share?

👋 2

👏 1

💯 3

Shawn Marhanka

07/22/2020, 9:59 PM

Hi, I’ve been playing with Prefect Core for the past few weeks and recently moved over to experimenting with Prefect Cloud. We currently have all of our prefect flows/tasks in a separate repo. If we register all of the flows in that repo and use docker storage, can other apps in our ecosystem programmatically call those flows once they have authenticated to Prefect Cloud? I found client.create_flow_run(flow_id, parameters=parameters), but I cannot find how to get the flow_id without registering. Is there a way to get all of the flow mappings (name + id) from a cloud project and then use that when creating flow runs? Or am I going about this all wrong? Thanks for the help.

James Bennett Saxon

07/23/2020, 1:58 AM

I've read the intro docs and went through the tutorial and was trying out some Prefect Tasks. First off was MySqlFetch because, well MySql.... I feel like I've got things set up right but I'm running into an unexpected error in the task runner trying to do a MySqlFetch.run():

ERROR:prefect.TaskRunner:Unexpected error: AttributeError('__enter__',)

So I didn't want to get into debugging this because my code could be totally wrong. I was hoping to find some examples of using this and other Prefect Tasks. Are there examples for how to use this and other tasks?

Sven Teresniak

07/23/2020, 7:59 AM

Hi, I'm getting familiar with Prefect but now I have a Flow that I need some help with to make it elegant. I have something like this:

def complex_task_generating_function(singleelement):
  with case(sometask, foo):
    anothertask(…)
    …

with Flow("foo") as flow:
  param = Parameter("param", required=False)
  
  # generates a list of strings, based on param. len is 0…n
  elements_to_process = maybe_generate_work_items(param) 
  
  # when this evaluates to False, all the following is skipped, the apply_map as well!
  with case(isempty_task(elements_to_process), True):
    # now I either want to add one default element or 
    # somehow do the processing based on the following result
    generated_default = default_value_generator_task()
    
    # maybe so?
    elements_to_process = task(lambda x: [x])(generated_default)

  # now the tricky part.
  # elements_to_process is either a list or just one (runtime dependent) default value
  result = apply_map(complex_task_generating_function, elements_to_process)

Problem is: apply_map does not know skip_on_upstream_skip. I cannot just use map() because complex_task_generating_function is not a task (it's the beef of the flow so to say and in fact the logic of the flow). I found a workaround by doing something like this:

@task(name="hack", skip_on_upstream_skip=False)
def merger_hack(elements, default):
  return elements or [default]

with Flow("foo") as flow:
  param = Parameter("param", required=False)  # same as above
  elements_to_process = maybe_generate_work_items(param)  # same as above
  
  with case(isempty_task(elements_to_process), True):
    generated_default = default_value_generator_task()

  final_list = merger_hack(elements_to_process, generated_default)
  result = apply_map(complex_task_generating_function, final_list)

But writing code like the hack-task, which basically checks whether the flow ran through the isempty-case or not, seems odd. I don't want to "check" whether or not the flow used one path or another. The run path through the flow should decide this. How can I write this elegantly and easily? Sorry for the long question but I want to learn how to use Prefect properly because in the future I'm going to write a lot of flows.

bruno.corucho

07/23/2020, 9:32 AM

Hey guys, I'd like to read millions of database records using Dask's read_sql_table function (which works using Dask alone) within Prefect, while still partitioning my data into n partitions, computing them in parallel and merging them all together in the end. Do you guys have any best approach/practices for Dask-specific functionalities within Prefect? What would the procedure be after my method:

df = read_sql_table(table='peanuts', uri=connection,
                    index_col="peanut_id", columns=["peanut_details", "peanut_date"],
                    npartitions=1000)

Should I return these partitions and do the delaying and computing using a Prefect's map() from within the flow's scope definition? Thanks in advance! 🙂 And have a great weekend!

Sven Teresniak

07/23/2020, 11:41 AM

Hmmm maybe "I'm holding it wrong", but I need an "else" functionality for case. The ifelse-Task seems not to fit.

Thomas Hoeck

07/23/2020, 12:16 PM

Hi all! Is there a way to limit which repositories the Docker Agent is allowed to pull from? Because as I see it, if someone got access to your Prefect account they could schedule your Docker Agent to run any image of their liking. This would have some pretty big security implications, as you have probably provided your Docker Agent with secrets and it is probably running on your on-prem network. As I see it this gives the Prefect Team (in theory) the ability to run code on all on-prem networks and extract the secrets set on the Docker Agent through env-vars.

Klemen Strojan

07/23/2020, 12:18 PM

Is it possible for a scheduled flow to run on multiple agents at the same time, if all labels match? Is this expected or is it a bug? We are using Cloud. https://prefect-community.slack.com/archives/CL09KU1K7/p1595309512466100

Adam

07/23/2020, 4:50 PM

Hi everyone! My company is thinking of using Prefect but we'd like to talk through some of our use cases and see if it's a good fit. Who's the best person to talk with?

👀 1

Jason Carter

07/23/2020, 7:22 PM

Hi everyone, looking for any tips/pointers on where I'm going wrong.... I'm a couple of days new to Prefect (used Airflow in the past) and I'm trying to just set up a hello world type thing. Using Prefect Core I was able to get running via CLI and also got the scheduling working. My problem comes when I'm trying to visualize and "register" a flow in the UI. The UI is up and running, everything is green, and I ran prefect backend server, prefect server start and prefect agent start in that order, but there is no flow in the UI.

import prefect
from prefect import task, Flow

@task
def hello_task():
    logger = prefect.context.get("logger")
    logger.info("Hello, Cloud!")

flow = Flow("hello-flow", tasks=[hello_task])

# flow.run()
flow.register()

👀 1

Ashish Arora

07/23/2020, 11:08 PM

Hello everyone, Is there a way for you to register a flow (which gives you the UI url on the localhost) and then visualize a particular run of the flow that was executed from the python code itself using (flow_name.run() function) or it only works for manual UI runs and scheduled jobs?