Prefect Community #ask-community

Thomas Furmston

10/18/2021, 9:49 AM

Hi, I am trying to set up a flow of flows with parameters being passed from the parent flow to the child flow, but am having some issues. With hard-coded values for the parameters it works, so it seems to be related to how I am trying to set up the parameter passing.

Anna Geller

10/18/2021, 10:41 AM

Hi @Thomas Furmston, I will try to reproduce the issue. Overall, I think it’s best to set default values for parameters and override those dynamically when needed. In your flow, you specify required parameters without setting default values for them. This way, if you run this flow independently, e.g. on a schedule, it will fail. It can only run successfully when triggered from the UI, the API, or another flow that passes the parameters.

num_days_parameter = Parameter('num_days', required=True)
num_back_fill_days_parameter = Parameter('num_back_fill_days', required=True)
end_date_parameter = Parameter('end_date', required=True)

Do you have some sensible defaults for those parameters?

Thomas Furmston

10/18/2021, 10:59 AM

I could come up with some sensible defaults for num_days_parameter and num_back_fill_days_parameter, but I am not so sure about end_date_parameter

Thomas Furmston

10/18/2021, 10:59 AM

I am running it through a schedule and this is the point at which it fails

Thomas Furmston

10/18/2021, 11:00 AM

You think if I provide some defaults it will work from a schedule?

Anna Geller

10/18/2021, 11:11 AM

yes, exactly @Thomas Furmston. This flow cannot run on a schedule, because it doesn’t know what values to use: its parameters are required and have no defaults. You can attach default parameters to your schedule, though. You can do this either via code:

import pendulum
from prefect.schedules import Schedule
from prefect.schedules.clocks import CronClock


clock_1 = CronClock(
    '41 10 * * 1-5',
    start_date=pendulum.now(tz='Europe/London'),
    parameter_defaults={
        'num_days': 1,
        'num_back_fill_days': 1,
        'end_date': '',
    },
)

schedule = Schedule(clocks=[clock_1])

or from the UI.

Anna Geller

10/18/2021, 11:16 AM

Then, in your task, you can implement custom business logic based on this parameter value, e.g.

if end_date_parameter:
    end_date = end_date_parameter
else:
    end_date = pendulum.now(tz='Europe/London').to_date_string()
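
The fallback above can also be sketched in plain Python, without Prefect or pendulum. This is only a minimal sketch: `resolve_end_date` is a hypothetical helper that does not appear in the thread, it uses the standard library `datetime` module instead of pendulum, and it assumes that an empty string or None means "no end date supplied".

```python
from datetime import date
from typing import Optional


def resolve_end_date(end_date: Optional[str], today: Optional[date] = None) -> str:
    """Return the supplied end date, or today's date when none was supplied.

    Hypothetical helper: an empty string or None counts as "not supplied".
    """
    if end_date:
        return end_date
    # Fall back to the current date, formatted as YYYY-MM-DD.
    return (today or date.today()).isoformat()


print(resolve_end_date("2021-10-15"))                  # explicit value wins
print(resolve_end_date("", today=date(2021, 10, 18)))  # falls back to the given day
```

Passing `today` explicitly keeps the fallback easy to test; inside a task the same check would run on the parameter value at run time.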

Thomas Furmston

10/18/2021, 11:56 AM

So this would be for the child flow?

Thomas Furmston

10/18/2021, 11:58 AM

I have defaults for the parent flow, which is the one running on a schedule

Thomas Furmston

10/18/2021, 11:58 AM

num_days_parameter = Parameter('num_days', default=1)
num_back_fill_days_parameter = Parameter('num_back_fill_days', default=1)
end_date_parameter = Parameter('end_date', default=None)

Thomas Furmston

10/18/2021, 11:58 AM

I then set the date from the schedule, if appropriate, in the following task:

Thomas Furmston

10/18/2021, 11:59 AM

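import prefect
from prefect import task

@task
def calculate_flow_end_date(end_date: str):
    # Use the explicit end date if given, else the run's scheduled start time.
    if end_date is not None:
        return end_date
    return prefect.context.get('scheduled_start_time').to_date_string()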

Thomas Furmston

10/18/2021, 11:59 AM

I am not planning to run the child flow on a schedule, but to pass the parameters in from the parent flow.

Thomas Furmston

10/18/2021, 12:00 PM

However, when I pass the arguments in, it still fails.

Thomas Furmston

10/18/2021, 12:00 PM

common_flow_result = common_flow(parameters={
    'num_days': num_days_parameter,
    'num_back_fill_days': num_back_fill_days_parameter,
    'end_date': task_end_date,
})

Thomas Furmston

10/18/2021, 12:01 PM

Sorry, I think my initial response was a bit misleading

Thomas Furmston

10/18/2021, 12:04 PM

What is confusing me is that the child flow doesn't seem to be picking up the parameters from the parent flow

Anna Geller

10/18/2021, 12:25 PM

Got it, will look into this and get back to you

Anna Geller

10/18/2021, 1:29 PM

@Thomas Furmston I was trying to reproduce the error and somehow it worked for me. Here is a child flow: https://gist.github.com/fd8d850f643b59b87c063581926d26db And a parent flow: https://gist.github.com/eda4b42517793db583a1c0c8b625d6e5 Can you confirm that you registered your child flow before using it in the parent flow? If this is not the case, then it could be something with the ShellTask in the child flow.

Thomas Furmston

10/18/2021, 1:33 PM

Yes, I have my makefile set up to ensure the child is registered first.

Thomas Furmston

10/18/2021, 1:34 PM

I don't think it even gets to the shell task in the child flow.

Thomas Furmston

10/18/2021, 1:36 PM

That was my impression anyway. It is quite hard to read from the logs

Thomas Furmston

10/18/2021, 1:36 PM

Is there a way to work out which line of a flow throws the error?

Thomas Furmston

10/18/2021, 1:37 PM

I'll try removing the ShellTask and see if that removes the error

Anna Geller

10/18/2021, 1:39 PM

This is the reason why: Tasks were created but not added to the flow: {<Parameter: script>}. This can occur when Task classes, including Parameters, are instantiated inside a with flow: block but not added to the flow either explicitly or as the input to another task. For more information, see https://docs.prefect.io/core/advanced_tutorials/task-guide.html#adding-tasks-to-flows.

Anna Geller

10/18/2021, 1:44 PM

Since you want to log parameter values anyway, I’d recommend doing it as part of a separate task. This way, you are passing the parameter value to a task as a data dependency, thus implicitly adding the Parameter task to the flow. From then on, you can pass this parameter value to other tasks such as the shell task:

from prefect import Flow, Parameter, task
from prefect.tasks.shell import ShellTask


@task(log_stdout=True)
def log_param_value(user_input: str):
    print(user_input)


@task
def build_command(user_input: str) -> str:
    # Format the command inside a task so it sees the runtime value; an
    # f-string in the flow block would embed the Parameter object's repr.
    return f"echo {user_input}"


shell = ShellTask(return_all=True)

with Flow("shell-flow") as flow:
    param = Parameter("user_input", default="your_value")
    log_param_value(param)
    shell_task = shell(command=build_command(param))

if __name__ == '__main__':
    # flow.run()
    flow.run(parameters=dict(user_input="Hello World"))

Anna Geller

10/18/2021, 1:53 PM

@Thomas Furmston additionally, you need to initialize the ShellTask before you call it within the Flow constructor, as the code snippet above shows. Overall, you were right when you said that your child flow previously didn’t even get to the ShellTask, because the Parameter was instantiated in the flow but was not yet called. Now that we add this Parameter to the Flow through the log_param_value task (as a data dependency), it can be used in downstream tasks (your shell tasks and StartFlowRun). The same happened with the ShellTask: it was instantiated within the Flow, but was neither called nor explicitly added to the flow via a data dependency or upstream/downstream dependencies. Now that we instantiate it beforehand and call it within the Flow constructor, it works as expected. Does that make sense to you?

Thomas Furmston

10/18/2021, 1:55 PM

I see. Let me try that out then.

Thomas Furmston

10/18/2021, 1:56 PM

I was actually setting the dependencies of the shell tasks explicitly in my actual code, but I forgot to copy that into the example that I posted in Slack

Thomas Furmston

10/18/2021, 1:57 PM

I was not doing the same with the parameters though.

Thomas Furmston

10/18/2021, 1:57 PM

Let me make the changes and see if it works

Thomas Furmston

10/18/2021, 1:57 PM

Thanks for helping!

👍 1

Thomas Furmston

10/18/2021, 3:41 PM

um......I still seem to be getting the same error 😐

Thomas Furmston

10/18/2021, 3:42 PM

@task(log_stdout=True)
def log_parameter_value(parameter_name: str, parameter_value: (str, int, float)):
    logger = prefect.context.get('logger')
    logger.info('Parameter: (%s, %s)', parameter_name, parameter_value)


num_days_parameter = Parameter('num_days', required=True)
num_back_fill_days_parameter = Parameter('num_back_fill_days', required=True)
end_date_parameter = Parameter('end_date', required=True)

shell_command = ShellTask(
    name='served_advert_task',
    stream_output=True,
)

with Flow(
    'my_flow',
    storage=Docker(
        base_image='my-docker-image:latest',
        local_image=True,
    )) as flow:

    log_parameter_value('num_days', num_days_parameter)
    log_parameter_value('num_back_fill_days', num_back_fill_days_parameter)
    log_parameter_value('end_date', end_date_parameter)

    shell_command(
        command=construct_etl_command(),
    )

Thomas Furmston

10/18/2021, 3:43 PM

This is the new code, but I still get the issue

Thomas Furmston

10/18/2021, 3:43 PM

/tmp/prefect-qc0rm5qd: line 1: Parameter:: No such file or directory

Thomas Furmston

10/18/2021, 3:44 PM

I'm really confused about what is going on now

Kevin Kho

10/18/2021, 4:25 PM

Hey @Thomas Furmston, what RunConfiguration are you using?

Anna Geller

10/18/2021, 4:26 PM

@Thomas Furmston you should probably add a DockerRun run configuration. When I run the same logic of your flow on a local agent with local storage, it works. So @Kevin Kho is right that there must be some issue in your DockerRun and Docker storage configuration.

from prefect import Flow, Parameter, task
from prefect.tasks.shell import ShellTask

@task
def construct_etl_command():
    return "ls"


@task(log_stdout=True)
def log_param_value(user_input: str):
    print(user_input)


shell = ShellTask(stream_output=True)


with Flow("shell-flow") as flow:
    param = Parameter("user_input", default="your_value")
    log_param_value(param)
    shell = shell(command=construct_etl_command())

if __name__ == '__main__':
    # flow.run()
    flow.run(parameters=dict(user_input="Hello World"))

Thomas Furmston

10/18/2021, 5:02 PM

I had a dockerrun config

Thomas Furmston

10/18/2021, 5:02 PM

removing it and copying your example works for me

Thomas Furmston

10/18/2021, 5:03 PM

I am going to work backwards, adding my previous stuff back in, and see what breaks it

👍 1

Anna Geller

10/18/2021, 5:27 PM

nice work! yeah definitely take it one step at a time and let us know what is the issue once you analyzed it

Thomas Furmston

10/18/2021, 5:38 PM

So adding in the DockerRun doesn't break anything and the flow still runs. However, when I remove the task decorator on the construct_etl_command function then the error returns

Thomas Furmston

10/18/2021, 5:39 PM

So I would guess that putting a task (in this case a Parameter) through an undecorated function causes the link to the parameter to be lost?

Thomas Furmston

10/18/2021, 5:39 PM

complete guess 🙂

Thomas Furmston

10/18/2021, 5:40 PM

Thanks a lot for helping me debug this issue

Thomas Furmston

10/18/2021, 5:41 PM

I do have one follow up question. Previously I was setting the dependencies between two different shell tasks like so,

shell_task2.set_dependencies(upstream_tasks=[shell_task1])

Thomas Furmston

10/18/2021, 5:42 PM

Is it still possible to set dependencies like this when I have called the instance of the ShellTask class in the context of the flow

Thomas Furmston

10/18/2021, 5:43 PM

i.e., after the above suggested changes

Anna Geller

10/18/2021, 5:45 PM

you absolutely can. When you call these lines

shell_1 = shell(command=construct_etl_command())
shell_2 = shell(command=construct_etl_command())

each call creates a copy of the task, and you can refer to each copy by its variable, e.g.

shell_1.set_downstream(shell_2)

Thomas Furmston

10/18/2021, 5:47 PM

amazing, thanks!

👍 1

Thomas Furmston

10/18/2021, 5:47 PM

Let me try that out

Kevin Kho

10/18/2021, 5:52 PM

I think I know what you are saying. You have a Parameter that you want to use in ShellTask(), but the Parameter only exists in the Flow context manager. For example:

mytask = MyTask(x)
with Flow(...) as flow:
    x = Parameter("x", default=0)
    mytask()

Is that right? I think you need to make MyTask() configurable during runtime such that

mytask = MyTask()
with Flow(...) as flow:
    x = Parameter("x", default=0)
    mytask(x)

would work because the init method is evaluated during build time when the Parameter is empty but the run method is deferred. I believe the ShellTask can be configured during runtime so you want to push more of the parameters to the run method where the Parameter will have a value. You can also do

with Flow(...) as flow:
    x = Parameter("x", default=0)
    MyTask(...)(x)

The first () is the init and the second () is the run.
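
The split between build time and run time can be sketched with a few toy classes. This is a plain-Python illustration under stated assumptions, not Prefect's implementation: `ToyParameter`, `ToyTask` and `run_flow` are hypothetical names, and the second call here only records its inputs so that `run` can receive their values later.

```python
class ToyParameter:
    """Stand-in for a flow parameter: its value is only known at run time."""

    def __init__(self, name, default=None):
        self.name = name
        self.default = default


class ToyTask:
    """Stand-in for a task: __init__ runs at build time, run() at run time."""

    def __init__(self, prefix="echo"):
        # Build time: only static configuration is available here.
        self.prefix = prefix
        self.inputs = None

    def __call__(self, *inputs):
        # Build time: remember which inputs to resolve later; do not run yet.
        self.inputs = inputs
        return self

    def run(self, value):
        # Run time: receives the resolved parameter value.
        return f"{self.prefix} {value}"


def run_flow(task, parameters):
    """Resolve each ToyParameter to its runtime value, then call run()."""
    resolved = [parameters.get(p.name, p.default) for p in task.inputs]
    return task.run(*resolved)


x = ToyParameter("x", default=0)
bound = ToyTask(prefix="echo")(x)   # first (): init, second (): records x
print(run_flow(bound, {"x": 5}))    # echo 5
print(run_flow(bound, {}))          # echo 0
```

Anything passed to the first call has to be known when the flow is built; anything passed to the second call can depend on a parameter.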

Thomas Furmston

10/18/2021, 5:56 PM

I mean that I have something like

with Flow(...) as flow:
    x = Parameter("x", default=0)
    MyTask(...)(command=f(x))

in which the function f just constructs the command to be run in the shell task.

Thomas Furmston

10/18/2021, 5:56 PM

The error seems to come when f is not decorated as a task.

Thomas Furmston

10/18/2021, 5:57 PM

I am guessing that the fact that it is not decorated means that the dependency of the ShellTask on the Parameter is lost.

Thomas Furmston

10/18/2021, 5:58 PM

I currently don't know enough about Prefect to make that statement more precise. 🙂

Thomas Furmston

10/18/2021, 5:58 PM

but hopefully you get my general gist

Thomas Furmston

10/18/2021, 5:58 PM

Does it make sense?

Kevin Kho

10/18/2021, 5:59 PM

Ah ok, yes I think it needs to be a task because there is some “magic” where Task results are passed on to each other. The Parameter is just a special task. If you pass a Task to a plain Python function, you are passing the Task object itself. If you pass a Task to another task, it gets the result.
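
A toy sketch of that difference, in plain Python rather than Prefect: `ToyParameter` and `Deferred` are hypothetical stand-ins, and the repr format is copied from the `<Parameter: script>` text in the error message quoted earlier in the thread. A plain function formats the object it is handed, while a deferred call waits for the real value.

```python
class ToyParameter:
    """Stand-in for a Prefect Parameter; only the repr matters here."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"<Parameter: {self.name}>"


def construct_command(num_days):
    # Plain function: runs immediately with whatever object it is given.
    return "echo {0}".format(num_days)


class Deferred:
    """Toy version of a task call: stores the function and input for later."""

    def __init__(self, fn, param):
        self.fn = fn
        self.param = param

    def resolve(self, parameters):
        # Run time: look up the real value, then call the function.
        return self.fn(parameters[self.param.name])


num_days = ToyParameter("num_days")

built_eagerly = construct_command(num_days)
print(built_eagerly)                      # echo <Parameter: num_days>

deferred = Deferred(construct_command, num_days)
print(deferred.resolve({"num_days": 3}))  # echo 3
```

If a string like the first one reaches a shell, the `<` would probably be read as input redirection, which would be consistent with the "Parameter:: No such file or directory" message reported earlier, though the thread does not confirm that this is the cause.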

Kevin Kho

10/18/2021, 6:00 PM

This might be solvable with:

with Flow(...) as flow:
    x = Parameter("x", default=0)
    MyTask(...)(command=f(x()))

because that () after x will call the run of the task. Not 100% sure it will work.

Kevin Kho

10/18/2021, 6:03 PM

This is working for me though:

from prefect import Flow, task, Parameter, Task
import prefect

class TestTask(Task):

    def run(self, x):
        logger = prefect.context.get("logger")
        logger.info(x)
        return "test_" + x

test = TestTask()

def mycallable(x):
    return "call_" + x

with Flow("aaa") as flow:
    x = Parameter("x", default="x")
    test(mycallable(x))

flow.run()

upvote 1

Kevin Kho

10/18/2021, 6:04 PM

Also this similarly works,

with Flow("aaa") as flow:
    x = Parameter("x", default="x")
    TestTask()(mycallable(x))

flow.run()

So I’m not quite sure what is up yet 😅
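
One possible explanation for why these two examples work even though mycallable is not a task, offered as an assumption rather than something confirmed in the thread: Prefect 1.x Task objects overload operators such as +, so "call_" + x builds another deferred task instead of a string. The toy classes below (`ToyNode`, `ToyParameter`, `ToyAdd`, all hypothetical) sketch how an overloaded operator stays deferred while `str.format` evaluates immediately.

```python
class ToyNode:
    """Toy deferred value; subclasses implement resolve()."""

    def __radd__(self, other):
        # Called for `other + self`: build a deferred node, do not compute yet.
        return ToyAdd(other, self)

    def __add__(self, other):
        return ToyAdd(self, other)


class ToyParameter(ToyNode):
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"<Parameter: {self.name}>"

    def resolve(self, parameters):
        return parameters[self.name]


class ToyAdd(ToyNode):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def resolve(self, parameters):
        # Resolve deferred operands first, then add the concrete values.
        left = self.left.resolve(parameters) if isinstance(self.left, ToyNode) else self.left
        right = self.right.resolve(parameters) if isinstance(self.right, ToyNode) else self.right
        return left + right


def mycallable(x):
    return "call_" + x            # str + ToyParameter falls back to __radd__


def format_command(x):
    return "echo {0}".format(x)   # formats the object itself, right now


x = ToyParameter("x")
node = mycallable(x)
print(type(node).__name__)           # ToyAdd
print(node.resolve({"x": "hello"}))  # call_hello
print(format_command(x))             # echo <Parameter: x>
```

Under this reading, string concatenation happens to survive an undecorated function, while formatting or any other eager use of the value does not.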

Thomas Furmston

10/18/2021, 6:05 PM

yeah, it kind of frazzled my brain, to be honest. 😅

Thomas Furmston

10/18/2021, 6:05 PM

I'd love to understand the issue better though

Kevin Kho

10/18/2021, 6:12 PM

Could you give me a minimal example of the broken code?

Thomas Furmston

10/18/2021, 6:18 PM

sure, no problem

Thomas Furmston

10/18/2021, 6:19 PM

it will probably be tomorrow now. 🙂

👍 1

Thomas Furmston

10/19/2021, 10:14 AM

import prefect
from prefect import (
    Flow,
    Parameter,
    task,
)
from prefect.storage import Docker
from prefect.tasks.shell import ShellTask


# @task(log_stdout=True)
def construct_etl_command(num_days: str = None) -> str:
    return 'echo {0}'.format(num_days)


@task(log_stdout=True)
def log_parameter_value(parameter_name: str, parameter_value: (str, int, float)):
    """
    Log the given parameter to the Prefect logger.
    :param: parameter_name: The name of the parameter.
    :param: parameter_value: The value of the parameter.
    """
    prefect_logger = prefect.context.get('logger')
    prefect_logger.info('Parameter: (%s, %s)', parameter_name, parameter_value)


num_days_parameter = Parameter('num_days', required=True)

shell_task = ShellTask(name='shell_task', stream_output=True)

with Flow(
        'minimal_broken_docker_flow',
        storage=Docker(
            base_image='python:3.9',
        )) as flow:

    log_parameter_value('num_days', num_days_parameter)

    shell_command = construct_etl_command(
        num_days=num_days_parameter,
    )
    shell_task = shell_task(command=shell_command)

Thomas Furmston

10/19/2021, 10:14 AM

So here is an example that is broken, though annoyingly giving a slightly different error from yesterday.....

Thomas Furmston

10/19/2021, 10:15 AM

Uncommenting # @task(log_stdout=True) fixes the issue, as expected

Thomas Furmston

10/19/2021, 10:17 AM

I will try to replicate the exact error message. I am not sure why it is currently different from the one I got yesterday.

Kevin Kho

10/19/2021, 2:03 PM

Oh I think the Parameter needs to be inside the Flow since it’s a task and the Flow object is the one that connects it to other tasks

Thomas Furmston

10/19/2021, 2:59 PM

Even if log_parameter_value is a task? I thought putting the parameter through a task added it to the dependency graph.

Kevin Kho

10/19/2021, 3:12 PM

Ah I see what you are saying. You might be right there. Let me dig a bit.

Kevin Kho

10/19/2021, 3:21 PM

Ah ok I understand it now I think. A function decorated with @task will have deferred execution, but that plain function call is evaluated immediately. It is evaluated when the flow is serialized. So by default, Prefect uses pickle-based storage, but you can opt to use script-based storage. See this for more info. When you use pickle-based storage, stuff is evaluated as the flow is registered. When you use script-based storage, the function is deferred until runtime.
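
The registration-time behaviour can be illustrated with the standard pickle module. This is a toy sketch, not Prefect's storage code: the dict `flow_spec` and the function `build_command` are hypothetical. It only shows that a value computed while an object is being built is what gets serialized, so loading the pickle later does not recompute it.

```python
import pickle

calls = []


def build_command(num_days):
    # Record each call so we can see when the function actually runs.
    calls.append(num_days)
    return f"echo {num_days}"


# "Build time": the plain function runs now, and its result is stored.
flow_spec = {"name": "toy-flow", "command": build_command("placeholder")}

# "Registration": the already-built object is serialized.
payload = pickle.dumps(flow_spec)

# "Run time": loading the pickle does not call build_command again.
loaded = pickle.loads(payload)
print(loaded["command"])  # echo placeholder
print(calls)              # ['placeholder']
```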

Thomas Furmston

10/19/2021, 3:59 PM

I see. That makes sense.

Thomas Furmston

10/19/2021, 4:00 PM

The difference between evaluation during serialisation and evaluation at runtime is a little confusing at first.

Thomas Furmston

10/19/2021, 4:00 PM

Thanks for the help

Kevin Kho

10/19/2021, 4:01 PM

We agree and are moving away from serialization for Prefect 2.0

👍 1

Kevin Kho

10/19/2021, 4:01 PM

And of course!
