# prefect-server
g
Is it possible to extend the prefect Flow with my own flow class that implements some common patterns? If I do that, and I use my own flow class in the context manager during a flow creation on the client, which downstream services need that flow class code? I am seeing this in my prefect server ui:
```
Last State Message
[9:35am]: Failed to load and execute Flow's environment: ModuleNotFoundError("No module named 'rightsize'")
```
my flow class is called:
```python
class RightsizeFlow(Flow):
```
Does my container running the prefect server need to have my Rightsize code installed in some way?
j
I recommend not subclassing if you can avoid it (prefect isn't designed for this use case). That said, the flow code should only ever be loaded in the flow runner; the prefect server and the agent should never load your flow.
The message you're seeing in the UI is a log message sent from the flow runner to the server for display in the UI; it doesn't indicate that the UI/server is loading the flow.
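(If it helps, a hypothetical sketch of the factory-function alternative to subclassing - `make_rightsize_flow` is illustrative, not an official pattern:)
```python
from prefect import Flow

def make_rightsize_flow(name, **kwargs):
    """Build a plain Flow with shared defaults, instead of subclassing.

    Downstream services then never need a custom class to be importable.
    """
    flow = Flow(name, **kwargs)
    # attach shared config here (schedules, result handlers, etc.)
    return flow
```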
g
I am using a DaskExecutor - does that mean my flow runner is the dask scheduler?
I was hoping that the prefect server and agent didn't have to actually know anything about the flow - just an opaque blob to pass along.
Or is it the Dask worker?
j
> I was hoping that the prefect server and agent didn't have to actually know anything about the flow - just an opaque blob to pass along.
That's how things work currently.
> Or is it the Dask worker?
When an agent receives a flow run (which doesn't actually load the flow), it kicks off a new "job" to run the flow. Depending on the agent and the environment this might be a local process, a k8s job, a fargate task, etc. In that job a `FlowRunner` object is created. If you're using a `DaskExecutor`, the flow runner will then create (or connect to) a dask cluster. The `FlowRunner` process needs access to the `Flow` object - if you're using dask, the workers shouldn't ever see the `Flow` object directly, but they will need access to any imports used by your tasks. The task functions themselves will be automatically serialized via cloudpickle, but any code that lives in a separate file from the flow will need to be available on the workers as well.
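(A minimal sketch of this setup, assuming the 0.12-era API - `rightsize.helpers.compute_stats` stands in for code living in a separate module, and the scheduler address is a placeholder:)
```python
from prefect import Flow, task
from prefect.engine.executors import DaskExecutor

# Lives in a separate module: cloudpickle ships the task function itself,
# but this import must resolve on every dask worker.
from rightsize.helpers import compute_stats

@task
def summarize(data):
    return compute_stats(data)

with Flow("rightsize-example") as flow:
    summarize([1, 2, 3])

# The flow runner process holds the Flow object; workers only run the tasks.
flow.run(executor=DaskExecutor(address="tcp://dask-scheduler:8786"))
```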
g
Ok - starting to make sense. I am using a `LocalEnvironment` with my DaskExecutor in the flow, which, from your description, sounds like the agent needs access to all the modules used in the flow definition.
I'm using a vanilla agent right now, given that the agent defers to the dask scheduler for distributing the work.
j
Yeah, with a local agent and environment, the agent python environment will need access to your code. Note that the agent itself will never import that code, so you can change/add/delete stuff without affecting the agent. But the agent will kick off a job using the same python environment it's running in currently.
g
If I put the agent in a container with all the available code and start it myself through python, it seems like I should be ok.
sorry - typed before reading.
j
Yeah, that should work fine.
g
I'm trying to move everything to individual docker containers - but all running off a common image, so they all share the same environment.
j
You might try the docker agent then, it will kick off a docker container per flow run, rather than a local process.
g
ooohh...maybe...
ok, I went down the path of starting the agent in a container with access to my custom code (not the docker agent). I got past the import issue, but hit this in the agent:
```
[2020-07-28 16:27:24,262] ERROR - agent | Error while deploying flow: NotImplementedError()
```
I start my agent with this:
```bash
docker run \
	--env PREFECT__CLOUD__GRAPHQL="http://${EC2_IP}:4200/graphql" \
	--env PREFECT__CLOUD__API="http://${EC2_IP}:4200" \
	--env PREFECT__BACKEND=server \
	--rm \
	--entrypoint python \
	--name ${CONTAINER_NAME} \
	386834949250.dkr.ecr.us-east-1.amazonaws.com/${IMAGE_NAME}:gdesmarais \
	-c 'print("hi");from prefect.agent.agent import Agent;labels = ["s3-flow-storage"];agent = Agent(labels=labels);agent.start()'
```
ok, switched to LocalAgent instead of Agent...looks good!
The error seemed like an abstract class error.
j
Wait, why are you starting an agent that way? Any reason not to use the prefect cli for accomplishing this? The agent classes themselves aren't really user-facing.
If you want to run a local agent in a docker container, you should be able to do `docker run ... -c "prefect agent start"`. Alternatively, you could use `prefect agent docker start` to start the docker agent, which will run each flow run in a separate docker container.
The local agent starts flow runs in local processes.
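(Sketches of both options - the image name is a placeholder, and config like the API address is omitted:)
```bash
# Local agent inside a container built from an image containing your code:
docker run --rm my-custom-image prefect agent start

# Docker agent on the host, launching a fresh container per flow run, so
# the agent's own environment never needs your code:
prefect agent docker start
```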
g
I went with using the docker run command to get the agent access to the right modules. When I looked at the `prefect agent docker start` command params, I didn't see a way to tell the agent what base image to use, so I thought I would be in the same position about missing modules.
I get your point about just using `prefect agent start` as the docker command - I updated to this:
```bash
docker run \
	--env PREFECT__CLOUD__GRAPHQL="http://${EC2_IP}:4200/graphql" \
	--env PREFECT__CLOUD__API="http://${EC2_IP}:4200" \
	--env PREFECT__BACKEND=server \
	--rm \
	--entrypoint prefect \
	--name ${CONTAINER_NAME} \
	386834949250.dkr.ecr.us-east-1.amazonaws.com/${IMAGE_NAME}:gdesmarais \
	agent start
```
j
Since the docker agent starts flow runs in separate containers, it doesn't need access to any of your code; the default image should work fine.
Only your flow runs need access to your custom image
g
ok - let me give that a shot - see what happens.
j
Note that if you want your flow runs to use a custom image (rather than the default prefect image) you need to either use `Docker` storage for the flow, or set the image name via `flow.environment.metadata["image"] = ...`.
g
You mean the actual running of the task code?
If so, I have those running in dask workers - which are started up from the same self-built image...
j
When the docker agent gets a request to start a flow run, it will look for an image to use for that flow run. The process is:
• If the flow is configured with `Docker` storage, use that image
• If the flow's environment has an `image` key in the metadata field, use that image (`flow.environment.metadata['image']`)
• Otherwise, use the default `prefect:all_extras` image
Since your flow requires custom code, you'll want that to be available in the flow runner image, so you'll either want to use `Docker` storage or specify the image via environment metadata.
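(A sketch of the metadata route, assuming the 0.12-era `LocalEnvironment` - image name, scheduler address, and project name are placeholders:)
```python
from prefect import Flow
from prefect.engine.executors import DaskExecutor
from prefect.environments import LocalEnvironment

with Flow("rightsize-example") as flow:
    ...

flow.environment = LocalEnvironment(
    executor=DaskExecutor(address="tcp://dask-scheduler:8786")
)
# The docker agent will launch the flow run's container from this image:
flow.environment.metadata["image"] = "registry.example.com/my-image:tag"
flow.register(project_name="my-project")
```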
g
I am confused.
j
I admit this isn't straightforward, and the docs could definitely be clearer - we're hoping to streamline this process and revamp the docs in the next month or so.
g
How does the DaskExecutor come into play here?
j
Do you have a long-running dask cluster it's connecting to, or are you having it start a cluster?
g
long running
and on a big ecs cluster
I can't do anything local to the agent machine.
I already have the flow set up with a DaskExecutor pointing at my dask cluster.
dask scheduler address for the dask cluster, that is.
Which is why the docker agent seems so weird
j
So in that case:
• Agent gets a request to start a flow run
• Agent finds the image associated with that flow
• Agent starts a new container with that image for executing the flow run
• Flow runner starts in that container
• Flow runner creates a `DaskExecutor` connected to your dask cluster
• Flow runner starts running the flow on that executor
• Flow run completes, flow runner shuts down
• Container shuts down
g
Shouldn't the agent just pass along my tasks to the dask scheduler?
typed too slow.
j
Prefect is designed so that agents never actually touch user code. The agent could start a k8s job that runs your code, but the thing talking to our servers never gets access to it.
g
how is running in the container with a docker agent better than a local agent in that case?
I'm hosting my own servers - not sure if that was clear...
j
For your use case, it doesn't seem like it's an improvement. If I were you I'd just run the local agent, since all the work is happening remotely. For other users it might make more sense.
g
Sure..
Ok - so I got a flow running with a local agent by registering it from the client, then using the prefect Client to create a flow run...
Is there a lightweight way to trigger a flow run from a client without registering a flow?
And as a follow on, is there a way to get the return values from a flow run?
j
If you're using cloud/server we require an explicit registration step. You can run without cloud/server using `flow.run(executor=executor)` though.
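(For example, with a placeholder scheduler address:)
```python
from prefect.engine.executors import DaskExecutor

# No registration needed; the flow runs in this process and farms
# task execution out to the dask cluster.
state = flow.run(executor=DaskExecutor(address="tcp://dask-scheduler:8786"))
```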
g
hmm...ok - I hadn't tried that one. Then the executor, which is configured to point to my dask scheduler, would get the flow tasks...
And I do have access to the dask cluster from the client...
how about this one, since I have your attention
> And as a follow on, is there a way to get the return values from a flow run?
j
If you're using `flow.run(...)`, it will return the flow's final state, with the results of individual tasks attached.
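(A sketch of pulling a task's return value out of that state - `my_task` is whichever task object the flow was built with:)
```python
state = flow.run(executor=executor)
task_state = state.result[my_task]  # State object for that one task
value = task_state.result           # the task's return value
```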
g
ok, cool...how about a registered flow?
j
When running with orchestration, you can query using the client to get the locations where any cached results are stored, but since the actual flow execution happens remotely there's not really an obvious place to return them to you directly. Many things can trigger a flow run, including UI activity.
cc @josh for further tips on the above ^^
g
Ok - I haven't even started down the path of results and the like.
j
Yep @Jim Crist-Harif that is spot on! Accessing the location the Result type writes to is how you would get the return values when running with a backend API. Also, you could optionally make a task in your flow that is responsible for putting the values somewhere else if you don't want to use results (aka a load-type task in an ETL workflow).
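(A sketch of the Result approach, with a placeholder bucket name - every task's output gets persisted so it can be read back after a registered flow run finishes:)
```python
from prefect import Flow, task
from prefect.engine.results import S3Result

@task
def produce():
    return 42

# A flow-level result applies to all tasks in the flow.
with Flow("example", result=S3Result(bucket="my-results-bucket")) as flow:
    produce()
```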
g
I can dig into the docs, but any pointers on storing and accessing results for a registered flow?
j
No pointers exactly, but here are some relevant links that should get you on the right path 🙂
https://docs.prefect.io/core/concepts/results.html#result-objects
https://docs.prefect.io/core/idioms/targets.html
g
ok, thx...I'm on my way again. At least far enough to start a new thread with the next question!
oh - wait...
well...ok, different thread...
What I'm now figuring out is how to do all the results stuff with a registered flow that is run with `Client.create_flow_run`.
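(A sketch of triggering a run of a registered flow with the Client - the flow id is a placeholder you'd get from registration or the UI:)
```python
from prefect import Client

client = Client()
flow_run_id = client.create_flow_run(flow_id="<your-flow-id>")
```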