# prefect-server
Hey everyone, quick question regarding flow storage metadata: I'm running a docker-compose Prefect stack with an agent container, and because of the metadata that is automatically added to flows configured the way mine are, I'm forced to register the flow from within the agent container rather than just running it in VS Code (which tacks on my own home dir as the storage path and of course yields an exception on flow execution). This certainly works and is not a big deal at all, but I'm wondering if I can force the metadata somehow? Passing a directory to the Local() object of course won't let me use a path that only exists in the agent container, since it's for local storage, and while I could probably fake it and use a dir common to both, I'd really prefer not to even if it works (that directory on my local system would fill up with files I don't want to exist outside of the container). Maybe what I have in mind just doesn't work? Some of this is definitely down to my two-day-old understanding of Prefect's architecture, but the real goal is just keeping absolutely everything related to Prefect in the Docker environment except for flow development, if possible. As I said, it's not crucial, as I have a totally working solution now and am eager to build some flows, but thanks in advance for any advice!
Hi @Nate Jahncke! I have a minimum working example of how to get `DockerRun` working with `Local` storage here: https://github.com/kvnkho/demos/tree/main/prefect/docker_with_local_storage
I used an absolute path in the `Local` storage to point to a location in the Docker container.
Hey, thanks a lot @Kevin Kho! I had attempted to use a path internal to my agent container previously, but since that path doesn't exist on my local system, it failed. You're saying that if the flow is stored as a script and run_config is set to DockerRun, it won't attempt to access that path locally and will instead do all of its work in the container? What state does this leave the container in after flow execution? I'm at the end of a long day, so I'm not going to experiment any more tonight, but I will check this out first thing tomorrow. Thanks a lot for your help!
For local debugging I was running the flow locally. For registering on Cloud, yes, it will access the path locally for the container. I think another option to consider is that you can also mount a local drive when you make the container. Not quite sure what you mean about the state of the container. Are you trying to keep track of a state?
Ah ok, thanks for clarifying. By state of the container I mean: does it leave it running, or spin up a new container for every flow run? I'm attempting to work entirely within my own docker-compose environment and am not using Prefect Cloud for anything at this point, so I'm not sure this will work for me. For the record, I do mount a local volume for the container, but the path is of course still different. I'll see what happens if I create a path inside the container that matches my external path; that's worth a shot. Anyway, I'm working on an MVP, and if we get funded we will almost certainly move to a managed solution, but until then we're managing everything ourselves to keep costs down. Really, it seems like all I'm missing is the ability to configure a flow to use a specific container (not image), since I don't want Prefect to handle Docker for me, and then also configure storage for the flow within the specified container. It doesn't seem like that's possible at present, correct? As I said originally, it's not a big deal, as I can use an entirely local solution by registering the flow from within the container context via IPython. This is just the very beginning of my Prefect journey, and I'm sure my usage will evolve as my understanding does.
I believe the GitHub folder I showed is what you're looking for in terms of combining DockerRun and Local storage using absolute paths. If it's easier for you, Prefect Cloud offers 10,000 free task runs per month, no credit card needed. Might be enough for a POC.
The container is spun up per flow run. I think the container should be stateless, and files that are persisted should ideally live outside the container (S3, for example).
Right, the container is stateless as everything is in the volume, but there's typically a lot of setup/teardown overhead with running/closing containers which is why we persist them, even if they are stateless. Not that it really matters honestly, we just do our dev work and little stuff in docker-compose, prod is kubernetes, so tbh I think I just got it stuck in my head that I could make it work the way I wanted to.
Good to know the cloud offers that many task runs, but 10k/month would be close to the wire for us. Lots of scraping going on.
(Not bad scraping...)
I see what you mean. So the setup is pulling the image from wherever; Prefect then adds a command to the image. If you already have the image on the agent, you shouldn't have to pull it again (you can test with LocalAgent) unless there are changes to your flow. For temporarily spun-up hardware to run flows in, it's unavoidable to have to do setup every time.
Right, so that was my whole point: I already have an entirely stateless container running (the agent container) that I want to run flows in. At present I can do that just fine, as long as I shell into the agent container and register the flow from inside of it. This is not a big deal for me. My hope, however, was that there was some combination of storage and run_config classes that would let me specify the agent container, not an image, as the host of the flow. Just to skip a step and therefore some potential downstream issues, but as I said, it's not a big deal the way it is right now.
Hey @Nate Jahncke -- this is a basic limitation of `Local` storage. You may get some extra mileage by setting `stored_as_script=True` with an absolute `path`, which I believe will ignore the path your flow is at during development/registration and just attempt to pull it from the given path at runtime. You'll still need to place the flow file at that path in the agent container (I'd recommend mounting it). Generally, you'd be much better off using remote storage such as S3 or GitHub; local storage is basically designed for use with a local agent on the same machine to get people started.
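A minimal sketch of that suggestion, assuming the Prefect 1.x API (`prefect.storage.Local` and `prefect.run_configs.DockerRun`); the image tag and `/flows/...` path are placeholders, and the block falls back to a no-op if Prefect 1.x isn't installed:

```python
# Hypothetical sketch: Local storage with stored_as_script=True plus a
# DockerRun run config (Prefect 1.x API). Image tag and path are placeholders.
try:
    from prefect import Flow, task
    from prefect.storage import Local
    from prefect.run_configs import DockerRun

    @task
    def say_hello():
        print("hello from the container")

    with Flow("script-storage-demo") as flow:
        say_hello()

    # Point at the path the flow file will have *inside* the agent container;
    # registration then records this path instead of the local dev path.
    flow.storage = Local(path="/flows/demo_flow.py", stored_as_script=True)
    flow.run_config = DockerRun(image="my-prefect-image:latest")
    configured = True
except ImportError:
    # Prefect 1.x isn't available in this environment; the structure above
    # is the point of the sketch.
    configured = False
```

The key idea is that with `stored_as_script=True` the registered metadata carries the in-container path, so the flow file just has to exist at that path at runtime (e.g. via a volume mount).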
Hey @Zanie, thanks for the help. I'm experimenting with script storage right now, and I think it'll do the trick (just mounting my local flows dir into the agent container). We will definitely use remote storage outside of our development environments, but I think I have everything I need at this point to get everyone up and running. Thanks a ton to you and @Kevin Kho for all the help! You guys are great.
Wonderful! If you have the chance to share your completed solution somewhere I'm sure the community would appreciate it, it's a common ask to get development going locally fully in containers. You're welcome 🙂
Yes I definitely will. Thanks again!
Hey, for the record (and anyone in the same boat): that did the trick. The only way I could get the path to resolve properly at runtime is via inspect (`stack = inspect.stack()`). I would have thought `__file__` would work, but it doesn't during agent execution 🤷 Here's my test Flow config:
```python
# Configure Flow
with Flow(
    # (flow name / storage / run_config arguments elided in the original)
) as flow:
    # Define Flow
    ...

# Register Flow
flow.register(project_name="default_project", labels=["default_label"])
```
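The `inspect`-based path resolution mentioned above can be sketched like this (stdlib only; the helper name is mine, not Prefect's):

```python
import inspect
from pathlib import Path

def resolve_flow_path() -> str:
    """Return the absolute path of the file this function is called from.

    Unlike __file__, inspect.stack() walks the live call stack, which is how
    the poster recovered a usable path in a context where __file__ didn't
    behave as expected during agent execution.
    """
    caller = inspect.stack()[1]  # frame of whoever called this helper
    return str(Path(caller.filename).resolve())

# Example: the result could feed a storage path at registration time.
flow_path = resolve_flow_path()
```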
My agent container has its `/flows` dir mapped to my local flows dir via docker-compose:
```yaml
volumes:
  - ${PREFECT_FLOWS_DIR}:/flows/
```
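For completeness, a hedged sketch of how that volume line might sit in a docker-compose service; the service and image names are hypothetical, and only the volume mapping comes from the thread:

```yaml
services:
  agent:
    image: my-prefect-agent:latest   # hypothetical image name
    volumes:
      # The host flows dir appears as /flows inside the agent container,
      # matching the absolute path used for Local storage.
      - ${PREFECT_FLOWS_DIR}:/flows/
```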
Now I can register flows from outside the docker environment and the runs are picked up by my agent container exactly as I'd hoped. Thanks for all your help, guys!
Nice work @Nate Jahncke!
@Marvin archive "Developing a flow using local storage and local docker agent"