# ask-community
h
Is there a good way to make "quick run" do a quick run in Prefect Cloud? It seems like it only schedules.
n
With quick run the flow is scheduled to run immediately, so a matching agent should pick it up the next time it polls Cloud (within ~10 seconds)
h
It doesn't
It just says scheduled
n
I would assume it’s your label/agent config that’s incorrect then; otherwise it’s a bug in Cloud, because what you want is what should happen
h
Not doing anything in particular...
n
Quick run works well for me in cloud. Do you have agents with labels matching your flow?
h
Probably not
Trying again
n
Here’s the docs for that
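For example, an agent started with a matching label should pick the run up right away (a sketch, assuming the Prefect 0.x CLI; prod stands in for whatever label your flow is registered with):
Copy code
prefect agent docker start --label prod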
h
It keeps registering my laptop name as a label?
n
Did you give any args.labels? I think that might be the default if you don’t set any labels
h
Copy code
from argparse import ArgumentParser
from os import path

parser = ArgumentParser(add_help=False)
parser.add_argument(
    "--debug",
    default=False,
    required=False,
    action="store_true",
    dest="debug",
    help="debug flag",
)

subparser = parser.add_subparsers(dest="command")
register = subparser.add_parser("register")
run = subparser.add_parser("run")

register.add_argument("-c", "--commit-ref", dest="commit_ref", type=str, required=True)
register.add_argument("-p", "--project-name", dest="project_name", type=str, default="dbt")
register.add_argument("-l", "--labels", action="append", default=[])
register.add_argument("--build", dest="build", action="store_true", default=False)

run.add_argument(
    "--run-on-schedule", dest="run_on_schedule", action="store_true", default=False
)
run.add_argument(
    "--basepath", dest="basepath", type=str, default=path.dirname(path.realpath(__file__))
)

args = parser.parse_args()
Defaults to empty
Calling with --labels prod
n
What are the labels in cloud? Is it set to [prod, laptop-name]?
h
Yes
n
Could be that it always defaults to that when doing local stuff; I haven’t tried local agent + Cloud enough. I think the easiest fix would be to add your laptop name as a label on your agent, since I assume you only have prod right now
(you have a Docker agent, not the actual Prefect local agent, but it’s still local on your machine)
h
I can't add my laptop name in CI/CD and in prod because my laptop isn't there, and others have to be able to deploy
n
But you’re using local storage? This has to run on your machine?
h
No
It's running inside GKE in a container
n
And if using GKE you need a KubernetesAgent
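Something like this to deploy one (a rough sketch, assuming a recent 0.x CLI; the key and label are placeholders):
Copy code
prefect agent kubernetes install --key YOUR_API_KEY --label prod > agent.yaml
kubectl apply -f agent.yaml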
h
I know it does
I was just recommended using it, because I'm building a docker container with it
So I want prefect to just load data from there
n
Huh, ok this is some advanced stuff. 😮 Well ok, so you’re getting your laptop hostname when registering from there because of the local storage. You should be able to omit that with the add_default_labels=False kwarg, at least: https://docs.prefect.io/api/latest/storage.html#local
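E.g. something like this on your storage (an untested sketch for Prefect 0.x; the path is a placeholder):
Copy code
from prefect.storage import Local

flow.storage = Local(
    path="/path/in/image/my_flow.py",  # placeholder path to the flow script
    stored_as_script=True,
    add_default_labels=False,          # skip the automatic hostname label
)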
I don’t understand how the Docker run config works on an agent on GKE; I’d love to hear if you accomplish what you want
a
@Noah Holm you are 100% correct that the hostname is attached as a label when local storage and local agents are used. This default label is extremely useful, because the flow is then stored on a specific machine and thus can only run successfully there. And you are also right that when a FlowRun gets stuck in a Scheduled state, it’s most likely due to:
• agent misconfiguration, i.e. labels not matching between agent and flow
• an unhealthy agent
• or an expired API key
@haf did you manage to solve the issue?
h
I'm not sure how this is complex; isn't the default that people run their stuff in the cloud on k8s? 😉
And using containers?
Tried without the auto-labels now
ok, so DockerRun doesn't work with k8s
Love the name
vegan-bear
Nope, no cheese
Failed to load and execute Flow's environment: ValueError('Flow is not contained in this Storage')
a
@haf many users run Prefect flows on K8s. This error is much more informative! It looks like the run config is working, but storage isn’t. Can you share the flow code that led to this error?
h
Copy code
from datetime import timedelta

import prefect
from prefect.run_configs import KubernetesRun
from prefect.schedules import IntervalSchedule
from prefect.storage import Local

# `args`, `flow` and `at_night()` are defined elsewhere in the script
if args.debug:
    prefect.config.logging.level = "DEBUG"

if args.command == "run":
    prefect.context["basepath"] = args.basepath
    print(f"prefect.context.get('basepath')='{prefect.context.get('basepath')}'")
    flow.run(run_on_schedule=args.run_on_schedule)

elif args.command == "register":
    image = f"europe-docker.pkg.dev/logary-delivery/cd/data-pipelines:{args.commit_ref}"
    print(f"Registering flow with labels={args.labels} image={image}")

    flow.schedule = IntervalSchedule(start_date=at_night(), interval=timedelta(hours=24))
    flow.storage = Local(
        path="/app/flows/run_mmm.py",
        stored_as_script=True,
        add_default_labels=False,
    )
    flow.run_config = KubernetesRun(
        image=image,
        labels=args.labels,
    )
    flow.register(
        project_name=args.project_name,
        build=args.build,
        idempotency_key=args.commit_ref,
        labels=args.labels,
        add_default_labels=False,
    )
a
I see - so if you are using KubernetesRun, you should use non-local storage, because your Kubernetes pod can’t access your local resources; they reside outside of the pod. You could try one of the following:
• one of the Git storage classes: Git, GitHub, GitLab, Bitbucket - depending on what you use as your VCS. As long as you have a Personal Access Token stored as a Secret, the Kubernetes job running your flow will be able to grab the flow code from there.
• one of the Cloud storage classes: S3, GCS, Azure - this would also require authenticating to the specific cloud provider to be able to grab the flow code from there.
you mentioned you use GKE, so a GCS bucket would potentially be a good storage mechanism for you: https://docs.prefect.io/orchestration/flow_config/storage.html#google-cloud-storage
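A rough sketch of what that could look like (untested; the bucket name is hypothetical):
Copy code
from prefect.storage import GCS

flow.storage = GCS(
    bucket="my-prefect-flows",             # hypothetical bucket
    stored_as_script=True,
    local_script_path="flows/run_mmm.py",  # uploaded at registration time
)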
h
I want to access local things in the image
I don't want to separate the code from the binary artefact that is scheduled
a
Then it would make sense to use Local storage and a Local agent - this way you have access to all local resources during development. And once you are ready to deploy it so that it runs on a schedule, you can package your dependencies into a container image so that the container can be spun up anywhere - on any Kubernetes cluster in any cloud. Basically, in order for your Docker image to run reliably on GKE, everything that your code needs must be within the Docker image, so no dependency on local resources. Does that make sense?
h
Yes it makes sense and this is exactly the way I've done it
Except "local agent"
I can't schedule things on Prefect cloud with "local agent"; it'll complain
a
you absolutely can schedule things with Local Agent! 🙂 I’m sure we can sort it out
h
But the problem isn't the agent, the problem is that the runtime can't find the flow?
a
the easiest thing to do is to start the local agent from the terminal in the default configuration:
Copy code
prefect agent local start
Then you don’t even need to configure any storage or agent, because Local storage and the Local agent are the defaults:
Copy code
# hw_flow.py
from prefect import task, Flow


@task(log_stdout=True)
def hello_world():
    print("hello world")


with Flow("idempotent-flow") as flow:
    hw = hello_world()
Then, you can use the CLI to register your flow to the Prefect Cloud:
Copy code
prefect register --project YOUR_PROJECT_NAME -p hw_flow.py
just following the defaults, you should be able to see this flow in your Cloud account, as long as you authenticated your local machine with the API key:
Copy code
prefect auth login --key "YOUR_KEY"
h
Yes I've tried local agent before and it worked the way you mention
But I don't want to use the local agent, I want to use the k8s agent?
a
this is the deployment stage. I thought we were still in the development phase 🙂
h
No, I can run it locally just fine
Except that I can't "schedule it"
Do you mean I should debug with local agent and not k8s agent?
a
ok, if you are ready for deployment, you need to have a look at your dependencies: do you use some 3rd party libraries or custom modules that your flow needs?
h
The feedback loop is actually pretty small
yes I do
A lot of them
so I use pipfile and have set up a docker container for it
a
Then those need to be installed in the Dockerfile
h
and that has all the flows in it, I've verified that.
Copy code
$ docker run --rm -it europe-docker.pkg.dev/logary-delivery/cd/data-pipelines:xxx
            _____  _____  ______ ______ ______ _____ _______
           |  __ \|  __ \|  ____|  ____|  ____/ ____|__   __|
           | |__) | |__) | |__  | |__  | |__ | |       | |
           |  ___/|  _  /|  __| |  __| |  __|| |       | |
           | |    | | \ \| |____| |    | |___| |____   | |
           |_|    |_|  \_\______|_|    |______\_____|  |_|

Thanks for using Prefect!!!

This is the official docker image for Prefect Core, intended for executing
Prefect Flows. For more information, please see the docs:
https://docs.prefect.io/core/getting_started/installation.html#docker

root@514926ea8f6a:/app# ls
Pipfile  Pipfile.lock  dbt  dbt_project.yml  flows  infer  packages.yml  postinstall.py  profiles.yml
root@514926ea8f6a:/app# ls flows
__pycache__	   exchange_rates.py		      run_mmm.py
dask-worker-space  exchange_rates__insert_rate.sql    run_mmm__metrics_eligible_channels.sql
dbt.py		   exchange_rates__missing_dates.sql  run_mmm__revenues_eligible_apps.sql
root@514926ea8f6a:/app# cd flows
root@514926ea8f6a:/app/flows# l
bash: l: command not found
root@514926ea8f6a:/app/flows# pwd
/app/flows
root@514926ea8f6a:/app/flows# exit
logout
a
The thing is: you don’t need a virtual env in the Docker image - the image is basically a replacement for it. So I would honestly get rid of the Pipfile and specify everything you need in a plain and simple Dockerfile
h
Well...
I still need to do a pip install
a
can you share your Dockerfile?
h
and I still need to freeze my versions of the deps
Copy code
FROM prefecthq/prefect:0.15.4-python3.8

RUN pip install --upgrade pip setuptools wheel twine \
    && pip install pipenv \
    && apt-get update \
    && apt-get install -y --no-install-recommends curl gcc python3-dev libssl-dev \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

COPY Pipfile* packages.yml profiles.yml .user.yml .python-version dbt_project.yml postinstall.py ./
COPY infer ./infer

RUN PIPENV_VENV_IN_PROJECT=1 pipenv install --deploy
ENV PATH="/app/.venv/bin:$PATH"

RUN python postinstall.py

COPY flows ./flows
COPY dbt ./dbt
That's why I use pipfile
a
yes, but we can simply do this:
Copy code
RUN pip install -r requirements.txt
h
I don't have requirements.txt
I have a pipfile because it makes it more consistent
I can generate requirements.txt
but this isn't the problem
I can run things in this virtual env, as shown by running python postinstall.py
And I'm not crashing on "module does not exist" anymore
a
awesome. I think the only thing missing now is Storage then
h
I have a pipfile because it makes it more consistent
If this really is my issue then I'm happy to discuss the hows and whys of this, but I don't think it is
a
basically, you need to tell your Flow where to find your flow code
got it. Yeah, I think you totally crushed it with the dependency management in your Dockerfile, and this is not an issue. The only thing missing is building a Storage for your flow code - can you try a GCS bucket? Otherwise, if you want to keep it all in your Docker image, you can use Docker storage and ensure that you copy your flow code into the image:
Copy code
COPY /path/to/your/flow.py .
h
I don't want to use GCS because I want to keep the deployable artefact self-contained.
I don't want one run to fail one day but work the other or vice versa.
a
I totally get it, I think Docker storage is probably the easiest since you already are building a Docker image - I have some docs for you to get started, I think you know enough to manage this, but LMK if you face any issues along the way:
• https://docs.prefect.io/orchestration/recipes/configuring_storage.html
• https://docs.prefect.io/api/latest/storage.html#docker
h
Yes, I'm facing issues along the way 🙂
Like you can see, I'm already copying the flows
a
Can you try copying only the flow .py files to the WORKDIR?
Copy code
COPY flows .
h
yes sure
Just one thing: you recommend Docker storage, but since I'm building my own Dockerfile this is not supposed to be required; furthermore, Docker storage will pickle your flows
When you spawn pickled flows, it crashes (because it doesn't pickle source dependencies, for some completely unknown reason), so that's why I'm not using Docker storage. Just to be clear 😉
So now you're thinking it might be a problem that it'll only look in the current dir?
Despite me giving a fully qualified path?
a
you can pass stored_as_script=True in Docker storage. I’m not really recommending Docker storage, I think it would be easier with GCS 🙂 but this was your preference, and Docker storage is the easiest to get started with because you can pass your Dockerfile and it will be built anytime you register your flow, so you can ensure all dependencies are baked into the image. You need to use Docker storage, not Local storage - remember the links to the docs I sent you?
h
Yes I remember the docs you sent me.
But to start out, I don't want to build my Dockerfile every time I register a flow, because that makes it impossible to tag it properly.
Secondly, if you look at the code, I've already packaged the image, so I don't need to rebuild it.
Thirdly, Docker storage by default tries to add files into the image itself, which is superfluous for me since I've already built the image
This means I'd end up with
Copy code
from prefect.storage import Docker

# `image_base` and `args` come from earlier in the script
flow.storage = Docker(
    path="/app/flows/run_mmm.py",
    image_name=image_base,
    image_tag=args.commit_ref,
    stored_as_script=True,
    add_default_labels=False,
)
Right?
So while this all registers, I'm still getting
Failed to load and execute Flow's environment: ValueError('Flow is not contained in this Storage')
a
Sure, you can try it. I haven’t done it myself so it’s hard to help, but I will try! Overall, I believe the safest option is to match storage and run configuration:
• LocalRun + Local storage
• DockerRun + Docker storage
And once you move to Kubernetes, you can package your flow dependencies into a Docker image that will likely remain mostly static in contrast to your flow files. You then pass this image to your KubernetesRun, and you can use a Cloud storage or Git storage class as Storage to ensure that even if something changes in your flow file, you don’t have to rebuild the image - the FlowRunner will grab the latest version of your flow code at runtime. But you can really mix and match it as you wish. There are so many possibilities! 🙂 You can find some of those here.
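For illustration, a sketch of the KubernetesRun + GitHub storage combination (untested; repo, path, and image are hypothetical):
Copy code
from prefect.run_configs import KubernetesRun
from prefect.storage import GitHub

# flow code is pulled from the repo at runtime; the image only carries dependencies
flow.storage = GitHub(repo="my-org/data-pipelines", path="flows/run_mmm.py")
flow.run_config = KubernetesRun(image="my-registry/deps-image:stable", labels=["prod"])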
h
I have tried it
That's why I'm asking for help now
It's not working
I have moved to k8s
I have a dockerfile
and no, the deps aren't static, they also change
I want to package them together with the code.
I want to rebuild the image (called DevSecOps)
I know pretty well what I want.
What I don't know how to make Prefect do, is to pick up the flow file from the docker image.
Sorry if this sounds "angry", but it's just where I'm at right now. I know you're helping me, but "safest way" to use "LocalRun+LocalStorage" doesn't make any sense when I'm telling you explicitly I want to run this on k8s
a
@haf in that case Docker storage is ideal because the image gets rebuilt any time you register your flow. Do you by any chance have a Flow and configuration I could use to reproduce your issue? Like one complete example that you could share, so that I can run it on my machine and try to reproduce and identify the issue
h
Yes, that's probably what I should do next.
Let me work on that 🙂
a
that would be awesome! Thank you! 👍
h
Docker storage is ideal because the image gets rebuilt any time you register your flow
No, it's not ideal, I don't want this
because:
Prefect uses requirements.txt but doesn't support editable dependencies
So either I set up and host my own pypi, which I'm not doing right now
or I collapse everything into a single huge file (which I had before, but when you make more than one flow, this is unmaintainable)
OR I publish the package, which I'm not doing
HENCE, if I could make the prefect agent just call a python file, and let python handle everything related to packaging, I'd be so happy 🙂
Also because Prefect's docker storage will pickle your classes, and this doesn't support editable deps
if you look at the logs from the docker storage, you'll see .prefect files being added as part of the build; AFAIK these are the pickled files
a
I’m sure you can build your package inside of the Docker image. You would need a setup.py and then:
Copy code
RUN pip install .
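where a minimal setup.py could look like this (a sketch; the package name is hypothetical):
Copy code
from setuptools import find_packages, setup

setup(
    name="my_flows_lib",  # hypothetical package name
    version="0.1.0",
    packages=find_packages(),
)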
h
pip install -e .
yes
a
not sure what an editable dependency would be in a Docker image - once the image is built, no one will edit it any more. What am I missing?
h
But pipenv does this as part of pipenv install --deploy
What you're missing is that when you do pip install . you're building a .egg file, but when you do an editable install you're not.
Instead, you add the path of the module to PYTHONPATH and that lets you use the code (AFAIK)
That said, perhaps a solution is indeed to locally build .egg files and have DockerStorage pickle them
no... wait, that failed because when you register, it has to be done with flow.register()
Let me see if I can come up with a repro
What you're missing is that when you do pip install . you're building a .egg file, but when you do an editable install you're not.
Or in other words, "editable" just means "load files from disk at runtime", while pip install . means load it from the egg file.
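Roughly (a sketch; the exact artifacts depend on your pip/setuptools versions):
Copy code
pip install .     # builds the package (egg/wheel) and copies it into site-packages
pip install -e .  # writes a link (egg-link/.pth entry) pointing at the source tree instead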
a
sure, but who will edit the package inside of a Docker image? I don’t understand why I would want a package to be editable in a Docker image. What build artifact it generates (whether an .egg or a wheel file) is an implementation detail - a Docker image is a packaging mechanism by itself. The result will be the same: a package is installed in the env so that it can be used by Prefect flows, right?
h
You're correct that no one "will edit" it, but it's more about the package lookup mechanism that Python uses than the use-case for the programmer.
a
ok, we’re on the same page about that.
h
Unless you're packaging the package and installing it from source while you're building the Dockerfile, you don't have the package in PYTHONPATH
All of this is implementation detail
It would be neat if the ValueError that Prefect throws would, for example, tell you where it looked, its current working directory, and/or its PYTHONPATH
Docker image is a packaging mechanism by itself
Yes, but not "package" as in "python package"
a
sure, let’s focus on the flow code so we can reproduce. I think your dependencies are fine; only the Storage of flows needs further introspection 👍
h
The result will be the same: a package is installed in the env so that it can be used by Prefect flows, right?
But yes, with the Dockerfile I posted the python packages can be referenced by Prefect flows; and this is what is needed
Right now, the problem is just pointing to the flow file on disk
I have to eat a bit though 🙂
a
As mentioned before, the GKE cluster doesn’t have access to a local path on your computer, so I don’t think this is a good path to follow. Either copy all local dependencies into a container image and point the Docker storage in the flow to that path in the container, OR leverage another remote storage mechanism such as GCS, which you decided you don’t want.
I think you meant the same - a path in the image - but you only said local path
h
Again I’m not using GCS so let’s stop talking about that please.
a
Enjoy your meal and take your time to build a reproducible example! 🙂
h
I’ll try a few different permutations soon :)
a
sure, as I mentioned, this is not the path we follow here
h
Well it’s not a good path to follow ;) Everything is moving towards immutable infrastructure and reproducible builds.
Finally running!
a
Nice work! If you want to share your solution, you are welcome to do so.
h
Yaks shaved:
• the tini entrypoint, which threw away all ENVs
• this meant pip ran with the system pip, not the venv pip
• you're right that it would have been better to install with requirements.txt, but only insofar as pipenv doesn't tie into pyproject.toml, which seems to be "the way" nowadays after PEP 518: https://www.python.org/dev/peps/pep-0518/
The biggest problem, undocumented, which had me reading code, was this:
Copy code
from prefect.storage import Docker

docker = Docker(
    path="/app/flows/run_mmm.py",
    registry_url="ex/cd",
    dockerfile="Dockerfile",
    image_name="data-pipelines",
    image_tag=args.commit_ref,
    ignore_healthchecks=True,
    stored_as_script=True,
)
docker.add_flow(flow)  # without this explicit call, the flow is "not contained in this Storage"
flow.storage = docker
Because without explicitly calling add_flow, it just crashes with the message I showed you before.
Also, this really needs docs.
a
Thanks for sharing @haf. So far it’s documented here.
h
I don't see any ignore_healthchecks there, and those docs "build" the storage; there are no local editable deps there, and there's no mention of building docker containers either
so it's not.
those docs presume I use python to build the container
a
thanks for your feedback @haf, I will look into how we can include that.
h
I think it might be worthwhile to look into docs specifically on building and maintaining Prefect as part of a CI/CD / DevSecOps pipeline, and what invariants / requirements this would bring with it.