# ask-community
b
Hey everyone. CONTEXT: I have a flow deployment using Kubernetes job infrastructure. When the job pod is created in our cluster, the pod is assigned ephemeral storage. The flow downloads a 14 GB zipped file; when the file is unzipped, it is 50 GB. The ephemeral storage assigned to the pod runs out of space, and the worker node kills the pod. Screenshot of the issue below.

QUESTION: Is there a way to define the storage limit of job pods using Prefect's deployment YAML?
c
You can specify job variables / overrides
I think all of our examples / docs focus exclusively on cpu / memory kind of settings. For storage like this I’d anticipate you would want to attach a separate volume
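A minimal sketch of what "attach a separate volume" could look like at the Kubernetes pod-spec level. The volume name, mount path, and size below are made up for illustration, and note that an `emptyDir` volume still draws on node-local disk - a PersistentVolumeClaim is the usual route to genuinely more space, as discussed later in the thread:

```yaml
# Hypothetical pod-spec fragment: give the flow-run container a dedicated scratch volume.
spec:
  template:
    spec:
      containers:
        - name: prefect-job
          volumeMounts:
            - name: scratch
              mountPath: /opt/prefect/scratch   # unzip the ~50 GB file here
      volumes:
        - name: scratch
          emptyDir:
            sizeLimit: 60Gi   # still node-local disk; the pod is evicted if it writes more than this
```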
k
are there any examples for doing that in a `prefect.yaml`? I've got mine like this under the work pool, but I'm not seeing the requests on the job:
```yaml
job_variables:
  customizations:
    - op: add
      path: /spec/template/spec/containers/0/resources
      value:
        requests:
          cpu: 0.25
          memory: 300Mi
        limits:
          cpu: 0.25
          memory: 300Mi
```
k
it is present under infra overrides on the deployment though
c
There are some examples - I might need to find them, though. We had some existing examples in the docs for overrides; if they aren't there in the prefect deploy docs, I can get them in
k
aha! following the guide here, I added a `resources` key to the manifest in the work pool template like so:
```json
"job_manifest": {
      "kind": "Job",
      "spec": {
        "template": {
          "spec": {
            "containers": [
              {
                "env": "{{ env }}",
                "args": "{{ command }}",
                "name": "prefect-job",
                "image": "{{ image }}",
            --> "resources": "{{ resources }}",
                "imagePullPolicy": "{{ image_pull_policy }}",
```
then add the resources values to my `prefect.yaml` like so:
```yaml
definitions:
  work_pools:
    central-cluster: &central-cluster
      name: central-cluster
      work_queue_name: default
      job_variables:
    --> resources:
          requests:
            cpu: 250m
            memory: 300Mi
          limits:
            cpu: 250m
            memory: 300Mi
```
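Since Kubernetes treats `ephemeral-storage` as a schedulable resource just like `cpu` and `memory`, the same `resources` block can be extended to address the original question directly. This is only a sketch - the 60Gi figure is an illustrative value sized above the ~50 GB unzipped file, not something from the thread:

```yaml
      job_variables:
        resources:
          requests:
            cpu: 250m
            memory: 300Mi
            ephemeral-storage: 60Gi   # schedule onto a node with at least this much allocatable local disk
          limits:
            cpu: 250m
            memory: 300Mi
            ephemeral-storage: 60Gi   # exceeding this limit evicts the pod with a clear reason
```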
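One piece the thread doesn't show: depending on your Prefect version, you may also want a matching entry under the work pool template's `variables` section so the new `resources` job variable is recognized there and appears in the work pool's UI form. A hypothetical sketch (the title and description text are made up), sitting alongside the `job_configuration` that contains the `job_manifest` above:

```json
"variables": {
  "type": "object",
  "properties": {
    "resources": {
      "title": "Resources",
      "description": "Kubernetes resource requests/limits applied to the flow-run container.",
      "type": "object",
      "default": {}
    }
  }
}
```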
c
This is excellent - you didn’t find this in our docs right? I’ll work on getting a PR in to add this, so thank you for your help here!
although it is part of the worker…hrmmmm.
k
yeah, there's not an end-to-end example of how to do it; it was more that I had to piece it together from a bunch of different guides. the info is all there, just scattered
c
Just out of curiosity, if it were consolidated in one place, where would the go-to have been for you?
k
good question. it's about work pools and deployment management, but the specific k8s worker example lives in other docs. I get that it's a different package, but judging from people's posts here, it's also a common place to run deployments
c
For sure, that’s good feedback! Thank you!
k
In addition, the reason I initially thought `customizations` in my yaml file was the way to go is that it worked in the past on a k8s job infra block, and it's in the docs near the references to infra overrides, but it doesn't appear to be supported with the k8s worker. That makes sense, since we can easily customize the job template ourselves, but it's one of the few things about going from the infra block to a k8s work pool that doesn't line up.
b
Thank you both.
1. Are there docs that explain what path is used in the Docker container when flow code is downloaded from storage and executed? (I'm assuming it's the default working directory specified in the Docker image?)
2. If we decide to create a Persistent Volume with more storage (https://kubernetes.io/docs/concepts/storage/persistent-volumes/) and mount that volume to the container running our flow, would we need to specify the mount path as the destination where our flow code is downloaded from storage and executed?
3. To rephrase question 2: how do we get the flow to run inside a mounted volume? Do the working directory and the mounted volume need to share the same path, or is that the wrong approach?
c
1. Yes, the working directory is `/opt/prefect` by default, and the path + entrypoint specify what is executed (these can be populated automatically through `prefect deploy`).
2. The mount happens when the pod spins up, so it would just be an attached volume at that mount point instead of the underlying container filesystem. Where to mount it is entirely your prerogative - e.g. if you want to extract to `/mnt/abcd`, just ensure `/mnt/abcd` is already mounted beforehand.
3. A mounted volume is kind of irrelevant - if you mount it at a mount point, it's completely invisible and transparent to Prefect.

for example, if you have `/opt/prefect` (which is the default path in the container) and you mount a volume on `/opt/prefect/flows`, it's entirely transparent - by design in Linux, you've simply transitioned to a different filesystem

To better answer your question: you can keep the default working directory as `/opt/prefect` and mount your volume to something like `/opt/prefect/attached`, or however you want to name it. Then your pull step brings your code in (either to the local filesystem or to the attached one) and does the extraction, where the only requirement is to extract to the attached volume with enough space.
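A rough sketch of how that could look in the work pool's base `job_manifest` (shown as YAML for readability; the template itself is edited as JSON). The claim name `flow-scratch` and the mount path are hypothetical - the PersistentVolumeClaim would need to exist in the namespace already, and the default working directory `/opt/prefect` is left untouched:

```yaml
spec:
  template:
    spec:
      containers:
        - name: prefect-job
          volumeMounts:
            - name: flow-scratch
              mountPath: /opt/prefect/attached   # pull/extract the large archive under this path
      volumes:
        - name: flow-scratch
          persistentVolumeClaim:
            claimName: flow-scratch   # hypothetical, pre-created PVC with enough capacity
```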