# prefect-community
s
[solved] When I create a flow that runs a namespaced job, Prefect actually creates 2 jobs. `dummy` runs on the `aks-spot` node, but `prefect-job` runs on the `aks-system` node (and I don't want it running on the system node pool). Is there a way to configure tolerations and affinities for the `prefect-job` pod?
a
If you are running a Kubernetes agent, then each flow run is executed as a Kubernetes job (I think this is the `prefect-job` pod). Then, when you start a job using the `RunNamespacedJob` task, it creates a separate job. So if you want to avoid spinning up two jobs and you can execute your logic within your Python flow definition, you can drop the `RunNamespacedJob` task and run everything within the main flow run pod.
I'm no expert in that, but you should be able to set a node selector in your job template, which you can pass to the `KubernetesRun` run configuration.
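As a rough sketch of that suggestion: a node selector lives in the pod spec of the job template (`spec.template.spec.nodeSelector`). The helper below is hypothetical (not part of Prefect) and assumes the spot node pool is identified by the label `agentpool=spotd2as`, as in the snippet later in this thread; it only builds the dict you would pass as `job_template`.

```python
import copy


def with_node_selector(job_template: dict, node_selector: dict) -> dict:
    """Return a copy of a Kubernetes Job template with a pod-level nodeSelector.

    Hypothetical helper for illustration; it does not call Prefect.
    """
    template = copy.deepcopy(job_template)  # leave the caller's dict untouched
    # nodeSelector belongs to the pod spec, i.e. spec.template.spec
    pod_spec = (
        template.setdefault("spec", {})
        .setdefault("template", {})
        .setdefault("spec", {})
    )
    pod_spec["nodeSelector"] = dict(node_selector)
    return template


# Minimal template: the flow-run container must be named "flow"
base_template = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "spec": {"template": {"spec": {"containers": [{"name": "flow"}]}}},
}

# Assumption: the spot node pool carries the label agentpool=spotd2as
spot_template = with_node_selector(base_template, {"agentpool": "spotd2as"})
```

A node selector only restricts where the pod may be scheduled. AKS taints spot node pools with `kubernetes.azure.com/scalesetpriority=spot:NoSchedule`, so the pod also needs a matching toleration before it can land there.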
s
Ah okay. I would actually like the flow scheduling and business logic to be in separate repositories, to:
1. prevent sticky dependency situations
2. allow teams that use other languages to participate

It looks like the documentation on `KubernetesRun` isn't very clear about what it does. I'm not sure whether I can specify a `job_template` and also use a `RunNamespacedJob` task. Let me try that.
Also, I found your GitHub repo for k8s, dbt, and Snowflake very helpful. That's exactly what I'm trying to set up here.
a
It's definitely doable. The job template is part of the run config and is completely independent of which tasks you run, whether they're dbt or Kubernetes tasks. The template only determines how your flow run gets deployed as a Kubernetes job.
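```python
from prefect.run_configs import KubernetesRun

# `flow` is your existing Flow object; the template can live in S3
flow.run_config = KubernetesRun(job_template_path="s3://bucket/path/to/spec.yaml")
```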
s
Is the job template for `KubernetesRun` supposed to be a fully fledged job template? Will the agent override some of the parameters and specify the image, command, args, etc.?
It doesn't work if I pass in a custom template.
I'd like to submit a feature request to let `KubernetesRun` take `tolerations` and `affinity` as named parameters to fix this. I'll try to find a workaround in the meantime. Many thanks from Berlin 🇩🇪.
a
This is the default `job_template`: it's not a fully fledged job definition, just a template defining things like the container name (there must be a container named "flow", which is used for the flow run). Optionally, you can also set custom fields such as service accounts. Greetings from Berlin, too! 🇩🇪 (You're welcome!)
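The "flow" container requirement is easy to miss (it is what broke the custom template later in this thread). A small pre-flight check could look like the sketch below; `has_flow_container` is a hypothetical helper, not part of Prefect, and it only inspects the dict you are about to pass as `job_template`.

```python
def has_flow_container(job_template: dict) -> bool:
    """Return True if the Job's pod spec declares a container named "flow".

    Hypothetical check for illustration; Prefect itself is not called.
    """
    containers = (
        job_template.get("spec", {})
        .get("template", {})
        .get("spec", {})
        .get("containers", [])
    )
    return any(c.get("name") == "flow" for c in containers)


# A container without a name: the mistake described later in the thread
missing_name = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "spec": {"template": {"spec": {"containers": [{}]}}},
}
named = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "spec": {"template": {"spec": {"containers": [{"name": "flow"}]}}},
}

print(has_flow_container(missing_name))  # False
print(has_flow_container(named))  # True
```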
s
That's amazing. I'll try that, then.
a
If you want to submit a feature request, would you be willing to open a GitHub issue to discuss the problem and show why you can't use the job template to customize the run config for your use case?
s
Already did 😄
s
That’s me
a
"If I specify a `job_template`, then I cannot use the `RunNamespacedJob` task." Can you explain why?
Regarding "I would actually like the flow scheduling and business logic to be in separate repositories": you can separate these, e.g. by using a different GitHub repository for each, which is what Storage is for. Perhaps it would be easier to just separate the code?
s
> can you explain why?
I looked into it, and it was because I didn't specify a container name (it must be named `flow`). When I fixed that, it worked! Here is the code snippet in case anyone comes across this:
```python
from prefect.run_configs import KubernetesRun

# A partial Job spec: the container must be named "flow" (used for the flow run)
run_config_job_template = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "flow"
                    }
                ],
                # Only schedule onto nodes labelled agentpool=spotd2as
                "affinity": {
                    "nodeAffinity": {
                        "requiredDuringSchedulingIgnoredDuringExecution": {
                            "nodeSelectorTerms": [
                                {
                                    "matchExpressions": [
                                        {
                                            "key": "agentpool",
                                            "operator": "In",
                                            "values": ["spotd2as"],
                                        }
                                    ]
                                }
                            ]
                        }
                    }
                },
                # Tolerate the taint AKS applies to spot node pools
                "tolerations": [
                    {
                        "key": "kubernetes.azure.com/scalesetpriority",
                        "operator": "Equal",
                        "value": "spot",
                        "effect": "NoSchedule",
                    }
                ]
            }
        }
    }
}

k8s_run_config = KubernetesRun(job_template=run_config_job_template)
```
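Until something like the requested named parameters exists, the boilerplate above can be wrapped in a small function. `build_job_template` and `spot_pool_affinity` are hypothetical helpers, not Prefect API; they assume the same `agentpool=spotd2as` label and AKS spot taint as the snippet above and simply return a dict for `KubernetesRun(job_template=...)`.

```python
from typing import Optional


def build_job_template(
    tolerations: Optional[list] = None,
    affinity: Optional[dict] = None,
) -> dict:
    """Build a minimal Job template for KubernetesRun(job_template=...).

    Hypothetical helper: mirrors the hand-written template above, but takes
    tolerations and affinity as arguments instead of hard-coding them.
    """
    pod_spec: dict = {"containers": [{"name": "flow"}]}  # "flow" is required
    if affinity:
        pod_spec["affinity"] = affinity
    if tolerations:
        pod_spec["tolerations"] = list(tolerations)
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "spec": {"template": {"spec": pod_spec}},
    }


def spot_pool_affinity(pool_name: str) -> dict:
    """Require scheduling onto nodes whose 'agentpool' label equals pool_name."""
    return {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            {"key": "agentpool", "operator": "In", "values": [pool_name]}
                        ]
                    }
                ]
            }
        }
    }


# Toleration for the taint AKS puts on spot node pools
SPOT_TOLERATION = {
    "key": "kubernetes.azure.com/scalesetpriority",
    "operator": "Equal",
    "value": "spot",
    "effect": "NoSchedule",
}

template = build_job_template(
    tolerations=[SPOT_TOLERATION],
    affinity=spot_pool_affinity("spotd2as"),
)
# Then, with Prefect installed: KubernetesRun(job_template=template)
```

Keeping the affinity and toleration as separate values makes it easy to reuse the same template builder for other node pools without copying the nested dict.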
a
Nice work, and thanks for sharing! Is it OK to close the GitHub issue then, or do you want to keep it open?
s
I'll post this on GitHub, but I'd still like the feature request to stay open, since it would eliminate boilerplate.
Have a nice day ☁️
a
You too! 🙂
z
Can you include this workaround and update your note on the `RunNamespacedJob` part of your issue? I was quite confused by that part of your request.
s
Sure
done