Gabriel Milan
02/24/2022, 9:41 PM
prefect namespace and work fine. I needed to add one more agent to the cluster, but this one should be in another namespace, say prefect-agent-xxxx. When I do this, I can successfully submit runs to it and they do get deployed, but they don't seem to actually run and no logs are shown. I've tried configuring the Apollo URL to <http://prefect-apollo.prefect.svc.cluster.local:4200> and also setting up an ExternalName Service pointing to it in the prefect-agent-xxxx namespace and using that, but neither works. Any ideas on how I could debug this?
Kevin Kho
02/24/2022, 9:53 PM
Gabriel Milan
02/24/2022, 9:55 PM
Kevin Kho
02/24/2022, 9:57 PM
Matthias
02/24/2022, 10:52 PM
--show-flow-logs. It could give you more insights into the issue.
Gabriel Milan
02/24/2022, 11:54 PM
prefect agent kubernetes start --job-template {{ .Values.agent.jobTemplateFilePath }}
Matthias
02/25/2022, 6:11 AM
Gabriel Milan
02/25/2022, 11:40 AM
Matthias
02/25/2022, 7:00 PM
Gabriel Milan
02/25/2022, 7:56 PM
Matthias
02/25/2022, 8:00 PM
Anna Geller
02/25/2022, 10:08 PM
label did you assign to that agent?
Your flow runs are correctly deployed and we can see that in the agent logs, so label shouldn’t be an issue, but still worth sharing that for debugging.
Q2: Can you inspect the flow run pods and check the logs there? You could check the pods in this namespace and inspect the Kubernetes jobs and pods deployed there. Are flow run and task run states getting updated in your Server backend? You could potentially check that in your Server logs somewhere.
Q3: You wrote “it doesn't seem to actually run and no logs are shown” - what doesn’t run? Do you mean you don’t see the flow run logs and updates being reflected in your Server UI?
Q4: Where did you configure your Server Apollo endpoint - did you set it in the agent manifest as env variable as shown here?
env:
  - name: PREFECT__CLOUD__AGENT__AUTH_TOKEN
    value: ''
  - name: PREFECT__CLOUD__API
    value: "http://prefect-apollo.prefect.svc.cluster.local:4200/graphql" # paste your GraphQL Server
  - name: PREFECT__BACKEND
    value: server
Q5: How did you configure your flow runs that got deployed to this agent (KubernetesRun)?
Q6: Didn’t you explicitly set the namespace when deploying the YAML file of the KubernetesAgent?
Some immediate ideas to check/inspect or try:
I would recommend creating a manifest file using:
prefect agent kubernetes install --rbac > third_agent.yaml
Then adjusting the env variables as above and deploying it to a desired namespace this way:
kubectl apply -f third_agent.yaml -n yournamespace
Then, all the flow run Kubernetes jobs should also be deployed to this namespace.
Then, only networking and permission issues remain to be solved so that your flow run pods can talk to your Server in a separate namespace, and your Service with ExternalName seems like the right solution.
kind: Service
apiVersion: v1
metadata:
  name: server-third-agent
  namespace: yournamespace
spec:
  type: ExternalName
  externalName: prefect-apollo.prefect.svc.cluster.local
  ports:
    - port: 80
    - port: 443
    - port: 4200
I’m particularly guessing here when it comes to ports - I have no idea which ports exactly would need to be open, but would like to open this up for discussion - it may be an issue with ports.
Can you then also check the logs of your Server components and this above service to check if you see any errors or missing permissions there?
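One way to narrow down the networking question is to send a trivial GraphQL request to the Apollo endpoint from a pod in the new namespace. The following is only a rough sketch using the Python standard library: the URL is the one from this thread, the function names are made up for illustration, and { __typename } is used simply because it is a query any GraphQL server should accept.

```python
import json
import urllib.error
import urllib.request

# Assumed endpoint: the Apollo service address mentioned in this thread.
APOLLO_URL = "http://prefect-apollo.prefect.svc.cluster.local:4200/graphql"


def build_probe(url: str = APOLLO_URL) -> urllib.request.Request:
    """Build a minimal GraphQL POST request for the given endpoint."""
    body = json.dumps({"query": "{ __typename }"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def can_reach(url: str = APOLLO_URL, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers the probe with HTTP 200."""
    try:
        with urllib.request.urlopen(build_probe(url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError) as exc:
        # DNS failures, refused connections, timeouts and HTTP errors all land here.
        print(f"cannot reach {url}: {exc}")
        return False
```

Calling can_reach() from a shell inside a flow run pod (or any pod in the new namespace) would tell you whether DNS and the port are fine. If it returns True, the problem is probably not the network path to the Server.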
Finally, I would check RBAC on your third agent. It may also be an issue of missing RoleBinding to bind your third agent’s permissions to your Server’s namespace (or both namespaces):
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prefect-agent-rbac
  namespace: default # add your prefect-agent-xxxx here
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prefect-agent-rbac
subjects:
  - kind: ServiceAccount
    name: default
Gabriel Milan
02/25/2022, 11:02 PM
datario label
Q2: there are no logs whatsoever in the flow run pods. I can only see a change of state in the UI when the run is actually submitted, but nothing else. where could I get Server logs?
Q3: yes, and there're also no logs on the pods themselves
Q4: the only env that you've shown that is not set on my agent is PREFECT__CLOUD__AGENT__AUTH_TOKEN, could that be a problem? all of my other agents work without it
Q5: I've set them using flow.run_config = KubernetesRun(image=constants.DOCKER_IMAGE.value). This constants.DOCKER_IMAGE.value is a valid docker image, the same I'm using for other agents
Q6: I've deployed it by doing helm upgrade --install prefect-agent -n <namespace> <mychart> -f values.yaml. The chart I'm using is this one and my values.yaml file looks like this:
agent:
  apollo_url: http://prefect-apollo.prefect.svc.cluster.local:4200/
  env: []
  image:
    name: prefecthq/prefect
    tag: 0.15.9
  job:
    resources:
      limits:
        cpu: ''
        memory: ''
      requests:
        cpu: ''
        memory: ''
  jobTemplateFilePath: myjobtemplateurl.yaml
  name: prefect-agent
  prefectLabels:
    - datario
  replicas: 1
  resources:
    limits:
      cpu: 100m
      memory: 128Mi
  serviceAccountName: prefect-agent
the job template looks like this
apiVersion: batch/v1
kind: Job
spec:
  template:
    spec:
      containers:
        - name: flow
          envFrom:
            - secretRef:
                name: gcp-credentials
            - secretRef:
                name: vault-credentials
          env:
            - name: GOOGLE_APPLICATION_CREDENTIALS
              value: /mnt/creds.json
          volumeMounts:
            - name: gcp-sa
              mountPath: /mnt/
              readOnly: true
      volumes:
        - name: gcp-sa
          secret:
            secretName: gcp-sa
and all of those secrets are properly configured.
Finally, I just wanted to add that I'll check those steps you've mentioned and get back asap
linkerd-await for blocking on linkerd readiness. This third agent was deployed in a non-linkerd-injected namespace, so the run pods were "awaiting" readiness forever. That's why our run pods would never die, show logs or update their state. After I injected the namespace with linkerd, everything works. Thank you so much for the effort on understanding our scenario and all the help!
Anna Geller
02/26/2022, 1:19 AM
Gabriel Milan
02/26/2022, 3:50 PM
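For anyone hitting the same symptom: the root cause described above (an image that blocks in linkerd-await, running in a namespace without Linkerd injection) can be checked up front. Below is a small sketch, not an official check. It assumes you have already fetched the namespace annotations as a dict (for example from kubectl get namespace -o json), and the helper names are hypothetical. The linkerd.io/inject annotation itself is the one Linkerd's proxy injector reads; pod-level annotations can override the namespace value, which this sketch ignores.

```python
from typing import Mapping, Optional

# Annotation read by Linkerd's proxy injector on a namespace or pod.
INJECT_ANNOTATION = "linkerd.io/inject"


def linkerd_injection_enabled(annotations: Optional[Mapping[str, str]]) -> bool:
    """Return True if the given annotations enable Linkerd proxy injection."""
    return (annotations or {}).get(INJECT_ANNOTATION) == "enabled"


def await_would_block(
    image_uses_linkerd_await: bool,
    namespace_annotations: Optional[Mapping[str, str]],
) -> bool:
    """Return True when the entrypoint would wait for a proxy that never arrives."""
    return image_uses_linkerd_await and not linkerd_injection_enabled(
        namespace_annotations
    )
```

With the setup from this thread, await_would_block(True, {}) is True for the new namespace before injection, and await_would_block(True, {"linkerd.io/inject": "enabled"}) is False afterwards.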