I have registered a flow in prefect cloud and I am...
# prefect-community
j
I have registered a flow in prefect cloud and I am trying to run it in Kubernetes (AKS). Right now I can see that the Kubernetes agent submits the flow for execution and the first task in the flow seems to execute fine until it errors with this message. If I run the flow locally all is fine. Has anyone faced the same problem?
a
The line that is failing for you seems to be about your request session, and Azure AKS has quite a complex networking setup. 1. Can you share how you started your agent and what your KubernetesRun configuration is? 2. Is your flow doing some long-running or memory-intensive processing?
If you could share more (as much info as possible) about your setup and the flow that fails, I could open a GitHub issue and try to reproduce. But often session issues like this are transient.
j
1. This is how the agent is started
1. This is the job template passed to KubernetesRun
2. The flow is not doing any long-running or memory-intensive tasks. The task that produces this error still manages to execute and fetch the data it should, but errors at the end. The tasks in the flow are 1) Fetch data from an API to local files 2) Upload data to blob…. Since the first task executes but errors at the end, it fetches data but the flow does not proceed.
a
interesting, thx for sharing more info! not sure if this may be part of the issue, but it looks like your agent is deployed to the default namespace while your jobs are deployed to a "dev" namespace - is this intended? I also noticed you didn't assign any label to your agent - we recommend doing that. Do you have any custom dependencies here? If you could share your flow, I could dig deeper. You could send it to me via DM for privacy if you don't wanna share here. Similarly, can you send me the flow run ID, or the entire URL of the flow run for this flow?
j
the agent and the jobs are deployed into the same namespace
a
this command will deploy it to "default" namespace, I'm quite sure
prefect agent kubernetes start -e AZURE_STORAGE_CONNECTION_STRING=$AZURE_STORAGE_CONNECTION_STRING
j
right, but that is just the command line present in the Deployment
a
you would need to do:
prefect agent kubernetes start -e AZURE_STORAGE_CONNECTION_STRING=$AZURE_STORAGE_CONNECTION_STRING --namespace dev
j
I have defined the namespace in an env variable in the Deployment template
I can confirm that the agent is not running on default namespace
a
the template is for the flow run pod, not for the agent
j
when you run
prefect agent kubernetes install -k API_KEY
this generates the template for the Deployment which I have used to deploy the agent. In here I have added the env variable for namespace
I have added the --namespace parameter when starting the agent and I am still getting the same error. Could it be that some env variable is missing that would be needed so that “config” is available on the prefect context?
a
Gotcha, I usually do it the same way, but instead of "hardcoding" the namespace within the YAML file, I usually provide it to the kubectl command:
kubectl apply --namespace=YOUR_NAMESPACE -f agent_config.yaml
j
Yes, in my case I am deploying the agent using Pulumi (IaC), there is nothing hardcoded in a file :)
a
anyway, perhaps namespace is not the issue here? to dive deeper we would need to collect more info since flow run logs don't give a 100% clear indication of the root cause - from the logs it seems like a transient issue with the session of an API request to Cloud
j
namespace seems to not be the cause of the problem, agree
a
thanks for sharing your Flow via DM - based on the flow structure it looks like you call "normal" Python code within your Flow. In Prefect 1.0, within the Flow block, you can only call Prefect tasks; the Flow is just a placeholder for tasks, based on which Prefect <= 1.1 generates a DAG. Some notes on the flow you shared:
• The azure and mambu don't seem to be valid Prefect tasks at first glance.
• You are setting environment variables within your Flow block, which will likely lead to unintended consequences since everything in the Flow block is only evaluated at build time (during flow registration), not at flow runtime.
• You are globally defining an object called "edpoint" which you are passing to downstream tasks. This may also be problematic since things like HTTP endpoints cannot be pickled by cloudpickle, which is used by default for flow storage. It's best practice to define database connections and HTTP endpoints directly within your Prefect tasks, i.e., functions decorated with @task.
Can you share your storage definition? You can also share it via DM. Generally I think the issue is in your Flow structure, which is why the flow run gets properly picked up by your Kubernetes agent but then dies after executing the first task.
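To illustrate the pickling concern raised above, here is a minimal sketch using only the standard library. It assumes a hypothetical DemoClient class whose internal lock stands in for the socket or session a real HTTP or database client would hold, and it uses the stdlib pickle module as a stand-in for cloudpickle. The names DemoClient, is_picklable, and fetch are made up for this example and are not part of Prefect.

```python
import pickle
import threading


class DemoClient:
    """Hypothetical client; the lock stands in for a live socket or session."""

    def __init__(self, host: str) -> None:
        self.host = host
        self._lock = threading.Lock()  # lock objects cannot be pickled


def is_picklable(obj) -> bool:
    """Return True if the standard-library pickler can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


# A client created at module level would have to be serialized with the flow.
global_client_ok = is_picklable(DemoClient("example.invalid"))

# Plain settings (strings, numbers) serialize without trouble.
settings = {"host": "example.invalid"}
settings_ok = is_picklable(settings)


def fetch(settings: dict) -> str:
    """Hypothetical task body: build the client here, when the task runs."""
    client = DemoClient(settings["host"])
    return f"connected to {client.host}"


if __name__ == "__main__":
    print(global_client_ok, settings_ok, fetch(settings))
```

The design choice is to pass only plain configuration between tasks and to create the connection object inside the function that uses it, so nothing unpicklable needs to travel with the flow.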
thx for DMing your storage - I would try to refactor the flow a bit to ensure that the Flow block only calls Prefect tasks and creates a DAG, then reregister the flow, upload the flow to Azure blob storage and try to rerun it.
j
I will do some testing modifying the flow
The strange thing is that it all runs fine locally and deploys/registers to prefect cloud without a problem
I tested creating a basic flow to test and it ran fine, it seems like it must be something with how the flow is defined like you mentioned
a
That's a brilliant idea! It's always great to test infrastructure issues like this using a simple hello-world flow doing basically nothing but testing that the storage and run configuration work fine, and then introducing all the tasks one by one to find out which task causes the issue.
j
Apparently the issue seems to have solved itself after I bumped the version of prefect 1.0.0 -> 1.1.0 for the image that we were using to run the flows.