I am trying to import pandas. I have prefect cloud...
# ask-community
j
I am trying to import pandas. I have Prefect Cloud and an AKS agent. I modified the YAML file for the pod to include pip install pandas, but when I run the flow it can't find the pandas package. When I log onto the pod where the agent runs, I can see that pandas is installed when Kubernetes creates the pod. When I run the flow I get: Failed to load and execute Flow's environment: FlowStorageError('An error occurred while unpickling the flow:\n ModuleNotFoundError("No module named \'pandas\'")\nThis may be due to a missing Python module in your current environment. Please ensure you have all required flow dependencies installed.')
k
Hey @Jai Deo, can I see how you did it in the yaml file?
j
Hi Kevin, this is the snippet:
containers:
  - name: agent
    image: 'prefecthq/prefect:0.15.3-python3.8'
    command:
      - /bin/bash
      - '-c'
    args:
      - >-
        cd /mnt/azure ; pip install  pandas ; prefect agent kubernetes
        start
    env:
      - name: PREFECT__CLOUD__AGENT__AUTH_TOKEN
        value: JU13Ghv6WSDJNprenBiO3Q
      - name: PREFECT__CLOUD__API
        value: 'https://api.prefect.io'
      - name: NAMESPACE
        value: default
k
I think this is for the agent, right? But then the pip install won't carry over to the pod that is running the Flow. I assume you are using KubernetesRun in the Flow? What image does that use? That one would need Python.
j
The line near the top says what the image is: prefecthq/prefect:0.15.3-python3.8
Yes, this is for the agent. I thought the YAML file uses the image to create the container and then runs the pip install.
k
I know but this is the container for the agent specifically. This is not the same one your Flow will run in because Kubernetes Agent plus Kubernetes Run still create a new Kubernetes Job to run the Flow in.
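To make the agent/flow-run split concrete, here is a minimal sketch of the kind of Kubernetes Job the agent submits for each flow run. It is only an illustration in plain Python: the helper name build_flow_run_job, the generateName value, and the container name "flow" are assumptions, and the real template Prefect uses carries more fields. The point is that the Job's container is built from the image and env in the run config, so nothing from the agent container's startup args (such as pip install pandas) appears in it.

```python
from typing import Dict, Optional


def build_flow_run_job(image: str, env: Optional[Dict[str, str]] = None) -> dict:
    """Return a minimal batch/v1 Job manifest for one flow run (illustrative only)."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"generateName": "prefect-job-"},  # hypothetical name prefix
        "spec": {
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "flow",  # assumed container name
                            "image": image,  # comes from the run config, not the agent
                            "env": [
                                {"name": key, "value": value}
                                for key, value in (env or {}).items()
                            ],
                        }
                    ],
                }
            }
        },
    }


# Same values as the KubernetesRun example in this thread.
job = build_flow_run_job("prefecthq/prefect:0.15.3-python3.8", {"SOME_VAR": "value"})
container = job["spec"]["template"]["spec"]["containers"][0]
print(container["image"])
```

The agent's own Deployment is a separate manifest, so anything installed there stays in the agent pod.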
j
My understanding is all wrong then. Do you have an example I can use for creating the container for the flow?
I am using Azure to store the flow
k
Are you using KubernetesRun like this?
flow.run_config = KubernetesRun(image="prefecthq/prefect",env={"SOME_VAR": "value"})
j
Yes
flow.run_config=KubernetesRun( image="prefecthq/prefect:0.15.3-python3.8",
k
That is the image the flow will run on. It’s separate from the agent container so that is the one that needs Pandas.
j
So the image name in the run_config should not be the one I am using for the agent? It should be something different?
k
They can be the same but they can also be different. These are two different containers yep.
j
When I save the flow, does my YAML deployment get created, or do I have to create one from scratch?
k
You mean the job_template for the Flow run, or the one to spin up the agent?
j
The agent spins up ok.
Do I need a separate pod for the flow ?
k
The flow creates a new job which creates a new pod.
j
What name do I use for the image - do I need to specify a registry ?
k
Same as regular DockerHub. So if it’s in DockerHub it would just be like “prefecthq/prefect” but if it’s somewhere else you need to prefix like “ecr:….”
j
Our flow is using Azure storage and AKS
We don't use DockerHub.
When I use prefecthq/... I get the error:
Kubernetes Error: rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/prefecthq/example200:latest": failed to resolve reference "docker.io/prefecthq/example200:latest": pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed
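The docker.io prefix in that error comes from how image references are resolved: when the first path component does not look like a registry host, Docker Hub is assumed. Below is a rough sketch of that rule in Python. The function name registry_for and the myregistry.azurecr.io example are hypothetical, and the real resolver in Docker/containerd handles more cases than this.

```python
def registry_for(image: str) -> str:
    """Guess which registry an image reference will be pulled from.

    Simplified rule: the first path component counts as a registry host only
    if it contains a dot or a colon, or is exactly "localhost". Otherwise the
    reference is treated as a Docker Hub repository.
    """
    first, _, rest = image.partition("/")
    if rest and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"


# The name from the error above has no registry host, so Docker Hub is assumed.
print(registry_for("prefecthq/example200:latest"))
# A hypothetical Azure Container Registry image carries its host as a prefix.
print(registry_for("myregistry.azurecr.io/my_flow:v1"))
```

So an image that only exists in a private registry has to be referenced with that registry's host name in front.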
k
What do you use for storing images? Azure container registry?
j
Azure storage
An Azure storage account.
We define the container in the flow, along with the connection string to that container.
k
Can I see the code for that? Just remove sensitive info
j
flow.storage=Azure(container="test99", connection_string="DefaultEndpo....")
And I can see it listed when I log onto the storage account
k
As I was typing this up, I realized there is an easier solution. You can do
from prefect.run_configs import KubernetesRun

flow.run_config = KubernetesRun(env={"EXTRA_PIP_PACKAGES": "scikit-learn matplotlib"})
so you can install pandas this way. I will continue the explanation though.
====================================================
I think we're confusing the term "container" here. I believe Azure Blob Storage calls its storage unit a container. That container is a separate concept from a Docker container. So with flow.storage = …, you are taking a Flow and storing it as a blob inside the Azure Blob Storage container. This has nothing to do with Docker. So when I asked where you store your containers, I meant the Docker images. In Azure, there is a service called Azure Container Registry where you build images and then push them. If you have something like:
flow.storage = Docker(registry_url="<my-registry.io>", image_name="my_flow")
This will build your Flow into a Docker image and upload it to the registry specified. Then when you use KubernetesRun, it will automatically grab the image that was built if you don't specify one.
====================================================
So that Storage in Azure is separate from the container that will run the Flow. Prefect will pull the Flow from there, and then run it on top of the Docker image specified in the KubernetesRun. So you can build a Docker image with all of your dependencies, upload it to Azure Container Registry, specify it in KubernetesRun, and your flow will run on top of that.
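The original "error occurred while unpickling the flow" message fits this picture: the stored flow is a pickle that only references its dependencies by module name, so they must be importable wherever it is loaded. Here is a small stdlib-only sketch of that failure mode. The module fake_dependency stands in for pandas and is entirely hypothetical, and Prefect's own serialization (which, as far as I know, is built on cloudpickle) is more involved than plain pickle.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path


def unpickle_without_dependency() -> str:
    """Pickle an object from a throwaway module, then load it where the module is gone."""
    with tempfile.TemporaryDirectory() as tmp:
        # "Build" environment: the hypothetical dependency is importable here.
        Path(tmp, "fake_dependency.py").write_text("class Frame:\n    pass\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        module = importlib.import_module("fake_dependency")
        # The pickle stores a reference to fake_dependency.Frame, not its code.
        payload = pickle.dumps(module.Frame())

        # "Run" environment: same bytes, but the dependency is no longer installed.
        sys.path.remove(tmp)
        del sys.modules["fake_dependency"]
        importlib.invalidate_caches()

    try:
        pickle.loads(payload)
    except ModuleNotFoundError as exc:
        return str(exc)
    return "loaded"


print(unpickle_without_dependency())
```

The stored bytes are identical in both environments; only the availability of the module differs, which is why the fix belongs in the image the flow runs on rather than in the storage.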
j
I will give both methods a try. I should be using the second method, as I need to add a few more commands to the container, but I will use the extra packages option first.
Thank you very much for your help - it was very educational. I will let you know tomorrow of the outcome
k
Sounds good! Yes, EXTRA_PIP_PACKAGES is not meant for production but is easier for development.
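One way to see why this suits development more than production: as far as I understand, the official prefecthq/prefect image reads EXTRA_PIP_PACKAGES in its entrypoint and runs pip install every time a container starts, so each flow run pays the install time and depends on the package index being reachable. The sketch below only mimics that behaviour in Python; startup_pip_command is a hypothetical helper, not the image's actual entrypoint script.

```python
import shlex
from typing import List, Mapping


def startup_pip_command(env: Mapping[str, str]) -> List[str]:
    """Build the pip command a container would run at startup, if any.

    Assumes the variable holds a space-separated list of packages, as in the
    KubernetesRun example in this thread.
    """
    packages = shlex.split(env.get("EXTRA_PIP_PACKAGES", ""))
    if not packages:
        return []  # nothing extra to install
    return ["pip", "install", *packages]


print(startup_pip_command({"EXTRA_PIP_PACKAGES": "scikit-learn matplotlib"}))
print(startup_pip_command({}))
```

Baking the packages into a custom image moves that cost to build time instead.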
j
Pandas is working fine now. But I am now trying to use an image to create the flow, and I get an authorisation error. Is there a way to supply the username and password in the env to pull the image from the registry?
State Message: Kubernetes Error: rpc error: code = Unknown desc = failed to pull and unpack image "prefregistry.azurecr.io/mstools:v1": failed to resolve reference "prefregistry.azurecr.io/mstools:v1": failed to authorize: failed to fetch anonymous token: unexpected status: 401 Unauthorized
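The thread ends here without an answer, so the following is only a general Kubernetes note and not a confirmed fix for this case. Kubernetes normally takes private-registry credentials from an image pull secret (for example one created with kubectl create secret docker-registry) that the pod spec references under imagePullSecrets, rather than from container env vars. The sketch below adds such a reference to a Job manifest held as a plain dict; the secret name acr-credentials and the helper with_image_pull_secret are hypothetical. I believe KubernetesRun also accepts an image_pull_secrets argument for this, but treat that as an assumption and check the docs for your Prefect version.

```python
import copy


def with_image_pull_secret(job: dict, secret_name: str) -> dict:
    """Return a copy of a Job manifest whose pod spec references a pull secret."""
    patched = copy.deepcopy(job)  # leave the caller's manifest untouched
    pod_spec = patched["spec"]["template"]["spec"]
    secrets = pod_spec.setdefault("imagePullSecrets", [])
    if {"name": secret_name} not in secrets:
        secrets.append({"name": secret_name})
    return patched


# Minimal manifest using the image name from the error above.
job = {
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {"name": "flow", "image": "prefregistry.azurecr.io/mstools:v1"}
                ]
            }
        }
    }
}
patched = with_image_pull_secret(job, "acr-credentials")  # hypothetical secret name
print(patched["spec"]["template"]["spec"]["imagePullSecrets"])
```

The secret itself has to exist in the same namespace as the flow-run Job.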