# ask-marvin
c
@Marvin Can you explain the following flow error?
Copy code
Finished in state Failed("Flow run encountered an exception: ProgrammingError: 255002: Optional dependency: 'pandas' is not installed, please see the following link for install instructions: <https://docs.snowflake.com/en/user-guide/python-connector-pandas.html#installation>")
m
thought for 34.7 seconds
Short version: your flow called a Snowflake method that requires pandas, but pandas is not installed in the environment where the flow actually ran. The Snowflake Python connector raises ProgrammingError 255002 in that case, and Prefect reports the flow as Failed.

Common calls that trigger this:
- cursor.fetch_pandas_all() or fetch_pandas_batches()
- snowflake.connector.pandas_tools.write_pandas(...)
- Snowpark DataFrame.to_pandas()
- SQLAlchemy Snowflake helpers that return DataFrames

How to fix
- Install the pandas extra for the Snowflake connector in the runtime environment:
Copy code
pip install "snowflake-connector-python[pandas]"
This pulls in pandas (and related deps like pyarrow) that those methods need.
- If you’re running in a container/worker image, add it to the image your deployment uses (e.g., in your Dockerfile):
Copy code
RUN pip install --no-cache-dir "snowflake-connector-python[pandas]"
- If you manage dependencies with requirements/pyproject, add:
Copy code
snowflake-connector-python[pandas]
Reference: Snowflake pandas integration installation

If you’re not explicitly using pandas in your code, it’s still likely one of the above Snowflake helpers under the hood. If you share:
- where the flow runs (local, Docker/K8s work pool, etc.), and
- the Snowflake call you’re making,
I can suggest the exact dependency change and where to put it.
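For illustration, a minimal sketch of a flow that can hit this error path (the connection parameters and table name below are placeholders, not from your code):
Copy code
from prefect import flow
import snowflake.connector

@flow
def example_flow():
    # placeholder credentials; in practice use a Block or environment variables
    conn = snowflake.connector.connect(
        account="your-account",
        user="your-user",
        password="your-password",
        warehouse="your-warehouse",
    )
    cur = conn.cursor()
    cur.execute("SELECT * FROM your_table LIMIT 10")
    # fetch_pandas_all() imports pandas (and pyarrow) lazily; if either is
    # missing, the connector raises ProgrammingError 255002 and the flow fails
    df = cur.fetch_pandas_all()
    print(df.shape)
With the pandas extra installed in the runtime environment, the fetch_pandas_all() call above succeeds.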
c
@Marvin I am still getting the same flow error. This is what my docker file looks like:
Copy code
FROM prefecthq/prefect:3.2.14-python3.9
COPY . /opt/prefect/Scripts/
WORKDIR /opt/prefect/Scripts/
RUN pip install --no-cache-dir "snowflake-connector-python[pandas]" adlfs s3fs requests pandas pyodbc snowflake prefect_azure && apt-get update && apt-get install -y unixodbc
m
thought for 327.3 seconds
Thanks for sharing the Dockerfile — that helps. Given you’re still seeing “Optional dependency: 'pandas' is not installed” at runtime, the most common causes are:

1) The flow is not actually running in your custom image
- If your deployment/worker is using the default Prefect image (e.g., prefecthq/prefect:3.2.14-python3.9), it won’t have pandas or the Snowflake extras you installed.
- Check the image used by the run in the Prefect UI: open the flow run → Infrastructure panel → confirm the exact image name used.
- If it’s not your custom image, point your deployment to it and rerun. In the Prefect 3.x Python API:
Copy code
from prefect import flow

flow.from_source(
    source="https://github.com/your-org/your-repo",  # placeholder; or a local path with your code
    entrypoint="path/to/flow.py:my_flow",
).deploy(
    name="my-deployment",
    work_pool_name="my-docker-pool",
    image="your-registry/your-image:tag",
    build=False,  # image already built
    push=False,   # image already pushed
)
- Or set the “image” in your Docker work pool’s job variables or in your prefect.yaml and redeploy.
- Make sure your Docker worker is running and attached to that pool.
Docs: Use Docker with Prefect work pools

2) You’re running on a non-Docker worker
- If you’re using a Process/Kubernetes/other pool, the Dockerfile won’t affect the runtime environment unless the job actually pulls and runs that image.
- For Process workers, install the deps where the worker runs:
Copy code
pip install "snowflake-connector-python[pandas]" pandas pyarrow
3) Small but important Dockerfile notes
- You’re installing both snowflake-connector-python[pandas] and snowflake. The PyPI package named snowflake is not the Snowflake connector and can cause module conflicts. Remove snowflake from your pip line.
- If you use DataFrame helpers (e.g., write_pandas, fetch_pandas_all), you often need pyarrow too. Add it explicitly:
Copy code
RUN pip install --no-cache-dir "snowflake-connector-python[pandas]" pandas pyarrow adlfs s3fs requests pyodbc prefect-azure && \
      apt-get update && apt-get install -y unixodbc
Note: In Prefect 3.x the Azure collection is published as prefect-azure (hyphen), not prefect_azure.
- Quick sanity check of your image locally:
Copy code
docker run --rm -it your-registry/your-image:tag \
    python -c "import pandas, snowflake.connector; print('OK', pandas.__version__, snowflake.connector.__version__)"
Checklist to get you unblocked
- Confirm the image used by your failing run in the UI.
- If incorrect, update your deployment or pool to use your custom image and ensure the worker is running for that pool.
- Rebuild/tag/push your image, then redeploy:
Copy code
docker build -t your-registry/your-image:tag .
docker push your-registry/your-image:tag
- Remove snowflake from your pip installs; keep snowflake-connector-python[pandas] (+ pyarrow).
- If not using a Docker work pool, install the deps in the actual runtime environment.

If you can share:
- Your work pool type (Docker, Process, K8s, etc.)
- How you’re creating the deployment (prefect.yaml or code/CLI) and which image it references
- A link or screenshot of the flow run’s Infrastructure panel showing the image
…I can pinpoint the exact change needed.
c
@Marvin Our work pool is on a Kubernetes Cluster. The deployment is created using this configuration in Prefect.yaml
Copy code
- name: VenueNextColumns
  version:
  tags: []
  description:
  entrypoint: venueNextParseColumns_snowflake.py:main_flow
  parameters: {}
  work_pool:
    name: venuenext-kubernetes
    worker_name: KubernetesWorker ce78e435-8a1e-4950-8d8b-e4ee47546859
    job_variables:
      image: '{{ build-image.image }}'
  schedules: []
  push:
  - prefect_azure.deployments.steps.push_to_azure_blob_storage:
      container: prefectscripts
      folder: ''
      credentials: '{{ prefect.blocks.azure-blob-storage-credentials.prefectscriptskubernetes
        }}'
  pull:
  - prefect_azure.deployments.steps.pull_from_azure_blob_storage:
      container: prefectscripts
      folder: ''
      credentials: '{{ prefect.blocks.azure-blob-storage-credentials.prefectscriptskubernetes
        }}'
  concurrency_limit:
m
thought for 221.1 seconds
Thanks — with a Kubernetes work pool and this prefect.yaml, the key issue is that your deployment’s job_variables.image points to {{ build-image.image }}, but there is no step in your prefect.yaml that actually builds/pushes an image and sets that variable. As a result, the Kubernetes job is likely using the work pool’s default image (a base Prefect image that does not include pandas), which causes the Snowflake “pandas not installed” error.

Two ways to fix it:

Option A: Hardcode your custom image tag in the deployment
- Build and push your image to a registry your cluster can pull from.
- Set job_variables.image to that exact tag. Example:
Copy code
work_pool:
  name: venuenext-kubernetes
  job_variables:
    image: "your-registry/your-image:tag"
    # if your registry is private:
    # image_pull_secrets:
    #   - name: your-registry-secret
Option B: Wire up docker build/push steps in prefect.yaml, then keep using {{ build-image.image }}
- Add build/push steps using the prefect-docker collection so build-image.image is produced:
Copy code
build:
- prefect_docker.deployments.steps.build_docker_image:
    id: build-image
    requires: prefect-docker
    image_name: your-registry/your-image
    tag: v1
    dockerfile: ./Dockerfile
- prefect_docker.deployments.steps.push_docker_image:
    id: push-image
    requires: prefect-docker
    image_name: '{{ build-image.image_name }}'
    tag: '{{ build-image.tag }}'
- Keep your deployment’s job_variables.image: '{{ build-image.image }}'.

Make sure the image actually has the Snowflake pandas extras
Your Dockerfile is close, but I recommend these changes:
- Remove snowflake (that’s not the connector and can cause conflicts)
- Install the Azure collection with the correct PyPI name prefect-azure
- Add pyarrow (commonly needed with Snowflake pandas helpers)

Example:
Copy code
FROM prefecthq/prefect:3.2.14-python3.9
COPY . /opt/prefect/Scripts/
WORKDIR /opt/prefect/Scripts/
RUN pip install --no-cache-dir \
      "snowflake-connector-python[pandas]" \
      pandas \
      pyarrow \
      adlfs \
      s3fs \
      requests \
      pyodbc \
      prefect-azure \
  && apt-get update \
  && apt-get install -y --no-install-recommends unixodbc \
  && rm -rf /var/lib/apt/lists/*
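If you go with Option A, a typical sequence to rebuild, push, and redeploy from the project directory looks like this (registry, image name, and tag are placeholders):
Copy code
docker build -t your-registry/your-image:v2 .
docker push your-registry/your-image:v2
# point job_variables.image (or the pool default) at the new tag, then redeploy:
prefect deploy --name VenueNextColumns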
Verify what image was used by the failing run
- In the Prefect UI: Flow run → Infrastructure panel → check the “image” value. If it’s not your custom image, the deployment wasn’t pointing at it.

Helpful docs
- Kubernetes work pools and job variables: Run on Kubernetes
- Customize job variables: Customize job variables

If you share:
- The image value shown on the failing run’s Infrastructure panel
- Whether your cluster needs imagePullSecrets
- Your intended image tag/registry
…I can provide the exact prefect.yaml snippet you need.
c
@Marvin I checked the Kubernetes cluster and it is pulling the correct image from Docker. Any other ideas?
m
thought for 64.0 seconds
Got it — if the cluster is definitely pulling your custom image, there are a few other common causes for this exact “pandas not installed” error:

1) The pod is running an older cached image tag
- Even if it “pulled,” Kubernetes may be reusing a local cached image with the same tag.
- Fix: use unique tags per build (e.g., include a sha or timestamp; one way to generate such tags is sketched after the snippet below) and/or set the image pull policy to Always:
Copy code
work_pool:
  name: venuenext-kubernetes
  job_variables:
    image: "your-registry/your-image:2024-10-14-001"  # new tag
    image_pull_policy: "Always"
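One way to generate a unique tag per build, for example from the current git commit (a sketch; registry and image name are placeholders):
Copy code
TAG=$(git rev-parse --short HEAD)
docker build -t your-registry/your-image:$TAG .
docker push your-registry/your-image:$TAG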
Docs: Run on Kubernetes

2) Package conflict: the snowflake PyPI package is installed
- Your original Dockerfile installs snowflake, which is not the Snowflake connector and can interfere with imports.
- Rebuild your image without snowflake and make sure you’re using prefect-azure (hyphen), not prefect_azure (underscore) as the PyPI package name.
- Also add pyarrow, which Snowflake pandas helpers typically require.

Example Dockerfile:
Copy code
FROM prefecthq/prefect:3.2.14-python3.9
COPY . /opt/prefect/Scripts/
WORKDIR /opt/prefect/Scripts/

# Use the same interpreter the container will run with
RUN python -m pip install --no-cache-dir \
      "snowflake-connector-python[pandas]" \
      pandas \
      pyarrow \
      adlfs \
      s3fs \
      requests \
      pyodbc \
      prefect-azure \
  && apt-get update \
  && apt-get install -y --no-install-recommends unixodbc \
  && rm -rf /var/lib/apt/lists/*
3) Packages installed into a different Python than the one used at runtime
- Using pip vs python -m pip can sometimes install into the wrong site-packages if multiple interpreters/venvs exist.
- You can sanity-check during build:
Copy code
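# Note: heredoc RUN steps like this need BuildKit (the default builder in recent
# Docker); if your builder rejects it, use a plain `RUN python -c "..."` check instead.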
RUN python - <<'PY'
import sys
print("Python exe:", sys.executable)
import pandas, snowflake.connector
print("pandas:", pandas.__version__)
print("snowflake:", snowflake.connector.__version__)
PY
4) The Snowflake call needs pandas/pyarrow, but import fails for a different reason
- The Snowflake connector raises the exact “Optional dependency: 'pandas' is not installed” message whenever importing pandas fails — that includes missing wheels or incompatible binaries.
- Verifying inside the running pod will tell us quickly:
Copy code
# Grab the failing flow run's pod name, then:
kubectl exec -it <pod-name> -- python -c "import pandas, snowflake.connector; print('OK', pandas.__version__, snowflake.connector.__version__)"
If that import fails, it’s definitely an environment issue (wrong tag, wrong Python path, or a conflict from the snowflake package).

5) Add a quick runtime check in your flow to confirm the environment
At the start of the flow:
Copy code
from prefect import flow, get_run_logger

@flow
def main_flow(...):
    logger = get_run_logger()
    try:
        import sys, pandas, snowflake.connector
        logger.info(f"Python: {sys.executable}")
        logger.info(f"pandas: {pandas.__version__}")
        logger.info(f"snowflake: {snowflake.connector.__version__}")
    except Exception as e:
        logger.exception(f"Import check failed: {e}")
    # ...rest of flow...
Quick checklist
- Rebuild with the corrected Dockerfile (no snowflake, use prefect-azure, add pyarrow, use python -m pip).
- Push with a new, unique tag; set image_pull_policy: Always.
- Redeploy and run again.
- If it still fails, exec into the pod and run the one-liner import test above — that will pinpoint whether it’s the image or the application code.

If you can share:
- The exact image tag and whether image_pull_policy is set
- Output of the in-pod python -c "import pandas, snowflake.connector; ..." test
- The specific Snowflake API you’re calling (e.g., fetch_pandas_all, write_pandas, Snowpark to_pandas())
…I can zero in on the remaining cause.
c
@Marvin I analyzed the packages in my docker file and I have these packages installed:
Copy code
snowflake-connector-python=4.0.0
pandas=2.3.3
pyarrow=21.0.0
However, I am still getting the error:
Copy code
snowflake.connector.errors.ProgrammingError: 255002: Optional dependency: 'pandas' is not installed, please see the following link for install instructions: <https://docs.snowflake.com/en/user-guide/python-connector-pandas.html#installation>
Any ideas?
m
thought for 73.5 seconds
Thanks for the details — given those versions are in your image, Snowflake only throws 255002 when import pandas fails at runtime. So we need to find why the pandas import is failing inside the pod.

Fast way to pinpoint the cause
- Exec into a failing flow pod and run this:
Copy code
kubectl exec -it <pod-name> -- sh -lc 'python - <<PY
import sys, importlib
print("Python:", sys.version, sys.executable)
for name in ("pandas", "numpy", "pyarrow", "snowflake.connector"):
    try:
        m = importlib.import_module(name)
        path = getattr(m, "__file__", None)
        ver = getattr(m, "__version__", "unknown")
        print(f"{name} OK {ver} at {path}")
    except Exception as e:
        print(f"{name} FAILED: {type(e).__name__}: {e}")
PY'
- This will tell us exactly which import fails and why (e.g., version mismatch, missing binary, shadowed module, etc.). Please paste the output here.

Common root causes I’ve seen for this exact symptom

1) Local module shadowing pandas
- A file/folder in your project named pandas.py or pandas/ can hijack the import.
- Check what’s being imported: the command above prints the module path. If it points into your project instead of site-packages, rename/remove that file.
- You can also log it in your flow:
Copy code
import pandas, sys
print("pandas file:", getattr(pandas, "__file__", "n/a"))
2) NumPy/pandas ABI mismatch
- pandas needs a NumPy build it is compatible with; if the installed NumPy is incompatible or broken, import pandas fails with an ImportError that Snowflake then reports as “pandas not installed”.
- Check that NumPy and pandas import together in the runtime:
Copy code
kubectl exec -it <pod> -- python -c "import numpy; import pandas; print(numpy.__version__, pandas.__version__)"
- If this fails, explicitly pin compatible versions in your image:
Copy code
RUN python -m pip install --no-cache-dir \
    "numpy>=2.0,<3" \
    "pandas==2.3.3" \
    "pyarrow==21.0.0" \
    "snowflake-connector-python[pandas]==4.0.0"
3) Wrong interpreter/site-packages at runtime
- Very rarely, packages are installed with one interpreter but the job runs with another.
- Your base image is fine, but to be safe, use python -m pip in the Dockerfile (you did list versions; confirming with the exec test above will rule this out).
- Also ensure no venv is being activated in your container entrypoint that hides system site-packages.

4) Cached or stale image tag
- Even if the cluster “pulled,” it may still run a cached layer without the latest wheels.
- Use a unique image tag per build and set:
Copy code
work_pool:
  job_variables:
    image_pull_policy: "Always"
5) Missing system libs for pandas/pyarrow wheels
- Uncommon on the Prefect base, but if import errors mention GLIBC/libstdc++ etc., we’ll see it in the exec output. We can add the needed apt packages if so.

Two extra sanity checks
- At the very top of your flow, log imports to the Prefect UI so you don’t need pod access:
Copy code
from prefect import flow, get_run_logger
@flow
def main_flow():
    logger = get_run_logger()
    try:
        import sys, pandas, numpy, pyarrow, snowflake.connector
        <http://logger.info|logger.info>(f"Python: {sys.executable}")
        <http://logger.info|logger.info>(f"pandas: {pandas.__version__} ({getattr(pandas, '__file__', 'n/a')})")
        <http://logger.info|logger.info>(f"numpy: {numpy.__version__}")
        <http://logger.info|logger.info>(f"pyarrow: {pyarrow.__version__}")
        <http://logger.info|logger.info>(f"snowflake: {snowflake.connector.__version__}")
    except Exception as e:
        logger.exception(f"Import check failed: {e}")
    # rest of your flow...
- Confirm the snowflake PyPI distribution is not installed (only snowflake-connector-python should be). The wrong snowflake package can cause weird import behavior:
Copy code
kubectl exec -it <pod> -- python -m pip show snowflake  # should report "not found" if only the connector is installed
If you can share the output of the in-pod import test (especially the failure message and module paths), I can give you the precise fix. My hunch is either a numpy/pandas mismatch or a local pandas.py shadowing the import.
c
@Marvin I ran the script attached in a failed pod:
Copy code
import sys, importlib
print("Python:", sys.version, sys.executable)
for name in ("pandas", "numpy", "pyarrow", "snowflake.connector"):
    try:
        m = importlib.import_module(name)
        path = getattr(m, "__file__", None)
        ver = getattr(m, "__version__", "unknown")
        print(f"{name} OK {ver} at {path}")
    except Exception as e:
        print(f"{name} FAILED: {type(e).__name__}: {e}")
PY'
The printed statements I got were:
Copy code
pandas OK 2.2.3 at /usr/local/lib/python3.9/site-packages/pandas/__init__.py
numpy OK 2.0.2 at /usr/local/lib/python3.9/site-packages/numpy/__init__.py
pyarrow FAILED: ModuleNotFoundError: No module named 'pyarrow'
snowflake.connector OK 3.14.0 at /usr/local/lib/python3.9/site-packages/snowflake/connector/__init__.py
Could you help me interpret this print screen?
m
thought for 73.9 seconds
Thanks for running that check — it tells us a lot.

What your pod shows
- pandas OK 2.2.3
- numpy OK 2.0.2
- pyarrow FAILED (not installed)
- snowflake-connector-python is 3.14.0 (not 4.0.0)

Interpretation
- Your flow’s container can import pandas just fine. However, pyarrow is missing in the runtime environment.
- Many Snowflake “pandas” helpers (e.g., fetch_pandas_all, write_pandas, Snowpark to_pandas) require both pandas and pyarrow. In connector 3.x, a missing optional dep sometimes bubbles up as the generic “Optional dependency: 'pandas' is not installed” error, even when pyarrow is the missing piece. So this output explains the 255002 error.
- The versions in the pod do not match what you expected (pandas 2.3.3 / connector 4.0.0 / pyarrow 21.0.0). This strongly suggests the image actually running does not include those pins, or a later layer/step in your build or runtime is installing different versions.

How to fix quickly
- Ensure pyarrow is in the image used by the Kubernetes job.
- Pin the versions you want and build/push a new, uniquely tagged image. Set image_pull_policy: Always to avoid a cached pull.

Dockerfile example (pin and install with the same interpreter)
Copy code
FROM prefecthq/prefect:3.2.14-python3.9

WORKDIR /opt/prefect/Scripts/
COPY . /opt/prefect/Scripts/

RUN python -m pip install --no-cache-dir \
      "snowflake-connector-python[pandas]==4.0.0" \
      "pandas==2.3.3" \
      "numpy>=2.0,<3" \
      "pyarrow==21.0.0" \
      adlfs s3fs requests pyodbc prefect-azure \
  && apt-get update \
  && apt-get install -y --no-install-recommends unixodbc \
  && rm -rf /var/lib/apt/lists/*
Then:
- Build/push with a unique tag and set the image pull policy:
Copy code
# build/push
docker build -t your-registry/your-image:2024-10-14-1 .
docker push your-registry/your-image:2024-10-14-1
- In your prefect.yaml deployment:
Copy code
work_pool:
  name: venuenext-kubernetes
  job_variables:
    image: "your-registry/your-image:2024-10-14-1"
    image_pull_policy: "Always"
- Redeploy and run again.

Double-check in the pod after redeploy
Copy code
kubectl exec -it <new-pod> -- sh -lc 'python - <<PY
import importlib
for name in ("pandas","numpy","pyarrow","snowflake.connector"):
    try:
        m = importlib.import_module(name)
        print(name, "OK", getattr(m, "__version__", "unknown"))
    except Exception as e:
        print(name, "FAILED:", e)
PY'
If you need to confirm the current pod’s state right now (without rebuild), you can also test the hypothesis by installing pyarrow in-place:
Copy code
kubectl exec -it <pod> -- sh -lc 'python -m pip install --no-cache-dir pyarrow && python -c "import pyarrow; print(pyarrow.__version__)"'
If your flow then proceeds past the error, that confirms pyarrow was the missing dependency. You’ll still want to bake it into the image and redeploy with a unique tag.

Given your pod currently shows connector 3.14.0, if you don’t want to bump to 4.0.0 yet, at least add pyarrow to the existing image and re-tag:
Copy code
python -m pip install "snowflake-connector-python[pandas]==3.14.0" pyarrow
Most likely causes for the mismatch you saw
- The image used by the job was not the one you expected (stale tag/cache). Use a unique tag and image_pull_policy: Always.
- Another install step (requirements.txt, a runtime init step, or a layer in your Dockerfile) installed/overrode the connector to 3.14.0 and did not include pyarrow.
- Installing with plain pip vs python -m pip into a different interpreter/site-packages.

Once pyarrow is present and versions are aligned, this specific 255002 should go away. If it persists after you confirm pyarrow imports inside the pod, share the new import test output and the exact Snowflake call you’re making (e.g., fetch_pandas_all, write_pandas, etc.) and we’ll dig further.
c
@Marvin Is there a way to print the name of the Docker image my code is using in the code?
m
thought for 54.5 seconds
Yes. In Prefect 3.x you can read the resolved job variables for the current flow run — for Kubernetes that typically includes the image. Add this near the start of your flow:
Copy code
from prefect import flow, get_run_logger
from prefect.runtime import flow_run

@flow
def main_flow():
    logger = get_run_logger()
    jv = getattr(flow_run, "job_variables", None) or {}
    image = jv.get("image")
    <http://logger.info|logger.info>(f"Kubernetes job_variables: {jv}")
    <http://logger.info|logger.info>(f"Container image: {image}")
    # ...rest of your flow...
Notes:
- flow_run.job_variables is populated when the flow runs from a deployment. If you run locally (e.g., .serve()), it may be empty or missing.

If you want to query Kubernetes directly (e.g., to double-check what actually ran), you can read the current Pod spec using the in-cluster Kubernetes client. This requires your service account to have permission to read its own Pod:
Copy code
import os
from kubernetes import client, config
from prefect import get_run_logger

def log_pod_images():
    logger = get_run_logger()
    config.load_incluster_config()
    pod_name = os.environ.get("HOSTNAME")  # default to current pod hostname
    with open("/var/run/secrets/kubernetes.io/serviceaccount/namespace") as f:
        namespace = f.read().strip()
    v1 = client.CoreV1Api()
    pod = v1.read_namespaced_pod(name=pod_name, namespace=namespace)
    images = [c.image for c in pod.spec.containers]
    <http://logger.info|logger.info>(f"K8s pod containers: {images}")
Docs:
- Kubernetes work pools and job variables: Run on Kubernetes
- Customize job variables: Customize job variables
c
@Marvin Running that code returned the following print lines
Copy code
Kubernetes job_variables: {}
Container image: None
Does this mean my flow is not grabbing the correct docker image? How can I update my code to ensure it grabs the correct docker image?
m
thought for 64.5 seconds
Great question. The empty job_variables means your deployment did not set any per-deployment overrides. That does not mean Kubernetes used the wrong image — it likely means the image is coming from your Kubernetes work pool’s default job template instead of the deployment’s job_variables.

Key points:
- flow_run.job_variables only shows what the deployment overrides. If your image is defined on the work pool’s base job template, job_variables will be empty in code.
- To confirm the actual image at runtime, query the current Pod via the Kubernetes API from inside the pod (this reads the real spec).

Add this to your flow to log the real container image(s):
Copy code
# pip install kubernetes in your image if you don't have it
# RUN python -m pip install kubernetes

from prefect import flow, get_run_logger
import os

def log_pod_images():
    from kubernetes import client, config
    logger = get_run_logger()
    try:
        config.load_incluster_config()
        with open("/var/run/secrets/kubernetes.io/serviceaccount/namespace") as f:
            namespace = f.read().strip()
        pod_name = os.environ.get("HOSTNAME")
        v1 = client.CoreV1Api()
        pod = v1.read_namespaced_pod(name=pod_name, namespace=namespace)
        images = [c.image for c in pod.spec.containers]
        <http://logger.info|logger.info>(f"Pod: {namespace}/{pod_name} images: {images}")
    except Exception as e:
        logger.exception(f"Failed to query K8s pod image: {e}")

@flow
def main_flow():
    logger = get_run_logger()
    # This prints deployment overrides (empty if image comes from pool defaults)
    from prefect.runtime import flow_run
    <http://logger.info|logger.info>(f"Kubernetes job_variables: {getattr(flow_run, 'job_variables', {})}")
    # This prints the actual pod container image(s)
    log_pod_images()
    # ...rest of your flow...
How to “ensure it grabs the correct docker image”
You have two reliable options.

1) Set the image on the deployment
- In your prefect.yaml under this deployment, set:
Copy code
work_pool:
  name: venuenext-kubernetes
  job_variables:
    image: "your-registry/your-image:unique-tag"
    image_pull_policy: "Always"
- Use a unique tag per build to avoid stale caches.

2) Set the default image on the work pool
- Prefect UI → Work Pools → venuenext-kubernetes → Edit default job template/variables → set image and imagePullPolicy.
- Or inspect/update via CLI. Inspect current defaults:
Copy code
prefect work-pool inspect "venuenext-kubernetes"
- Update by supplying a base job template JSON if needed (a CLI sketch follows below).

Docs:
- Customize job variables
- Run on Kubernetes

Once you set the image in either place and redeploy, the Kubernetes API snippet above will confirm the exact image used by the pod.
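If you go the work-pool route via the CLI, a sketch of that flow (the template filename is a placeholder; edit the image default in the JSON before updating):
Copy code
prefect work-pool get-default-base-job-template --type kubernetes > k8s-job-template.json
# edit k8s-job-template.json: set the default "image" (and image_pull_policy) to your image
prefect work-pool update "venuenext-kubernetes" --base-job-template k8s-job-template.json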
c
@Marvin How do I deploy a flow without pushing my entire directory into an Azure blob storage?
m
thought for 295.5 seconds
You have a few good options in Prefect 3.x to avoid uploading your whole project to Azure Blob:

Option 1: Bake your code into the Docker image (no storage at all)
- Copy only the needed files into your image and run from there. Remove your Azure push/pull steps.
- Example prefect.yaml snippet:
Copy code
deployments:
- name: VenueNextColumns
  entrypoint: venueNextParseColumns_snowflake.py:main_flow  # path inside the container
  work_pool:
    name: venuenext-kubernetes
    job_variables:
      image: your-registry/your-image:unique-tag
      image_pull_policy: Always
  push: []
  pull: []
- Ensure your Dockerfile copies just the flow code you need, and the path matches the entrypoint.
- Docs: Run on Kubernetes

Option 2: Pull code from Git at runtime (no Azure storage)
- Replace your Azure push/pull steps with a Git clone pull step so workers fetch the code directly.
Copy code
deployments:
- name: VenueNextColumns
  entrypoint: venueNextParseColumns_snowflake.py:main_flow
  work_pool:
    name: venuenext-kubernetes
    job_variables:
      image: your-registry/your-image:unique-tag
      image_pull_policy: Always
  push: []
  pull:
  - prefect.deployments.steps.git_clone:
      repository: https://github.com/your-org/your-repo.git
      branch: main
      # credentials: '{{ prefect.blocks.secret.github-token }}'  # if private
- Docs: Store flow code, prefect.yaml

Option 3: Keep Azure, but ignore almost everything
- Your current step supports an ignore file. Create a .prefectignore that excludes the whole directory except the files you need.
Copy code
# .prefectignore
*
!venueNextParseColumns_snowflake.py
!your_package/**
!.prefectignore
- Or point the step at a custom ignore file:
Copy code
push:
- prefect_azure.deployments.steps.push_to_azure_blob_storage:
    container: prefectscripts
    folder: ''
    credentials: '{{ prefect.blocks.azure-blob-storage-credentials.prefectscriptskubernetes }}'
    ignore_file: .prefectignore
- Step signature reference: it accepts ignore_file and defaults to .prefectignore.

Which should you pick?
- If you already run on Kubernetes with a custom image, Option 1 is simplest and fastest at runtime.
- If you keep your code in Git and want light images, Option 2 works great.
- If you must use Azure storage but want smaller uploads, Option 3.

If you tell me which route you prefer, I can draft the exact prefect.yaml and Dockerfile changes for your project.

Links for reference:
- Store flow code
- prefect.yaml
- Run on Kubernetes
c
@Marvin I deployed my updated code and received the following error:
Copy code
AttributeError: module 'botocore' has no attribute 'config'
m
thought for 66.9 seconds
That error isn’t Prefect-specific — it means Python is not finding the expected submodule at botocore.config. Most often, it’s one of these:

Likely causes

1) Wrong import pattern
- This will fail:
Copy code
import botocore
cfg = botocore.config.Config(...)
- Use either of these instead:
Copy code
from botocore.config import Config
# or
import botocore.config as bc_config
cfg = bc_config.Config(...)
2) Local module shadowing the real botocore
- If your project has a file/folder named botocore.py or botocore/, it will shadow the installed AWS library.
- Quick check (run inside the failing pod or log in your flow):
Copy code
import botocore, sys
print("botocore imported from:", getattr(botocore, "__file__", "builtin"))
print("sys.path:", sys.path)
If the path points into your project instead of site-packages, rename/remove that file/folder.

3) Version mismatch between boto3, botocore, aiobotocore/s3fs
- s3fs pulls aiobotocore which pins specific botocore versions. If versions drift, odd import errors can show up.
- Check versions:
Copy code
import boto3, botocore, pkgutil
print("boto3:", boto3.__version__)
print("botocore:", botocore.__version__)
print("aiobotocore installed:", any(m.name=='aiobotocore' for m in pkgutil.iter_modules()))
- If mismatched, pin compatible versions and rebuild with a new image tag. For example:
Copy code
python -m pip install --no-cache-dir \
  "boto3>=1.34,<1.35" \
  "botocore>=1.34,<1.35" \
  "s3fs>=2024.5.0"
4) Submodule not imported before attribute access
- Accessing botocore.config only works after the submodule is imported. The from-import forms above ensure that.

Quick next steps
- Share the failing line(s) of code that reference botocore so I can suggest the exact fix.
- Run this small diagnostic in the pod and paste the output:
Copy code
python - <<'PY'
import sys, importlib, pkgutil
def info(name):
    try:
        m = importlib.import_module(name)
        print(f"{name} OK version={getattr(m,'__version__','?')} path={getattr(m,'__file__','builtin')}")
    except Exception as e:
        print(f"{name} FAIL {type(e).__name__}: {e}")

info("boto3")
info("botocore")
try:
    from botocore.config import Config
    print("from botocore.config import Config OK")
except Exception as e:
    print("botocore.config import FAIL:", e)

print("aiobotocore present:", any(m.name=='aiobotocore' for m in pkgutil.iter_modules()))
PY
If this is coming from s3fs usage, we can align versions or switch to the recommended import pattern and it should clear up quickly.