[Prefect 2.0.4 + Prefect Cloud] Hey, I'm strugglin...
# ask-community
a
[Prefect 2.0.4 + Prefect Cloud] Hey, I'm struggling with the deployment override API and kubernetes-job infrastructure. I've created a generic kubernetes-job infrastructure block with some sane defaults for image etc., but no env variables. If I build the deployment with this block and no overrides, everything is fine and the flow starts as expected.
prefect deployment build r_script_automation.py:r_script_automation --name retention_cohort_analysis_deployment -t k8s -sb gcs/gcs-prefect-stprage -ib kubernetes-job/generic-k8s-job -o cohort_deployment.yaml
However, as soon as I add infrastructure overrides, the flow stays in the "pending" state forever:
prefect deployment build r_script_automation.py:r_script_automation --name retention_cohort_analysis_deployment -t k8s -sb gcs/gcs-prefect-stprage -ib kubernetes-job/generic-k8s-job -o cohort_deployment.yaml --override image=europe-docker.pkg.dev/vol-at/rm-datateam-repository/r-script-automation:beta-23 --override env.GIT_PYTHON_REFRESH="quiet" --override env.GOOGLE_APPLICATION_CREDENTIALS="/google/.google-secret-key.json" --override env.CHROMIUM_FLAGS="--no-sandbox"
If I manually delete the infra_overrides block from the deployment YAML, the flow works again. So somehow these env.xyz overrides are the problem. Can you point me to my mistake? Edit: I tested overriding only the image, and that works. It really only fails when overriding the environment variables.
j
What happens if you try only this environment variable override?
--override env.PREFECT_LOGGING_LEVEL=DEBUG
a
It's the same result
If I manually remove this env.xyz-line from the yaml-file, the flow gets picked up.
j
Hmm. I am on Prefect 2.0.4 and can't reproduce. The flow runs get picked up on local and cloud with:
infra_overrides:
  env.PREFECT_LOGGING_LEVEL: DEBUG
a
Hmm, OK, thanks for trying. You also tried to reproduce with a kubernetes-job, right? If so, I'll try to figure it out myself by creating a minimal example and see where my mistake is hidden. Thanks again. If I find the mistake, I'll update here; maybe it's relevant for somebody else.
@Jeff Hale Maybe there is something where I could use your input again. I just found that my Kubernetes agents are indeed picking up the flow, but the agent crashes with the following exception. The "KeyError: 'env'" seems suspicious. Is this something that might hint at the problem? I just double-checked: my agent runs version 2.0.4.
12:30:17.711 | INFO    | prefect.agent - Submitting flow run 'bb691907-34d2-470e-ac38-85e8ae00c302'
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/prefect/cli/_utilities.py", line 41, in wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/prefect/utilities/asyncutils.py", line 193, in wrapper
    return run_async_in_new_loop(async_fn, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/prefect/utilities/asyncutils.py", line 140, in run_async_in_new_loop
    return anyio.run(partial(__fn, *args, **kwargs))
  File "/usr/local/lib/python3.9/site-packages/anyio/_core/_eventloop.py", line 70, in run
    return asynclib.run(func, *args, **backend_options)
  File "/usr/local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 292, in run
    return native_run(wrapper(), debug=debug)
  File "/usr/local/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 287, in wrapper
    return await func(*args)
  File "/usr/local/lib/python3.9/site-packages/prefect/cli/agent.py", line 104, in start
    await critical_service_loop(
  File "/usr/local/lib/python3.9/site-packages/prefect/agent.py", line 271, in __aexit__
    await self.shutdown(*exc_info)
  File "/usr/local/lib/python3.9/site-packages/prefect/agent.py", line 260, in shutdown
    await self.task_group.__aexit__(*exc_info)
  File "/usr/local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 662, in __aexit__
    raise exceptions[0]
  File "/usr/local/lib/python3.9/site-packages/prefect/agent.py", line 182, in submit_run
    infrastructure = await self.get_infrastructure(flow_run)
  File "/usr/local/lib/python3.9/site-packages/prefect/agent.py", line 163, in get_infrastructure
    data = data[field]
KeyError: 'env'
OK, I found another puzzle piece. If I run the exact same command with an infrastructure block (so "-ib kubernetes-job/my-existing-block"), I get the above exception. (Note that the infrastructure block I created never had env configured. Maybe this is the issue?) However, if I run the build command with "-i kubernetes-job" instead of -ib, the overrides work for me. Sorry for my long monologue; maybe this really is an issue somewhere and this helps to resolve it. If it's not an issue, I also found a working solution for me 😄 Thanks for your assistance.
🙌 1
Ah, indeed. If I initialize the env field of the kubernetes-job block in the UI with an empty JSON object ({}), it works! Then I can override the env variables. (Maybe it's possible to add this to the documentation: that one needs to have a default env object declared in order to override it. Or have the kubernetes-job block include an env object by default, but that's obviously not up to me.)
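For anyone hitting the same KeyError, here is a minimal sketch of what seems to be going on. It assumes the agent walks each dotted override key through the block's stored data, as the `data = data[field]` line in the traceback suggests. The helper name `apply_dotted_overrides` and the sample dicts are made up for illustration; this is not Prefect's actual code.

```python
import copy


def apply_dotted_overrides(block_data, overrides):
    """Apply {'a.b': value} style overrides to a nested dict (hypothetical helper)."""
    result = copy.deepcopy(block_data)
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        data = result
        for field in parents:
            # Raises KeyError when the parent key (e.g. 'env') is absent.
            data = data[field]
        data[leaf] = value
    return result


overrides = {"env.PREFECT_LOGGING_LEVEL": "DEBUG"}

# Block data without any env key: the traversal fails on 'env'.
try:
    apply_dotted_overrides({"image": "my-image"}, overrides)
    failed_with = None
except KeyError as exc:
    failed_with = exc.args[0]

# Block data with an empty env object: the override can be applied.
patched = apply_dotted_overrides({"image": "my-image", "env": {}}, overrides)

print(failed_with)
print(patched)
```

Under that assumption, an override like image=... works because the top-level key is set directly, while env.SOMETHING needs an existing env object to descend into, which would match the empty {} workaround.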