# ask-community
p
Hello, I am running a flow in Kubernetes (Prefect 2.10.10) and for some reason, when the flow run fails, it gets automatically retried, creating multiple jobs before the flow run is finally marked as Failed. What could be causing this behaviour? I did not set any retry logic. In 2.9, I get two k8s jobs: the first one fails, and then a second run is executed, but it finishes right away with the message
Engine execution of flow run '2c1f694b-3ca7-45f4-8048-908838963b52' aborted by orchestrator: This run has already terminated.
Any ideas? Thanks!
Why are multiple jobs being created? Do you mean multiple runs of a single job?
p
I see different Kubernetes jobs for the same flow run. The job fails, and immediately afterwards another one starts for the same flow.
All job logs show something like this:
02:33:48.151 | ERROR   | Flow run 'hypersonic-pony' - Finished in state Failed('Flow run encountered an exception. requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: <http://sc-solrcloud-headless:8983/solr/c922afeb-a318-4b8f-a29d-46bd11a6552d/update?commit=true&wt=json>\n')
02:33:48.183 | ERROR   | prefect.engine - Engine execution of flow run '64bb600e-bb7a-4e43-bd95-d07ed4fb8a56' exited with unexpected exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/prefect/engine.py", line 2139, in <module>
    enter_flow_run_engine_from_subprocess(flow_run_id)
  File "/usr/local/lib/python3.8/site-packages/prefect/engine.py", line 200, in enter_flow_run_engine_from_subprocess
    return from_sync.wait_for_call_in_loop_thread(
  File "/usr/local/lib/python3.8/site-packages/prefect/client/schemas.py", line 107, in result
    return get_state_result(self, raise_on_failure=raise_on_failure, fetch=fetch)
  File "/usr/local/lib/python3.8/site-packages/prefect/states.py", line 76, in get_state_result
    return _get_state_result(state, raise_on_failure=raise_on_failure)
  File "/usr/local/lib/python3.8/site-packages/prefect/utilities/asyncutils.py", line 260, in coroutine_wrapper
    return call()
  File "/usr/local/lib/python3.8/site-packages/prefect/_internal/concurrency/calls.py", line 245, in __call__
    return self.result()
  File "/usr/local/lib/python3.8/site-packages/prefect/_internal/concurrency/calls.py", line 173, in result
    return self.future.result(timeout=timeout)
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 437, in result
    return self.__get_result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.8/site-packages/prefect/_internal/concurrency/calls.py", line 218, in _run_async
    result = await coro
  File "/usr/local/lib/python3.8/site-packages/prefect/states.py", line 91, in _get_state_result
    raise await get_state_exception(state)
  File "/usr/local/lib/python3.8/site-packages/prefect/engine.py", line 674, in orchestrate_flow_run
    result = await flow_call.aresult()
  File "/usr/local/lib/python3.8/site-packages/prefect/_internal/concurrency/calls.py", line 181, in aresult
    return await asyncio.wrap_future(self.future)
  File "/usr/local/lib/python3.8/site-packages/prefect/_internal/concurrency/calls.py", line 194, in _run_sync
    result = self.fn(*self.args, **self.kwargs)
I have noticed that flows are run 7 times,
and I see this in the job manifest. I am using the default Prefect job manifest:
spec:
  parallelism: 1
  completions: 1
  backoffLimit: 6
z
6 is the Kubernetes default for backoffLimit, so a failed job is retried up to 6 times on top of the initial attempt, which matches the 7 runs you’re seeing.
If you specify 0 manually, it will override the Kubernetes default.
You can also just upgrade your flow run containers to a newer version; then it won’t matter.
This is happening because you’re running a container with Prefect < 2.10.3 or so.
p
if you specify 0 manually
Where can I specify it? Yeah, my container is on 2.9. When I upgrade, what will the default value be?
z
The default value will not change if you upgrade, but we won’t exit with a non-zero code, so Kubernetes won’t retry it
You would need to specify 0 via a customization.
It’s truly just easier to upgrade, though.
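For example, something like this should work (a minimal, untested sketch assuming you’re using the KubernetesJob infrastructure block; the block name is illustrative):
from prefect.infrastructure import KubernetesJob

# JSON 6902 patch applied on top of Prefect's default base job manifest;
# backoffLimit: 0 tells Kubernetes not to restart the job after a failure.
no_retry_patch = [
    {"op": "add", "path": "/spec/backoffLimit", "value": 0},
]

k8s_job = KubernetesJob(customizations=no_retry_patch)

# Save it as a block so a deployment can reference it (name is illustrative).
k8s_job.save("k8s-job-no-retries", overwrite=True)
Then point your deployment at that block (e.g. with -ib kubernetes-job/k8s-job-no-retries on prefect deployment build), and the rendered job spec should include backoffLimit: 0.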
p
Great, thanks a lot for your help!