Tedi Gjoni
04/14/2024, 6:03 PM
`retries` and `retry_delay_seconds` params in the task declaration, does the timeout change? I'm getting a Timeout error, and I increased the above parameters, which caused the task to fail sooner. Is there any relationship between `timeout_seconds` and the other variables? As I increase `retries`, do I have to increase `timeout_seconds` as well?
Tedi Gjoni
04/14/2024, 6:05 PM
Nate
04/14/2024, 6:10 PM
`timeout_seconds` should only enforce timeouts for the execution of a single task / flow run, which should be entirely independent of `retries` and `retry_delay_seconds`
Nate
04/14/2024, 6:11 PM
Tedi Gjoni
04/14/2024, 6:12 PM
`retry_delay_seconds` > `timeout_seconds` the task will crash/timeout?
Tedi Gjoni
04/14/2024, 6:15 PM
`timeout_seconds` it raises Timeout.
Nate
04/14/2024, 6:17 PM
> `retry_delay_seconds` > `timeout_seconds` the task will crash/timeout?
`retry_delay_seconds` is how long we will wait between retries, which should have nothing to do with how long a task / flow will be allowed to run before failing with a `TimeoutError` according to `timeout_seconds`
for example
from time import sleep

from prefect import flow


@flow(retries=1, retry_delay_seconds=10, timeout_seconds=3)
def sleepy():
    sleep(1e3)


if __name__ == "__main__":
    sleepy()
this will time out after 3 seconds, enter `AwaitingRetry`, wait 10 seconds according to `retry_delay_seconds`, retry because `retries=1`, and then fail again after 3 seconds and finally enter `Failed`
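As a rough check on the sequence described above, the worst-case wall-clock time can be added up by hand: each attempt may run for up to `timeout_seconds`, and each retry is preceded by a `retry_delay_seconds` wait. The helper below is a hypothetical illustration in plain Python (the name `worst_case_seconds` is not part of Prefect), it assumes a single fixed delay, and it ignores any scheduling overhead.

```python
def worst_case_seconds(
    retries: int, retry_delay_seconds: float, timeout_seconds: float
) -> float:
    """Upper bound on wall-clock time if every attempt hits its timeout.

    Hypothetical helper for illustration; not a Prefect API.
    """
    attempts = retries + 1  # the first run plus each retry
    running = attempts * timeout_seconds  # time spent inside attempts
    waiting = retries * retry_delay_seconds  # delays between attempts
    return running + waiting


if __name__ == "__main__":
    # Values from the sleepy() example: 3 s run, 10 s wait, 3 s run.
    print(worst_case_seconds(retries=1, retry_delay_seconds=10, timeout_seconds=3))
```

Raising `retries` or `retry_delay_seconds` lengthens this total, but the per-attempt limit stays at `timeout_seconds`.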
Tedi Gjoni
04/14/2024, 6:29 PM
`timeout_seconds`
Tedi Gjoni
04/14/2024, 6:30 PM
`timeout_seconds` what's the default number?
Nate
04/14/2024, 6:34 PM
Tedi Gjoni
04/14/2024, 6:36 PM
Nate
04/14/2024, 6:38 PM
> is there any relationship between the env variables and the timeout_seconds?
no - those are unrelated server-side settings. if you're not setting a `timeout_seconds` and still getting a `TimeoutError`, i would suspect the code you have inside your task/flow is raising that, not prefect.
without seeing the code it's hard to give a useful guess
Tedi Gjoni
04/14/2024, 6:40 PM
Tedi Gjoni
04/14/2024, 6:40 PM
`timeout_seconds` as I'm currently running a job
Tedi Gjoni
04/14/2024, 6:44 PM
Tedi Gjoni
04/14/2024, 6:45 PM
Nate
04/14/2024, 6:47 PM
pip freeze | grep -E 'prefect|ray'
Tedi Gjoni
04/14/2024, 6:47 PM
prefect==2.14.13
prefect-email==0.2.2
prefect-ray==0.2.5
ray==2.8.1
Tedi Gjoni
04/14/2024, 6:49 PM
`n_jobs` to test if it solves the issue. Some runs it works, some runs it doesn't. A weird behavior I have noticed is that when this job runs (which is big) and I open the Prefect UI, the job crashes
Nate
04/14/2024, 6:50 PM
Tedi Gjoni
04/14/2024, 6:53 PM
Tedi Gjoni
04/14/2024, 6:55 PM
Nate
04/14/2024, 6:55 PM
`get_run_logger` will get sent to the API and stored in the db
Tedi Gjoni
04/14/2024, 6:57 PM
Tedi Gjoni
04/14/2024, 6:58 PM
Nate
04/14/2024, 7:03 PM
`RayTaskRunner`, `timeout_seconds` is not being correctly enforced, so I've opened an issue about that
Tedi Gjoni
04/14/2024, 7:05 PM
Nate
04/14/2024, 7:06 PM
Tedi Gjoni
04/14/2024, 7:06 PM
Tedi Gjoni
04/14/2024, 7:06 PM
Nate
04/14/2024, 7:06 PM
Tedi Gjoni
04/14/2024, 7:48 PM
Tedi Gjoni
04/14/2024, 7:49 PM
Tedi Gjoni
04/14/2024, 9:16 PM
Nate
04/14/2024, 9:18 PM
Tedi Gjoni
04/14/2024, 9:18 PM
Traceback (most recent call last):
File "/home/tgjoni/alpharoc/occam/bin/flows/signal-process.py", line 173, in signal_process_flow
signal_temporary.load_to_dynamodb(
File "/home/tgjoni/.conda/envs/armono/lib/python3.10/site-packages/prefect/flows.py", line 1120, in __call__
return enter_flow_run_engine_from_flow_call(
File "/home/tgjoni/.conda/envs/armono/lib/python3.10/site-packages/prefect/engine.py", line 291, in enter_flow_run_engine_from_flow_call
retval = from_sync.wait_for_call_in_loop_thread(
File "/home/tgjoni/.conda/envs/armono/lib/python3.10/site-packages/prefect/_internal/concurrency/api.py", line 243, in wait_for_call_in_loop_thread
return call.result()
File "/home/tgjoni/.conda/envs/armono/lib/python3.10/site-packages/prefect/_internal/concurrency/calls.py", line 284, in result
return self.future.result(timeout=timeout)
File "/home/tgjoni/.conda/envs/armono/lib/python3.10/site-packages/prefect/_internal/concurrency/calls.py", line 168, in result
return self.__get_result()
File "/home/tgjoni/.conda/envs/armono/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/home/tgjoni/.conda/envs/armono/lib/python3.10/site-packages/prefect/_internal/concurrency/calls.py", line 355, in _run_async
result = await coro
File "/home/tgjoni/.conda/envs/armono/lib/python3.10/site-packages/prefect/client/utilities.py", line 51, in with_injected_client
return await fn(*args, **kwargs)
File "/home/tgjoni/.conda/envs/armono/lib/python3.10/site-packages/prefect/engine.py", line 733, in create_and_begin_subflow_run
return await terminal_state.result(fetch=True)
File "/home/tgjoni/.conda/envs/armono/lib/python3.10/site-packages/prefect/states.py", line 91, in _get_state_result
raise await get_state_exception(state)
TimeoutError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/tgjoni/.conda/envs/armono/lib/python3.10/site-packages/prefect/engine.py", line 849, in orchestrate_flow_run
result = await flow_call.aresult()
File "/home/tgjoni/.conda/envs/armono/lib/python3.10/site-packages/prefect/_internal/concurrency/calls.py", line 293, in aresult
return await asyncio.wrap_future(self.future)
File "/home/tgjoni/.conda/envs/armono/lib/python3.10/site-packages/prefect/_internal/concurrency/calls.py", line 318, in _run_sync
result = self.fn(*self.args, **self.kwargs)
File "/home/tgjoni/alpharoc/occam/bin/flows/signal-process.py", line 211, in signal_process_flow
flow_state = Failed(message=error)
File "/home/tgjoni/.conda/envs/armono/lib/python3.10/site-packages/prefect/server/schemas/states.py", line 331, in Failed
return cls(type=StateType.FAILED, **kwargs)
File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for State
message
str type expected (type=type_error.str)
21:07:22.658 | ERROR | Flow run 'notorious-kestrel' - Finished in state Failed('Flow run encountered an exception. ValidationError: 1 validation error for State\nmessage\n str type expected (type=type_error.str)')
Traceback (most recent call last):
File "/home/tgjoni/alpharoc/occam/bin/flows/signal-process.py", line 173, in signal_process_flow
signal_temporary.load_to_dynamodb(
File "/home/tgjoni/.conda/envs/armono/lib/python3.10/site-packages/prefect/flows.py", line 1120, in __call__
return enter_flow_run_engine_from_flow_call(
File "/home/tgjoni/.conda/envs/armono/lib/python3.10/site-packages/prefect/engine.py", line 291, in enter_flow_run_engine_from_flow_call
retval = from_sync.wait_for_call_in_loop_thread(
File "/home/tgjoni/.conda/envs/armono/lib/python3.10/site-packages/prefect/_internal/concurrency/api.py", line 243, in wait_for_call_in_loop_thread
return call.result()
File "/home/tgjoni/.conda/envs/armono/lib/python3.10/site-packages/prefect/_internal/concurrency/calls.py", line 284, in result
return self.future.result(timeout=timeout)
File "/home/tgjoni/.conda/envs/armono/lib/python3.10/site-packages/prefect/_internal/concurrency/calls.py", line 168, in result
return self.__get_result()
File "/home/tgjoni/.conda/envs/armono/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/home/tgjoni/.conda/envs/armono/lib/python3.10/site-packages/prefect/_internal/concurrency/calls.py", line 355, in _run_async
result = await coro
File "/home/tgjoni/.conda/envs/armono/lib/python3.10/site-packages/prefect/client/utilities.py", line 51, in with_injected_client
return await fn(*args, **kwargs)
File "/home/tgjoni/.conda/envs/armono/lib/python3.10/site-packages/prefect/engine.py", line 733, in create_and_begin_subflow_run
return await terminal_state.result(fetch=True)
File "/home/tgjoni/.conda/envs/armono/lib/python3.10/site-packages/prefect/states.py", line 91, in _get_state_result
raise await get_state_exception(state)
TimeoutError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/tgjoni/alpharoc/occam/bin/flows/signal-process.py", line 221, in <module>
signal_process_flow(
File "/home/tgjoni/.conda/envs/armono/lib/python3.10/site-packages/prefect/flows.py", line 1120, in __call__
return enter_flow_run_engine_from_flow_call(
File "/home/tgjoni/.conda/envs/armono/lib/python3.10/site-packages/prefect/engine.py", line 291, in enter_flow_run_engine_from_flow_call
retval = from_sync.wait_for_call_in_loop_thread(
File "/home/tgjoni/.conda/envs/armono/lib/python3.10/site-packages/prefect/_internal/concurrency/api.py", line 243, in wait_for_call_in_loop_thread
return call.result()
File "/home/tgjoni/.conda/envs/armono/lib/python3.10/site-packages/prefect/_internal/concurrency/calls.py", line 284, in result
return self.future.result(timeout=timeout)
File "/home/tgjoni/.conda/envs/armono/lib/python3.10/site-packages/prefect/_internal/concurrency/calls.py", line 168, in result
return self.__get_result()
File "/home/tgjoni/.conda/envs/armono/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/home/tgjoni/.conda/envs/armono/lib/python3.10/site-packages/prefect/_internal/concurrency/calls.py", line 355, in _run_async
result = await coro
File "/home/tgjoni/.conda/envs/armono/lib/python3.10/site-packages/prefect/client/utilities.py", line 51, in with_injected_client
return await fn(*args, **kwargs)
File "/home/tgjoni/.conda/envs/armono/lib/python3.10/site-packages/prefect/engine.py", line 394, in create_then_begin_flow_run
return await state.result(fetch=True)
File "/home/tgjoni/.conda/envs/armono/lib/python3.10/site-packages/prefect/states.py", line 91, in _get_state_result
raise await get_state_exception(state)
File "/home/tgjoni/.conda/envs/armono/lib/python3.10/site-packages/prefect/engine.py", line 849, in orchestrate_flow_run
result = await flow_call.aresult()
File "/home/tgjoni/.conda/envs/armono/lib/python3.10/site-packages/prefect/_internal/concurrency/calls.py", line 293, in aresult
return await asyncio.wrap_future(self.future)
File "/home/tgjoni/.conda/envs/armono/lib/python3.10/site-packages/prefect/_internal/concurrency/calls.py", line 318, in _run_sync
result = self.fn(*self.args, **self.kwargs)
File "/home/tgjoni/alpharoc/occam/bin/flows/signal-process.py", line 211, in signal_process_flow
flow_state = Failed(message=error)
File "/home/tgjoni/.conda/envs/armono/lib/python3.10/site-packages/prefect/server/schemas/states.py", line 331, in Failed
return cls(type=StateType.FAILED, **kwargs)
File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for State
message
str type expected (type=type_error.str)
Nate
04/14/2024, 9:18 PM
Tedi Gjoni
04/14/2024, 9:19 PM
Nate
04/14/2024, 9:21 PM
File "/home/tgjoni/.conda/envs/armono/lib/python3.10/site-packages/prefect/engine.py", line 733, in create_and_begin_subflow_run
return await terminal_state.result(fetch=True)
File "/home/tgjoni/.conda/envs/armono/lib/python3.10/site-packages/prefect/states.py", line 91, in _get_state_result
raise await get_state_exception(state)
TimeoutError
but later on, this looks like some client / server mismatch potentially
File "/home/tgjoni/.conda/envs/armono/lib/python3.10/site-packages/prefect/server/schemas/states.py", line 331, in Failed
return cls(type=StateType.FAILED, **kwargs)
File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for State
message
str type expected (type=type_error.str)
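The `ValidationError` in the excerpt above says the `message` field expected a `str`, and the traceback shows it being raised from `Failed(message=error)`. One possible reading, which is an assumption because the flow code is not shown in the thread, is that `error` holds the caught exception object rather than text. Under that assumption, a minimal sketch of turning an exception into a string first could look like this (`exception_to_message` is a hypothetical helper, not a Prefect function):

```python
def exception_to_message(exc: BaseException) -> str:
    """Build a readable string from an exception.

    Hypothetical helper for illustration. The type name is included because
    str() of an exception raised without arguments is an empty string.
    """
    name = type(exc).__name__
    detail = str(exc)
    return f"{name}: {detail}" if detail else name


if __name__ == "__main__":
    try:
        raise TimeoutError()
    except TimeoutError as error:
        # Assumption: a string like this would be passed as message=...
        print(exception_to_message(error))
```

This only addresses the type of the `message` argument; it does not explain the underlying `TimeoutError`.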
Tedi Gjoni
04/14/2024, 9:22 PM
Nate
04/14/2024, 9:23 PM
Tedi Gjoni
04/14/2024, 9:23 PM
Nate
04/14/2024, 9:24 PM
Tedi Gjoni
04/14/2024, 9:25 PM
Tedi Gjoni
04/14/2024, 9:27 PM
Tedi Gjoni
04/14/2024, 9:27 PM
Nate
04/14/2024, 9:30 PM
Tedi Gjoni
04/15/2024, 8:45 PM
20:30:08.720 | ERROR | Flow run 'romantic-eel' - Crash detected! Execution was cancelled by the runtime environment.
Traceback (most recent call last):
File "/home/ubuntu/alpharoc/occam/bin/flows/signal-process.py", line 173, in signal_process_flow
signal_temporary.load_to_dynamodb(
File "/home/ubuntu/miniconda3/envs/armono/lib/python3.10/site-packages/prefect/flows.py", line 1228, in __call__
return enter_flow_run_engine_from_flow_call(
File "/home/ubuntu/miniconda3/envs/armono/lib/python3.10/site-packages/prefect/engine.py", line 291, in enter_flow_run_engine_from_flow_call
retval = from_sync.wait_for_call_in_loop_thread(
File "/home/ubuntu/miniconda3/envs/armono/lib/python3.10/site-packages/prefect/_internal/concurrency/api.py", line 217, in wait_for_call_in_loop_thread
waiter.wait()
File "/home/ubuntu/miniconda3/envs/armono/lib/python3.10/site-packages/prefect/_internal/concurrency/waiters.py", line 173, in wait
self._handle_waiting_callbacks()
File "/home/ubuntu/miniconda3/envs/armono/lib/python3.10/site-packages/prefect/_internal/concurrency/waiters.py", line 147, in _handle_waiting_callbacks
callback.run()
File "/home/ubuntu/miniconda3/envs/armono/lib/python3.10/site-packages/prefect/_internal/concurrency/calls.py", line 282, in run
coro = self.context.run(self._run_sync)
File "/home/ubuntu/miniconda3/envs/armono/lib/python3.10/site-packages/prefect/_internal/concurrency/calls.py", line 352, in _run_sync
result = self.fn(*self.args, **self.kwargs)
File "/home/ubuntu/alpharoc/occam/src/occam/schedule/flow/signal_temporary.py", line 129, in load_to_dynamodb
future.wait()
File "/home/ubuntu/miniconda3/envs/armono/lib/python3.10/site-packages/prefect/futures.py", line 158, in wait
return from_sync.call_soon_in_loop_thread(wait).result() # type: ignore
File "/home/ubuntu/miniconda3/envs/armono/lib/python3.10/site-packages/prefect/_internal/concurrency/calls.py", line 318, in result
return self.future.result(timeout=timeout)
File "/home/ubuntu/miniconda3/envs/armono/lib/python3.10/site-packages/prefect/_internal/concurrency/calls.py", line 181, in result
self._condition.wait(timeout)
File "/home/ubuntu/miniconda3/envs/armono/lib/python3.10/threading.py", line 320, in wait
waiter.acquire()
prefect._internal.concurrency.cancellation.CancelledError
Tedi Gjoni
04/15/2024, 8:47 PM
Crash detected! Execution was cancelled by the runtime environment.
Tedi Gjoni
04/15/2024, 8:47 PM
Crash detected! Request to <http://127.0.0.1:4200/api/task_runs/> failed: PoolTimeout:
Tedi Gjoni
04/15/2024, 9:13 PM
Tedi Gjoni
04/15/2024, 9:13 PM