Hi everyone! After upgraded Prefect to 3.2.7, I tr...
# prefect-ui
g
Hi everyone! After upgrading Prefect to 3.2.7, I tried to retry a failed flow run by clicking the retry button. The flow run is now stuck in the AwaitingRetry state. Any idea what the cause could be? The deployment is a simple one: it executes the flow run locally on the Prefect instance. If I run the deployment directly, it runs fine.
j
Hi @Giacomo Chiarella - can you give a bit more info about your setup? 1. Are you on Cloud or OSS? 2. What type of work pool are you using? 3. Does it happen with a new work pool? 4. Do you have any sort of deployment or work pool concurrency set? 5. How long has the run been stuck in AwaitingRetry? And anything else you think might help!
g
Hi @Jenny, sure. I’ve just deployed Prefect by installing it via `python -m pip install prefect==3.2.7`. Once that was done, I deployed this flow:
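```python
from prefect import flow, task

# Placeholder: the original message did not show how DAG_NAME was defined.
DAG_NAME = "simple_flow"


@task(name="simple_task", retries=1, retry_delay_seconds=30, retry_jitter_factor=1)
def simple_task():
    # Always fails, to exercise the retry path.
    raise Exception("Test")


@flow(name=DAG_NAME, retries=1, retry_delay_seconds=30, description="Test")
def flow_entrypoint():
    a = simple_task.submit()
    a.wait()
    # Fail the flow run unless the task finished in the Completed state.
    if a.state.name.lower() != "completed":
        raise Exception("Flow error")
```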
If I just start a brand new flow run, everything works as expected and the flow run fails. If, once it has failed, I use the UI button to retry, the flow run gets stuck in AwaitingRetry. The flow run concurrency limit is set to 1 and there are no other flow runs at all in the whole Prefect instance. It stays stuck indefinitely; I had to cancel it after it had been in AwaitingRetry for 2 hours. The work pool is of type Process. I think something is off with the retry button submission, because everything else works as expected.
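The state check in the flow above compares the state name as a string. Below is a minimal sketch of an alternative. It assumes that Prefect 3 futures expose `wait()`, `state`, and `task_run_id` as used in this thread, and that the state object offers `is_completed()`. `raise_if_not_completed` is a hypothetical helper name, not part of Prefect.

```python
def raise_if_not_completed(futures):
    """Wait for each future, then raise if any final state is not Completed.

    Duck-typed sketch: each future is assumed to provide wait(), state and
    task_run_id, and each state is_completed() and name.
    """
    not_completed = []
    for future in futures:
        future.wait()  # block until the task run reaches a final state
        state = future.state
        if not state.is_completed():
            not_completed.append(f"{future.task_run_id}: {state.name}")
    if not_completed:
        raise RuntimeError("Tasks not completed: " + ", ".join(not_completed))
```

Inside the flow this would be called as `raise_if_not_completed([a])` in place of the string comparison. It does not address the stuck AwaitingRetry run.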
j
Thanks! And are you using Prefect Cloud or Prefect Server?
g
I’ve installed it on an EC2 instance in AWS.
j
Hmmm... if I try to reproduce, my runs are picked up pretty snappily. If I set a concurrency limit I do see it all working, but the second run has to wait a while. Can you try increasing your concurrency limit and see if your runs get picked up? I suspect something is blocking it. Sorry I don't have an easy fix for you!
g
@Jenny no problem! It looks like it is related only to manual retries. I’m not too concerned, although it would be good to know what the problem is. I will try setting concurrency to 2 and see what happens. Meanwhile, I have another question. In Prefect 3.2.7 I’m randomly having issues calling .wait() on a task future. For example, in the code I showed you above, at the end of the flow I do
```python
# Fragment from the end of the flow body; `a` is the future returned by simple_task.submit().
a.wait()
logger.info(a.task_run_id)
task_run_name = asyncio.run(get_task_run_name_by_id(a.task_run_id))
```
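where the function get_task_run_name_by_id is simply this:
```python
from uuid import UUID

from prefect.client.orchestration import get_client


async def get_task_run_name_by_id(task_run_id: UUID):
    # Look up the task run through the Prefect API and return its name.
    async with get_client() as client:
        task_run = await client.read_task_run(task_run_id)
        return task_run.name
```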
The reason I’m doing that is that I check the state, `a.state.name`, in order to make the flow run fail if the task is not in the Completed state. The task run name is needed to log each task run with its state. Sometimes, randomly, I get
```text
Encountered exception during execution: ObjectNotFound(None)
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/prefect/client/orchestration/__init__.py", line 843, in read_task_run
    response = await self._client.get(f"/task_runs/{task_run_id}")
  File "/usr/local/lib/python3.10/site-packages/httpx/_client.py", line 1768, in get
    return await self.request(
  File "/usr/local/lib/python3.10/site-packages/httpx/_client.py", line 1540, in request
    return await self.send(request, auth=auth, follow_redirects=follow_redirects)
  File "/usr/local/lib/python3.10/site-packages/prefect/client/base.py", line 354, in send
    response.raise_for_status()
  File "/usr/local/lib/python3.10/site-packages/prefect/client/base.py", line 162, in raise_for_status
    raise PrefectHTTPStatusError.from_httpx_error(exc) from exc.__cause__
prefect.exceptions.PrefectHTTPStatusError: Client error '404 Not Found' for url 'http://prefect_orion:4200/api/task_runs/8317922a-ebd4-4eaa-85ed-7c374c07e69c'

Response: {'detail': 'Task not found'}
```
when I call `task_run = await client.read_task_run(task_run_id)`. The strange thing is that I get this error at the very beginning of the flow, when the task has not started yet. Basically, the wait method is not waiting. Is it possible that it is called too early? Why does this happen? Is there a better way to wait for a task to finish? Can I mitigate this behaviour with something better than a time.sleep() before the wait? The logger call after the wait (which does not wait) prints the task run id. It looks like the task run has not been created in the database yet, and the wait does not hold until execution finishes.
@Jenny regarding the retry issue, I’ve noticed that at the moment it happens only with the sample flow I showed you. Maybe something is off with the deployment; I will try to redeploy it. Meanwhile, I would really appreciate any ideas regarding the other issue. For now I have mitigated it by calling time.sleep() before any future.wait(), but I wonder if there is a Prefect way to do that.
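As an alternative to a fixed time.sleep(), the lookup itself could be retried with a short backoff. This is only a sketch, and it rests on an assumption this thread does not confirm: that the 404 occurs because the task run record is not yet visible through the API. `retry_async` and `get_task_run_name_with_retry` are hypothetical helper names. The Prefect calls used are `get_client`, `client.read_task_run`, and the `ObjectNotFound` exception seen in the traceback above.

```python
import asyncio
from uuid import UUID


async def retry_async(fetch, retry_on, attempts=5, base_delay=0.5):
    """Await fetch() until it succeeds, retrying on the given exception types.

    The delay doubles after each failed attempt; the last failure is re-raised.
    """
    for attempt in range(attempts):
        try:
            return await fetch()
        except retry_on:
            if attempt == attempts - 1:
                raise
            await asyncio.sleep(base_delay * 2 ** attempt)


async def get_task_run_name_with_retry(task_run_id: UUID) -> str:
    # Imported here so that retry_async can be used without Prefect installed.
    from prefect.client.orchestration import get_client
    from prefect.exceptions import ObjectNotFound

    async with get_client() as client:
        task_run = await retry_async(
            lambda: client.read_task_run(task_run_id),
            retry_on=(ObjectNotFound,),
        )
        return task_run.name
```

With the defaults, the lookup is attempted up to five times, waiting 0.5, 1, 2, and 4 seconds between attempts, so a record that shows up late is picked up without a fixed sleep on every call.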
j
Thanks for following up! I don't know the answer to your question off the top of my head, I'm afraid. It's perhaps best asked as a new question in #CL09KU1K7
g
Thank you anyway!