Ross Teach

09/15/2022, 1:44 PM
Hello all, I've noticed the following error intermittently using Prefect Cloud for the last few weeks. I'm running my agent on an EC2 instance. My deployments use docker containers. I updated recently from 2.1.0 to 2.3.2. The same errors have occurred with both versions. Any insight into why this may be happening?
prefect.exceptions.PrefectHTTPStatusError: Server error '500 Internal Server Error' for url 'https://api.prefect.cloud/api/accounts/043b2649-9d07-4c5e-8225-521ba2275e68/workspaces/689b139b-a725-4c2b-b167-86a705b8789d/task_runs/7d0c4bc0-a212-49b3-94e2-f1b9bdee765f/set_state'
Response: {'exception_message': 'Internal Server Error'}
For more information check: https://httpstatuses.com/500
👀 1
stacktrace from the agent logs
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/prefect/engine.py", line 595, in orchestrate_flow_run
    result = await run_sync(flow_call)
  File "/usr/local/lib/python3.10/site-packages/prefect/utilities/asyncutils.py", line 57, in run_sync_in_worker_thread
    return await anyio.to_thread.run_sync(call, cancellable=True)
  File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/opt/prefect/hub_prefect/flows/xandr/dbt_final/flow.py", line 38, in main
    is_ready = is_run_dbt_ready()
  File "/opt/prefect/hub_prefect/flows/xandr/dbt_final/flow.py", line 76, in is_run_dbt_ready
    is_all_ready = all(is_run_dbt_ready_for_day(day=day) for day in days)
  File "/opt/prefect/hub_prefect/flows/xandr/dbt_final/flow.py", line 76, in <genexpr>
    is_all_ready = all(is_run_dbt_ready_for_day(day=day) for day in days)
  File "/opt/prefect/hub_prefect/flows/xandr/dbt_final/flow.py", line 84, in is_run_dbt_ready_for_day
    is_all_valid = all(report_values_valid_for_day(
  File "/opt/prefect/hub_prefect/flows/xandr/dbt_final/flow.py", line 84, in <genexpr>
    is_all_valid = all(report_values_valid_for_day(
  File "/opt/prefect/hub_prefect/flows/xandr/dbt_final/flow.py", line 104, in report_values_valid_for_day
    bigquery_rollup_column_value = get_bigquery_column_value(
  File "src/dependency_injector/_cwiring.pyx", line 28, in dependency_injector._cwiring._get_sync_patched._patched
  File "/usr/local/lib/python3.10/site-packages/prefect/tasks.py", line 295, in __call__
    return enter_task_run_engine(
  File "/usr/local/lib/python3.10/site-packages/prefect/engine.py", line 735, in enter_task_run_engine
    return run_async_from_worker_thread(begin_run)
  File "/usr/local/lib/python3.10/site-packages/prefect/utilities/asyncutils.py", line 137, in run_async_from_worker_thread
    return anyio.from_thread.run(call)
  File "/usr/local/lib/python3.10/site-packages/anyio/from_thread.py", line 49, in run
    return asynclib.run_async_from_thread(func, *args)
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 970, in run_async_from_thread
    return f.result()
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.10/site-packages/prefect/engine.py", line 863, in get_task_call_return_value
    return await future._result()
  File "/usr/local/lib/python3.10/site-packages/prefect/futures.py", line 236, in _result
    return final_state.result(raise_on_failure=raise_on_failure)
  File "/usr/local/lib/python3.10/site-packages/prefect/orion/schemas/states.py", line 143, in result
    raise data
  File "/usr/local/lib/python3.10/site-packages/prefect/task_runners.py", line 203, in submit
    result = await call()
  File "/usr/local/lib/python3.10/site-packages/prefect/engine.py", line 1084, in begin_task_run
    return await orchestrate_task_run(
  File "/usr/local/lib/python3.10/site-packages/prefect/engine.py", line 1223, in orchestrate_task_run
    state = await propose_state(
  File "/usr/local/lib/python3.10/site-packages/prefect/engine.py", line 1435, in propose_state
    response = await client.set_task_run_state(
  File "/usr/local/lib/python3.10/site-packages/prefect/client.py", line 1797, in set_task_run_state
    response = await self._client.post(
  File "/usr/local/lib/python3.10/site-packages/httpx/_client.py", line 1842, in post
    return await self.request(
  File "/usr/local/lib/python3.10/site-packages/httpx/_client.py", line 1527, in request
    return await self.send(request, auth=auth, follow_redirects=follow_redirects)
  File "/usr/local/lib/python3.10/site-packages/prefect/client.py", line 279, in send
    response.raise_for_status()
  File "/usr/local/lib/python3.10/site-packages/prefect/client.py", line 225, in raise_for_status
    raise PrefectHTTPStatusError.from_httpx_error(exc) from exc.__cause__
I do have code where tasks are not called directly from flows. I'm wondering if the retry mechanism could be failing there.
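from prefect import flow
from prefect import task


# Function names differ from the imported decorators so they are not shadowed.
@flow
def main_flow():
    python_method()


def python_method():
    # Plain helper: the task is called from here, not from the flow body.
    say_hello()


@task(retries=3, retry_delay_seconds=30)
def say_hello():
    print('HELLO')


if __name__ == '__main__':
    main_flow()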
Actually, I was able to test locally that the following code retried successfully, so my previous comment was likely not the issue.
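from prefect import flow
from prefect import task


# Function names differ from the imported decorators so they are not shadowed.
@flow
def main_flow():
    python_method()


def python_method():
    # Plain helper: the task is called from here, not from the flow body.
    always_fails()


@task(retries=10, retry_delay_seconds=1)
def always_fails():
    # Raises on every attempt so the retry behavior can be observed.
    raise Exception('error')


if __name__ == '__main__':
    main_flow()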
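
For context: in the traceback above, the error is raised inside `propose_state`, while the client posts the task run state to the API, rather than inside the task function body. A local test that raises inside the task may therefore exercise a different path. Below is a minimal sketch, in plain Python with no Prefect calls, of retrying a call when the server answers with a 5xx. `ServerError`, `call_with_retries`, and `flaky_set_state` are hypothetical names used only for illustration and are not Prefect APIs.

```python
import time


class ServerError(Exception):
    """Hypothetical stand-in for an HTTP 5xx error raised by an API client."""

    def __init__(self, status_code):
        super().__init__(f"server error {status_code}")
        self.status_code = status_code


def call_with_retries(call, attempts=3, base_delay=0.0, sleep=time.sleep):
    """Call `call()`; on ServerError, retry up to `attempts` total tries,
    doubling the delay after each failure (base_delay, 2 * base_delay, ...)."""
    for attempt in range(attempts):
        try:
            return call()
        except ServerError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))


# Usage: a fake endpoint that fails twice with a 500, then succeeds.
responses = [ServerError(500), ServerError(500), "ok"]


def flaky_set_state():
    item = responses.pop(0)
    if isinstance(item, Exception):
        raise item
    return item


result = call_with_retries(flaky_set_state, attempts=3)
print(result)
```

Whether wrapping calls like this is appropriate depends on whether the request is safe to repeat; that is an assumption in this sketch.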

Bianca Hoch

09/15/2022, 8:51 PM
Hi Ross, thanks for reaching out. I noticed that you messaged us previously about a similar issue. If possible, can you update to the latest version (2.4.0) and let us know if this continues? If the error continues to happen, can you tell us a bit more about your deployment?

Ross Teach

09/15/2022, 11:06 PM
Thanks Bianca, I will give it a try. Yeah, I updated recently to 2.3.2 and hoped that might fix the issue. I will try 2.4. Any insight into why this is happening and/or why upgrading might help?

Robin Weiß

09/16/2022, 6:21 AM
Hey @Bianca Hoch I am currently facing exactly the same issues and unfortunately it’s blocking our company’s decision to go forward with Prefect as the main workflow automation tool. I did already update to 2.4.0 and it didn’t help. Weirdly enough, these errors pop up for me after the flows were peacefully running for a few hours. It really does seem to be an issue with the Prefect Cloud API from my side (but probably it’s a layer 8 problem anyway)

Anna Geller

09/16/2022, 1:49 PM
"it’s blocking our company’s decision to go forward with Prefect"
In that case, I'd highly recommend getting in touch with paid support at cs@prefect.io, where you can get personalized help.
"I did already update to 2.4.0 and it didn’t help."
I'd recommend sharing your exact steps and a minimal reproducible example as a GitHub issue. We can try to reproduce it; it could be a bug, but we can't know without an example and the exact details of your setup.

Bianca Hoch

09/16/2022, 6:38 PM
Hi team, just wanted to give a quick update that there is an internal ticket opened for this.
👍 2

Ross Teach

09/16/2022, 10:22 PM
Thank you all. I will try different deployment configurations and see if the issue still occurs.

Robin Weiß

09/17/2022, 11:20 AM
@Anna Geller I am already in contact with @Christopher Boyd about these issues and he is looking into it, thanks 🙂 And regarding contacting cs: The problem here is that my client would like to get things up and running in a minimalistic setup without paid support. We are not even doing any super heavy lifting yet, so it would be great to get things done on our own. Thanks for your support!
:thank-you: 1
👍 1

Christopher Boyd

09/19/2022, 12:32 PM
Unfortunately we have seen more of these errors and are investigating the performance
:thank-you: 4

Stefan

09/21/2022, 7:10 AM
Hi everyone, just to add to the above. We're also intermittently receiving 500 Internal Server errors. The latest one happened after a flow ran successfully for 9 minutes having already completed 50+ task runs. Flow run ID is d75a3838-5a33-45e1-b11c-43fe147e1317 and the error
prefect.exceptions.PrefectHTTPStatusError: Server error '500 Internal Server Error' for url 'https://api.prefect.cloud/api/accounts/.../workspaces/.../task_runs/'
in case this helps with the debugging. Happy to provide any additional information you require, but unfortunately it is not reproducible so I can't give any specific code example.
Still facing random 500 errors, which even lead to the agent crashing:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/prefect/cli/_utilities.py", line 41, in wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/prefect/utilities/asyncutils.py", line 212, in wrapper
    return run_async_in_new_loop(async_fn, *args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/prefect/utilities/asyncutils.py", line 141, in run_async_in_new_loop
    return anyio.run(partial(__fn, *args, **kwargs))
  File "/usr/local/lib/python3.8/site-packages/anyio/_core/_eventloop.py", line 70, in run
    return asynclib.run(func, *args, **backend_options)
  File "/usr/local/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 292, in run
    return native_run(wrapper(), debug=debug)
  File "/usr/local/lib/python3.8/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/local/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 287, in wrapper
    return await func(*args)
  File "/usr/local/lib/python3.8/site-packages/prefect/cli/agent.py", line 126, in start
    await critical_service_loop(
  File "/usr/local/lib/python3.8/site-packages/prefect/utilities/services.py", line 39, in critical_service_loop
    await workload()
  File "/usr/local/lib/python3.8/site-packages/prefect/agent.py", line 118, in get_and_submit_flow_runs
    async for work_queue in self.get_work_queues():
  File "/usr/local/lib/python3.8/site-packages/prefect/agent.py", line 82, in get_work_queues
    work_queue = await self.client.read_work_queue_by_name(name)
  File "/usr/local/lib/python3.8/site-packages/prefect/client/orion.py", line 669, in read_work_queue_by_name
    response = await self._client.get(f"/work_queues/name/{name}")
  File "/usr/local/lib/python3.8/site-packages/httpx/_client.py", line 1751, in get
    return await self.request(
  File "/usr/local/lib/python3.8/site-packages/httpx/_client.py", line 1527, in request
    return await self.send(request, auth=auth, follow_redirects=follow_redirects)
  File "/usr/local/lib/python3.8/site-packages/prefect/client/base.py", line 182, in send
    response.raise_for_status()
  File "/usr/local/lib/python3.8/site-packages/prefect/client/base.py", line 125, in raise_for_status
    raise PrefectHTTPStatusError.from_httpx_error(exc) from exc.__cause__
prefect.exceptions.PrefectHTTPStatusError: Server error '500 Internal Server Error' for url 'https://api.prefect.cloud/api/accounts/.../work_queues/name/queue_main'
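
In this traceback, the error from `read_work_queue_by_name` propagates out of `critical_service_loop` and ends the agent process. One general pattern for a polling loop is to tolerate a bounded number of consecutive failures before giving up. The following is a minimal asyncio sketch of that pattern only; `tolerant_loop`, its parameters, and `fake_poll` are hypothetical, and this is not Prefect's implementation.

```python
import asyncio


async def tolerant_loop(workload, interval=0.0,
                        max_consecutive_failures=3, max_iterations=None):
    """Run `workload()` repeatedly. Re-raise only after
    `max_consecutive_failures` failures in a row; any success resets the count.
    Returns the number of successful runs once `max_iterations` is reached."""
    failures = 0
    successes = 0
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        iterations += 1
        try:
            await workload()
        except Exception:
            failures += 1
            if failures >= max_consecutive_failures:
                raise  # too many failures in a row: let the process stop
        else:
            failures = 0
            successes += 1
        await asyncio.sleep(interval)
    return successes


# Usage: a fake poll that fails on its 2nd and 3rd calls only.
calls = {"n": 0}


async def fake_poll():
    calls["n"] += 1
    if calls["n"] in (2, 3):
        raise RuntimeError("500 Internal Server Error")


successes = asyncio.run(tolerant_loop(fake_poll, max_iterations=5))
print(successes)
```

With a threshold of three, the two back-to-back failures here are absorbed and the loop keeps polling; a third consecutive failure would be re-raised.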

Robin Weiß

10/07/2022, 7:13 AM
I still have the same issues, too. @Christopher Boyd are there any updates or an ETA yet?
@Stefan Have you tried reducing the number of log messages during flow execution? It’s just a really random guess from my side, but maybe the underlying problem on Prefect’s side is somehow related to their logging servers not responding in time. Either way, it’s starting to become a blocker for us and we really hoped this would be fixed or at least addressed somehow by now 😞

Christopher Boyd

10/07/2022, 12:27 PM
This seems like a different error than the original ones we were seeing? Many of the reported ones were in task_run / task_state, while this appears in the work_queue; I can take this feedback back to the team. However, improvements have been made and pushed, so generally speaking this should be much less of an issue. If you are still having issues, then traces like this are helpful to isolate them.
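
To make that distinction easier to apply when triaging reports, the failing endpoint can be pulled out of the error message. Here is a small sketch using only the standard library. `failing_resource` is a hypothetical helper, and it assumes the URL follows the `/api/accounts/<id>/workspaces/<id>/<resource>/...` shape seen in the messages in this thread, with the workspace part optional.

```python
import re
from urllib.parse import urlparse

# Matches the quoted URL in messages like: ... for url 'https://...'
URL_IN_MESSAGE = re.compile(r"for url '<?(https?://[^'>]+)>?'")


def failing_resource(message):
    """Return the first path segment after the account/workspace prefix,
    for example 'task_runs' or 'work_queues', or None if no URL is found."""
    match = URL_IN_MESSAGE.search(message)
    if match is None:
        return None
    segments = [s for s in urlparse(match.group(1)).path.split("/") if s]
    i = 0
    if i < len(segments) and segments[i] == "api":
        i += 1
    # Skip "accounts/<id>" and "workspaces/<id>" pairs when present.
    while i + 1 < len(segments) and segments[i] in ("accounts", "workspaces"):
        i += 2
    return segments[i] if i < len(segments) else None


# Usage with the two (elided) URLs quoted in this thread.
task_msg = ("Server error '500 Internal Server Error' for url "
            "'https://api.prefect.cloud/api/accounts/.../workspaces/.../task_runs/'")
queue_msg = ("Server error '500 Internal Server Error' for url "
             "'https://api.prefect.cloud/api/accounts/.../work_queues/name/queue_main'")
print(failing_resource(task_msg), failing_resource(queue_msg))
```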