Ben Muller

02/03/2023, 2:11 AM
Hey Prefect, I continually have flows failing with the following exception:
State message: Flow run encountered an exception. MissingResult: State data is missing. Typically, this occurs when result persistence is disabled and the state has been retrieved from the API.
I have set
prefect config set PREFECT_RESULTS_PERSIST_BY_DEFAULT=True
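For reference, the same setting can be scoped to a single flow or task instead of globally; persist_result is the Prefect 2 decorator keyword, and the names below are only placeholders, not the real flow:

from prefect import flow, task

@task(persist_result=True)  # persist this task's result explicitly
def query_api(query, kwargs, session=None):  # placeholder signature, matching the snippet below
    ...

@flow(persist_result=True)  # same toggle at the flow level
def stock_prices_flow():
    ...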
The only flows that fail are ones that have a large number of .submit calls with the DaskTaskRunner.
I don't really care if a few of the tasks have missing results; is there a way to handle this? I already do something like this, but the errors still persist:
# chunks() is a local helper that yields the query in batches of 500
data = []
for chunk in chunks(query, 500):
    futures = [query_api.submit(query=GetStockPrices, kwargs=kw, session=session) for kw in chunk]
    # raise_on_failure=False so failed task runs don't raise here
    data += [f.result(raise_on_failure=False) for f in futures]
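A rough sketch of one way to tolerate the odd missing result, assuming it is prefect.exceptions.MissingResult that is raised when a future's state data can't be read; the query_api task here is only a placeholder:

from prefect import flow, task
from prefect.exceptions import MissingResult

@task
def query_api(kw):  # placeholder standing in for the real query task
    return kw

@flow
def collect():
    futures = [query_api.submit(kw) for kw in range(500)]
    data = []
    for f in futures:
        try:
            data.append(f.result(raise_on_failure=False))
        except MissingResult:
            # this task run's state data never got persisted; skip it instead of failing the flow
            pass
    return data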
@Christopher Boyd tagging in case you missed this one
Further context - I have ZERO errors besides this one and I also have a warning log of this:
Task run '6a621f32-e889-4b37-a271-b38294142f4b' received abort during orchestration: Error validating state: DBAPIError("(sqlalchemy.dialects.postgresql.asyncpg.Error) <class 'asyncpg.exceptions.QueryCanceledError'>: canceling statement due to statement timeout") Task run is in RUNNING state.

Christopher Boyd

02/03/2023, 10:51 PM
I haven't had a chance to look at any of these today; I'll be able to look on Monday.

Carlos Cueto

02/04/2023, 3:31 PM
@Ben Muller Once again we share the same issue. One of my flows keeps failing due to this exact same error. It uses .map() instead. Most task runs succeed but eventually this error occurs. I have also tried the same steps as you. Didn’t help.
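For context, the mapped version is roughly this shape (task and flow names here are just placeholders, not the real flow):

from prefect import flow, task

@task
def fetch_one(item):  # placeholder for the real task
    return item

@flow
def mapped_flow(items):
    futures = fetch_one.map(items)  # one task run per item
    # collect results without raising if an individual run failed
    return [f.result(raise_on_failure=False) for f in futures]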

Ben Muller

02/04/2023, 7:07 PM
Is it new for you @Carlos Cueto? It feels like it's gotten worse for me recently.

Carlos Cueto

02/04/2023, 7:08 PM
Not new. I’ve had it since I migrated the flow that is constantly having the issue to Prefect 2.0. It worked perfectly fine in Prefect 1.0

Ben Muller

02/04/2023, 7:15 PM
Interesting, hopefully someone responds soon.

Walter Cavinaw

02/06/2023, 3:27 AM
I have the same issue.
🙌 2

Ankit

02/06/2023, 4:25 AM
Same here, following the thread.
🙌 2

Carlos Cueto

02/06/2023, 3:36 PM
Update: I just started receiving the exact same error on a flow that has been running for weeks, had never encountered this error in the past, and doesn't even use task mapping or concurrency. Error I get:

Christopher Boyd

02/06/2023, 3:37 PM
Is this a continuing issue for everyone? Would any of you be able/willing to open a GitHub issue? I've raised this concern with the team; I think it would be helpful to get an example and some more details of the occurrence.

Ben Muller

02/06/2023, 6:30 PM
This is a continuing issue for me. I can create an issue on gh today but I'm not sure how to make it reproducible. Can anyone do that?

Carlos Cueto

02/06/2023, 6:30 PM
I'm not able to reproduce it either. In fact, I run the same flow multiple times and sometimes it runs 10 times in a row without the issue, and then I randomly get it on one of the runs.

Walter Cavinaw

02/06/2023, 6:31 PM
It's random for us so I don't think we can make an issue. It must be random for everyone?

Ben Muller

02/06/2023, 6:34 PM
Random for me too.
@Christopher Boyd it sounds like this is affecting enough people and it's hard to reproduce, so I'd just be dumping this thread into a GitHub issue. Can we get more eyes on this? It seems to be a fair pain point for all of us (and I'm guessing many more).

Samuel Kohlleffel

02/06/2023, 11:09 PM
Same issue here. It occurs randomly and isn't limited to one flow.

Karanveer Mohan

02/07/2023, 12:23 AM
Same issue here. It keeps happening erratically across multiple flows.

Stéphan Taljaard

02/07/2023, 6:14 AM
I also have this issue. It's causing data downtime and headaches for my team. Unable to reproduce; it seems to happen randomly, not necessarily time- or flow-specific. cc @Vincent Yu
🙌 3

Christopher Boyd

02/07/2023, 11:12 AM
I'm tracking these - we have been investigating and have an issue open to remediate it; I just don't have any fix to report at the moment. If you encounter this, I would absolutely encourage you to open a new issue if you can, or to post the circumstances and traceback in a new thread for visibility.

Jean-Michel Provencher

02/07/2023, 2:13 PM
Same problem on our side.
Screenshot 2023-02-07 at 9.18.00 AM.png

Carlos Cueto

02/07/2023, 2:19 PM
The screenshot I posted above is as much traceback as I have. I haven't opened an issue because I can't reproduce the issue or have any clue about its source.

Christopher Boyd

02/07/2023, 4:37 PM
what versions are you all on?
Working on the issue atm

Carlos Cueto

02/07/2023, 4:37 PM
2.7.11
👍 1

Christopher Boyd

02/07/2023, 4:37 PM
thank you
is this the same for all
or are there different versions in use

Samuel Kohlleffel

02/07/2023, 4:38 PM
2.7.10 for us

Christopher Boyd

02/07/2023, 4:50 PM
Please feel free to add anything of value that you see - we do have an internal issue for tracking as well, but any insight you can add here will help

Nikhil Joseph

02/08/2023, 11:38 AM
Just moved myself from Prefect 1 to Prefect 2 two weeks ago and was stuck on this for hours. There are multiple issues here from what I have understood:
1. The "state data is missing" log isn't accurate; it occurs when an exception is raised somewhere in the flow but outside a task. The original exception is usually right above the MissingResult log.
2. In this particular case it's the DBAPIError that's the issue. I came here looking for a solution for DBAPIError 😅. I have 200-300 ECS tasks (same task, different params) running at a time, and I get different errors for different ones: AWS timeouts and DBAPIError timeouts mostly.
Wanted to give a heads up that fixing (2) doesn't fix (1). Running Prefect 2.7.9, btw.
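A minimal sketch of the pattern in (1), an exception raised in the flow body outside any task, using only stock Prefect 2; per the observation above, the MissingResult line in the logs is then a symptom rather than the root cause:

from prefect import flow, task

@task
def works_fine():
    return 1

@flow
def demo():
    works_fine.submit()
    # raised in the flow body, outside any task; this is the "original
    # exception" that shows up right above the MissingResult log line
    raise RuntimeError("boom")

if __name__ == "__main__":
    demo()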

Carlos Cueto

02/08/2023, 2:16 PM
Received the DBAPIError warning on an extremely important flow we have had running for many weeks that hadn't failed until today. This happened inside one of the tasks. It made the flow fail, stalled the task, and kept it in a RUNNING state for hours. This needs to be given very high priority because it's something new; it was not happening until recently.

Ben Muller

02/08/2023, 6:33 PM
Yeah, this seems to be happening more and more for us. It's causing big issues at our company.

Christopher Boyd

02/08/2023, 6:40 PM
Understood. I've continued to raise this with the team, and I encourage you to add any examples and additional content to the issue.

Tanay Kothari

02/12/2023, 2:57 AM
I’m getting this same error. I have a Dask cluster on EKS. When I run the prefect flow locally, it connects to the dask cluster and runs it just fine. When I deploy it to the cloud, my prefect agent on Fargate runs the first task in my DAG and then crashes with this error a few seconds later. I’m using Prefect 2.8.0

Stéphan Taljaard

02/13/2023, 7:30 AM
Are you guys still experiencing this? So far for me, it has not happened again (Link: https://github.com/PrefectHQ/prefect/issues/8435#issuecomment-1427473303)

Carlos Cueto

02/14/2023, 1:52 PM
I haven’t seen it for a while either. Crossing fingers that PR fixed it.

Ben Muller

02/22/2023, 10:06 PM
I am back to getting these errors and seeing something like this too:
Task run '6528d660-f15f-4cd3-bcb6-201c2a517fab' received abort during orchestration: This run cannot transition to the RUNNING state from the RUNNING state. Task run is in RUNNING state.
I did not cancel this run FYI

Jean-Michel Provencher

02/22/2023, 10:06 PM
Yeah, me too

Carlos Cueto

02/22/2023, 10:06 PM
I’m getting the same.
Something changed.

Ben Muller

02/22/2023, 10:12 PM
🤦
again ...
cc @Christopher Boyd
Any updates here @Christopher Boyd? My error channel is next to impossible to monitor anymore. It's really frustrating.

Christopher Boyd

02/23/2023, 9:05 PM
@Ben Muller - I'd suggest opening a new thread with the current issue being faced. There have been a number of fixes and releases, along with increases to database capacity just last night. Without seeing more specific details, this is anecdotal and won't really help us isolate whether this is a new issue or a regression.
We did have database maintenance last night to increase capacity for some of the issues that were being faced

Ben Muller

02/23/2023, 9:14 PM
Right - to give some direct feedback, it feels like every time something breaks internally there is a push for this to somehow be "someone else's problem". I think it should be enough that I present you with the information that this is a new bug we are encountering; given that I am a paying customer, you should be doing the investigation and work to have this fixed. It is disappointing. I feel like the reliability of Prefect has declined significantly lately.

Kalise Richmond

02/23/2023, 9:59 PM
Hi Ben, thank you for the feedback. We totally hear you and know how frustrating these issues have been. As a paying customer, you do have access to Prefect Support and a direct line to your Customer Success Manager that I'd encourage you to use for priority support. This also allows us to attach the GitHub issues directly to your account history so that the team can better support you. I believe the new error you are seeing with Task runs in the Running state has already been filed and our engineering team is working on it. If this does not cover the use case or behavior that you are seeing, please let us know. https://github.com/PrefectHQ/prefect/issues/8602

Ben Muller

02/23/2023, 10:03 PM
Thanks Kalise

Ilya Galperin

02/28/2023, 11:10 PM
We are still seeing the same error described in the original thread (MissingResult) intermittently on 2.8.2 when submitting a large number of concurrent tasks to a DaskTaskRunner. Are folks in this thread still experiencing the same, in addition to the RUNNING state issue? Not sure if we should start a new thread to continue reporting this, or if it is related to the behavior described in issue #8602.