# ask-community
c
We are using the free tier of Prefect Cloud and have been for over 2 years. We recently started having an issue with flow runs on a Prefect agent running on a VM on Azure. We used this setup successfully with Prefect V1 until we had to move to V2 in summer 2023. Periodically a job gets stuck in PENDING, and in the agent output we see this message:
prefect.agent - Aborted submission of flow run 'fcf22119-8e49-44cc-8f80-66978a26c273'. Server sent an abort signal: This run is in a PENDING state and cannot transition to a PENDING state.
The previous flow completes normally and "exited cleanly".
A bit of background:
- The work pool has a concurrency limit of 1.
- I have tried changing the polling frequency to 15 seconds, but it made no visible difference to how often this occurs.
- Our flows are written in Python and are stored in the GitHub repo.
- We have Python 3.10 installed on the VM.
- The VM that the agent runs on does no other work; it is dedicated to running the agent and executing the jobs. All the tasks run locally on the VM, and every job completes before the next one starts.
- We are running the latest version of the Prefect agent.
I have seen a few others report the same issue, but I haven't found a case that matches ours. Any help or pointers would be greatly appreciated.
j
hi, do you have multiple agents polling for runs?
c
We have a single agent and a single work pool
j
This would be the expected log if say two agents picked up the same flow run
c
I don't believe that is the case here.
j
The only other thing I can think of is to check the agent logs for any errors above where you see that error. Does the agent attempt to set the run to PENDING, hit some failure, and then attempt to set it to PENDING again?
c
I can share the log output...
01:30:26.406 | INFO    | prefect.agent - Submitting flow run 'c1dba183-110e-409b-9d1c-9d11b1a40b77'
01:30:27.088 | INFO    | prefect.infrastructure.process - Opening process 'cheerful-auk'...
01:30:27.245 | INFO    | prefect.agent - Completed submission of flow run 'c1dba183-110e-409b-9d1c-9d11b1a40b77'
/usr/lib/python3.10/runpy.py:126: RuntimeWarning: 'prefect.engine' found in sys.modules after import of package 'prefect', but prior to execution of 'prefect.engine'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
01:30:29.855 | INFO    | Flow run 'cheerful-auk' - Downloading flow code from storage at ''
01:30:32.190 | INFO    | Flow run 'cheerful-auk' - Creating ingest listener request...
01:30:32.591 | INFO    | Flow run 'cheerful-auk' - Flow running on host s-uks-sv-vm-p-2
01:30:32.592 | INFO    | Flow run 'cheerful-auk' - Creating ingest listener request...
01:30:32.593 | INFO    | Flow run 'cheerful-auk' - {'JobName':'sap_orderitem_prod', 'DataJobId':310,'JobPathId':'Smythson', 'DataJobType':'OrderItem', 'RemoveSourceFile':'true'}
01:30:39.537 | INFO    | Flow run 'cheerful-auk' - {'date': '2023-10-12T01:30:39.5334325+00:00', 'jobId': 1, 'status': 'completedSuccess', 'sendCount': 117, 'receiveCount': 117, 'jobStatusExceptionMessage': '', 'jobStatusExceptionStackTrace': ''}
01:30:39.705 | INFO    | Flow run 'cheerful-auk' - Finished in state Completed()
01:30:42.175 | INFO    | prefect.infrastructure.process - Process 'cheerful-auk' exited cleanly.
01:30:53.010 | INFO    | prefect.agent - Submitting flow run 'fcf22119-8e49-44cc-8f80-66978a26c273'
01:30:55.267 | INFO    | prefect.agent - Aborted submission of flow run 'fcf22119-8e49-44cc-8f80-66978a26c273'. Server sent an abort signal: This run is in a PENDING state and cannot transition to a PENDING state.
I read that the runtime warning can be ignored.
Is there a way to mitigate this problem? For example, can we set some kind of timeout so that the job is cancelled after a preset amount of time? What is killing us is that jobs get stuck in the queue, which delays the availability of data to the client.
j
Is that the only log for flow run fcf22119-8e49-44cc-8f80-66978a26c273?
You can pretty easily set up what you've described with an automation.
You can cancel a flow run after it's been in PENDING for longer than some set time.
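To make the rule concrete, here is a minimal sketch of the selection logic only: pick out the runs that have been in PENDING for longer than a threshold. It does not call Prefect. `RunSnapshot`, `stale_pending`, the 10-minute threshold, and the "now" timestamp are all hypothetical names and values made up for illustration; in practice the automation in Prefect Cloud would do the checking and cancelling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class RunSnapshot:
    """Hypothetical, minimal view of a flow run (not a Prefect class)."""
    run_id: str
    state: str             # e.g. "PENDING", "RUNNING", "COMPLETED"
    state_since: datetime  # when the run entered its current state


def stale_pending(runs, now, max_pending=timedelta(minutes=10)):
    """Return the IDs of runs that have sat in PENDING longer than max_pending."""
    return [
        r.run_id
        for r in runs
        if r.state == "PENDING" and now - r.state_since > max_pending
    ]


# Illustrative data: run IDs come from the log above, timestamps are assumed.
now = datetime(2023, 10, 12, 1, 45, tzinfo=timezone.utc)
runs = [
    RunSnapshot("fcf22119-8e49-44cc-8f80-66978a26c273", "PENDING",
                datetime(2023, 10, 12, 1, 30, 55, tzinfo=timezone.utc)),
    RunSnapshot("c1dba183-110e-409b-9d1c-9d11b1a40b77", "COMPLETED",
                datetime(2023, 10, 12, 1, 30, 39, tzinfo=timezone.utc)),
]
to_cancel = stale_pending(runs, now)
print(to_cancel)
```

The threshold should be comfortably longer than a normal submission takes, so that healthy runs are never cancelled.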
d
Isn't it possible with a timeout on the flow decorator?
c
Yes, that was the last entry in the log for fcf22119-8e49-44cc-8f80-66978a26c273.
It just hangs without any further output until we cancel the flow in the Prefect Cloud console.
s
I'm also experiencing an increase in flow runs "stuck in a pending state". Some background:
- I'm still using agents (not workers).
- I have a default work pool with two queues inside. The default queue does most of the work and has a concurrency limit of 25. The other queue has a limit of 1.
- I had an agent running on an on-prem server polling the pool (so it picked up work for all the pool's queues).
- I wanted to run the work of the other queue on different infra, so I created a new GCP VM and started up an agent to poll only the other queue of the default pool. I also changed my on-prem agent to poll only the default queue of the default pool.
-> Every now and then, I get "stuck in pending" flow runs on my GCP VM (on the other queue), and no issues on my on-prem VM. (I have not compared agent logs to see whether my on-prem agent is for some reason picking up from the wrong queue.)
j
"This run is in a PENDING state and cannot transition to a PENDING state." can only appear if something else has already put the run in a PENDING state: either another agent/worker, or the same agent/worker that might have experienced some sort of failure (that's why I asked about other logs with that ID).
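One rough way to check for that in a saved agent log is to group the agent's submission lines by flow run ID and see whether any run was submitted more than once, or aborted without an earlier attempt in the same log. This is only a sketch: the regex assumes the line format shown in the log excerpt above, and `AGENT_EVENT` and `submission_events` are made-up names, not part of Prefect.

```python
import re

# Matches agent lines such as:
# 01:30:53.010 | INFO    | prefect.agent - Submitting flow run '<uuid>'
AGENT_EVENT = re.compile(
    r"prefect\.agent - (Submitting|Completed submission of|Aborted submission of) "
    r"flow run '([0-9a-f-]{36})'"
)


def submission_events(log_text):
    """Map each flow run ID to the ordered list of agent submission events."""
    events = {}
    for line in log_text.splitlines():
        m = AGENT_EVENT.search(line)
        if m:
            # Keep only the first word: Submitting / Completed / Aborted.
            events.setdefault(m.group(2), []).append(m.group(1).split()[0])
    return events


# Agent lines copied from the log shared earlier in this thread.
sample = """\
01:30:26.406 | INFO    | prefect.agent - Submitting flow run 'c1dba183-110e-409b-9d1c-9d11b1a40b77'
01:30:27.245 | INFO    | prefect.agent - Completed submission of flow run 'c1dba183-110e-409b-9d1c-9d11b1a40b77'
01:30:53.010 | INFO    | prefect.agent - Submitting flow run 'fcf22119-8e49-44cc-8f80-66978a26c273'
01:30:55.267 | INFO    | prefect.agent - Aborted submission of flow run 'fcf22119-8e49-44cc-8f80-66978a26c273'. Server sent an abort signal: This run is in a PENDING state and cannot transition to a PENDING state.
"""
events = submission_events(sample)
for run_id, seq in events.items():
    print(run_id, seq)
```

In the excerpt shared above, the aborted run shows a single "Submitting" followed directly by "Aborted", so this agent's log alone does not show an earlier attempt that could have set the run to PENDING.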