Prefect Community #ask-community

We are using the free tier of Prefect Cloud and ha...

Charles Hunt

10/12/2023, 3:55 PM

We are using the free tier of Prefect Cloud and have been using it for over 2 years. We recently started having an issue with flow runs on a Prefect agent running on a VM on Azure. We used this setup successfully with Prefect V1 until we had to move to V2 in summer 2023. We get a periodic issue with the jobs: a run gets stuck in Pending, and in the agent output we see this message:


prefect.agent - Aborted submission of flow run 'fcf22119-8e49-44cc-8f80-66978a26c273'. Server sent an abort signal: This run is in a PENDING state and cannot transition to a PENDING state.

The previous flow completes normally and "exited cleanly". A bit of background:
- The work pool has a concurrency limit of 1.
- I have tried changing the polling frequency to 15 seconds, but it made no visible impact on how often this occurs.
- Our flows are written in Python and are stored in a GitHub repo.
- We have Python 3.10 installed on the VM.
- The VM that the agent runs on does no other work; it is dedicated to running the agent and executing the jobs. All the tasks run locally on the VM, and every job completes before the next one starts.
- We are running the latest version of the Prefect agent.

I see a few others reporting the same issue, but I haven't seen a case matching ours. Any help or pointers would be greatly appreciated.

Jake Kaplan

10/12/2023, 3:59 PM

hi, do you have multiple agents polling for runs?

Charles Hunt

10/12/2023, 3:59 PM

We have a single agent and a single work pool

Jake Kaplan

10/12/2023, 4:00 PM

This would be the expected log if say two agents picked up the same flow run

Charles Hunt

10/12/2023, 4:02 PM

I don't believe that is the case here.

Jake Kaplan

10/12/2023, 4:02 PM

The only other thing I can think of is to check the agent logs for any errors above where you see that error. Does the agent attempt to set the run to PENDING, hit some failure, and then attempt to set it to PENDING again?

Charles Hunt

10/12/2023, 4:02 PM

I can share the log output...

Charles Hunt

10/12/2023, 4:03 PM


01:30:26.406 | INFO    | prefect.agent - Submitting flow run 'c1dba183-110e-409b-9d1c-9d11b1a40b77'
01:30:27.088 | INFO    | prefect.infrastructure.process - Opening process 'cheerful-auk'...
01:30:27.245 | INFO    | prefect.agent - Completed submission of flow run 'c1dba183-110e-409b-9d1c-9d11b1a40b77'
/usr/lib/python3.10/runpy.py:126: RuntimeWarning: 'prefect.engine' found in sys.modules after import of package 'prefect', but prior to execution of 'prefect.engine'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
01:30:29.855 | INFO    | Flow run 'cheerful-auk' - Downloading flow code from storage at ''
01:30:32.190 | INFO    | Flow run 'cheerful-auk' - Creating ingest listener request...
01:30:32.591 | INFO    | Flow run 'cheerful-auk' - Flow running on host s-uks-sv-vm-p-2
01:30:32.592 | INFO    | Flow run 'cheerful-auk' - Creating ingest listener request...
01:30:32.593 | INFO    | Flow run 'cheerful-auk' - {'JobName':'sap_orderitem_prod', 'DataJobId':310,'JobPathId':'Smythson', 'DataJobType':'OrderItem', 'RemoveSourceFile':'true'}
01:30:39.537 | INFO    | Flow run 'cheerful-auk' - {'date': '2023-10-12T01:30:39.5334325+00:00', 'jobId': 1, 'status': 'completedSuccess', 'sendCount': 117, 'receiveCount': 117, 'jobStatusExceptionMessage': '', 'jobStatusExceptionStackTrace': ''}
01:30:39.705 | INFO    | Flow run 'cheerful-auk' - Finished in state Completed()
01:30:42.175 | INFO    | prefect.infrastructure.process - Process 'cheerful-auk' exited cleanly.
01:30:53.010 | INFO    | prefect.agent - Submitting flow run 'fcf22119-8e49-44cc-8f80-66978a26c273'
01:30:55.267 | INFO    | prefect.agent - Aborted submission of flow run 'fcf22119-8e49-44cc-8f80-66978a26c273'. Server sent an abort signal: This run is in a PENDING state and cannot transition to a PENDING state.

Charles Hunt

10/12/2023, 4:04 PM

I read that the runtime warning can be ignored

Charles Hunt

10/12/2023, 4:06 PM

Is there a way to mitigate this problem? E.g., can we set some kind of timeout so that the job is cancelled after a preset amount of time? What is killing us is that jobs get stuck in the queue, which delays the availability of data to the client.

Jake Kaplan

10/12/2023, 4:14 PM

is that the only log for flow run fcf22119-8e49-44cc-8f80-66978a26c273?

Jake Kaplan

10/12/2023, 4:14 PM

you can pretty easily set up what you've described with an automation

Jake Kaplan

10/12/2023, 4:15 PM

You can cancel a flow run after it's been in PENDING for > some time
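
A minimal sketch of the rule such an automation would apply: pick out runs that have sat in PENDING for longer than a threshold. This is plain Python and does not call Prefect. The `RunRecord` class, the `stale_pending_runs` helper, the run ids, and the 10-minute threshold are all made up for illustration; in Prefect Cloud the actual cancellation would be done by the automation itself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class RunRecord:
    """Hypothetical stand-in for a flow run; not a Prefect class."""
    run_id: str
    state: str             # e.g. "PENDING", "RUNNING", "COMPLETED"
    state_since: datetime  # when the run entered its current state


def stale_pending_runs(runs, now, max_pending):
    """Return the runs that have been in PENDING for longer than max_pending."""
    return [
        r for r in runs
        if r.state == "PENDING" and now - r.state_since > max_pending
    ]


# Illustrative data only: ids and timestamps are invented for this example.
now = datetime(2023, 10, 12, 1, 45, tzinfo=timezone.utc)
runs = [
    RunRecord("run-a", "PENDING", datetime(2023, 10, 12, 1, 30, tzinfo=timezone.utc)),
    RunRecord("run-b", "COMPLETED", datetime(2023, 10, 12, 1, 30, tzinfo=timezone.utc)),
    RunRecord("run-c", "PENDING", datetime(2023, 10, 12, 1, 44, tzinfo=timezone.utc)),
]

stale = stale_pending_runs(runs, now, timedelta(minutes=10))
print([r.run_id for r in stale])  # candidates for cancellation
```

Only `run-a` qualifies here: `run-b` is not pending, and `run-c` has been pending for just one minute.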

Deceivious

10/12/2023, 4:59 PM

Isn't it possible with a timeout on the flow decorator?

Charles Hunt

10/12/2023, 8:47 PM

yes, that was the last entry in the log for fcf22119-8e49-44cc-8f80-66978a26c273

Charles Hunt

10/12/2023, 8:47 PM

it just hangs without any further output - until we cancel the flow in the prefect cloud console

Stéphan Taljaard

10/13/2023, 5:17 AM

I'm also experiencing an increase in flow runs "stuck in a pending state". Some background:
- I'm still using Agents (not workers)
- I have a default work pool, with two queues inside. The default queue does most of the work and has a concurrency limit of 25. The other queue has a limit of 1.
- I had an agent running on an on-prem server polling the pool (so it picked up work for all the pool's queues)
- I wanted to run the work of the other queue in different infra, so I created a new GCP VM, started up an agent to poll only the other queue of the default pool. I also changed my on-prem agent to only poll the default queue of the default pool.
-> Every now and then, I'll get "stuck in pending" flow runs on my GCP VM (on the other queue), and no issues on my on-prem VM. (I have not compared agent logs to see if for some reason my on-prem agent picks up from the wrong queue?)

Jake Kaplan

10/13/2023, 1:40 PM

"This run is in a PENDING state and cannot transition to a PENDING state." can only appear if something else has put the run in a pending state. Either another agent/worker OR the same agent/worker that might have experienced some sort of failure (that's why I asked about other logs with that id)
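
One way to do the log check described above is to pull every agent log entry that mentions the run id and count how many times a submission was started. The sketch below is plain Python over the log format shown earlier in this thread; the helper names `entries_for_run` and `submission_attempts` are made up for illustration, and the regular expression assumes every entry follows the `HH:MM:SS.mmm | LEVEL | message` layout.

```python
import re

# Matches entries such as: 01:30:53.010 | INFO    | prefect.agent - Submitting ...
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) \| (\w+)\s*\| (.*)$")


def entries_for_run(log_text, run_id):
    """Return (timestamp, level, message) for each entry mentioning run_id."""
    found = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match and run_id in match.group(3):
            found.append(match.groups())
    return found


def submission_attempts(entries):
    """Count entries where the agent started submitting the run."""
    return sum(1 for _, _, msg in entries if "Submitting flow run" in msg)


# Agent output copied from the log shared earlier in the thread.
log_text = """
01:30:42.175 | INFO    | prefect.infrastructure.process - Process 'cheerful-auk' exited cleanly.
01:30:53.010 | INFO    | prefect.agent - Submitting flow run 'fcf22119-8e49-44cc-8f80-66978a26c273'
01:30:55.267 | INFO    | prefect.agent - Aborted submission of flow run 'fcf22119-8e49-44cc-8f80-66978a26c273'. Server sent an abort signal: This run is in a PENDING state and cannot transition to a PENDING state.
""".strip()

entries = entries_for_run(log_text, "fcf22119-8e49-44cc-8f80-66978a26c273")
for timestamp, level, message in entries:
    print(timestamp, level, message)
print("submission attempts:", submission_attempts(entries))
```

If the same agent had tried to submit the run twice, the count would be greater than 1. A count of 1 followed directly by the abort would suggest that something other than this agent process set the run to PENDING first.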
