
Prefect Community

Hey guys, I'm seeing some pretty substantial delays in my jobs. My agent runs on AKS (`prefect:python37-latest`) and appears to be having SSL connection issues talking to Prefect Cloud. My flows are registered using Azure Storage. I've seen in previous comments that this could be due to rate limiting?

*urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='10.0.0.1', port=443): Read timed out. (read timeout=None)*
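
The `(read timeout=None)` part of the message reports that no read timeout was configured for that request at the urllib3 level. As a generic illustration only (not Prefect's own client code), here is a minimal sketch of wrapping a blocking HTTP call with an explicit timeout and a bounded number of retries, using only the standard library. `fetch_with_retries`, `get_url`, the attempt count and the backoff values are all hypothetical.

```python
import socket
import time
import urllib.request
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    backoff_seconds: float = 1.0,
) -> T:
    """Run `call`, retrying on timeouts with exponential backoff.

    Hypothetical helper: the attempt count and backoff are illustrative.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except (socket.timeout, TimeoutError):
            if attempt == attempts:
                raise  # out of attempts: surface the timeout to the caller
            # Wait backoff, 2*backoff, 4*backoff, ... before the next attempt.
            time.sleep(backoff_seconds * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


def get_url(url: str, timeout: float = 30.0) -> bytes:
    # With an explicit timeout, a stalled read raises socket.timeout after
    # `timeout` seconds instead of blocking indefinitely. Connection timeouts
    # are wrapped in urllib.error.URLError, which this sketch does not retry.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()
```

Usage would look like `fetch_with_retries(lambda: get_url("https://example.com"))`. Keeping the retry logic separate from the HTTP call makes it easy to test with a fake callable.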

Details in thread

06:21:59 | lens | INFO | CloudTaskRunner | Task 'wait_for_flow_run': Starting task run...
06:22:00 | lens | INFO | wait_for_flow_run | Flow 'eggplant-toad-plant_children_dim': Entered state <Scheduled>: Flow run scheduled.
06:22:00 | lens | INFO | wait_for_flow_run | Flow 'eggplant-toad-plant_children_dim': Entered state <Submitted>: Submitted for execution
06:35:33 | lens | INFO | wait_for_flow_run | Flow 'eggplant-toad-plant_children_dim': Entered state <Running>: Running flow.
06:36:38 | lens | INFO | wait_for_flow_run | Flow 'eggplant-toad-plant_children_dim': Entered state <Success>: All reference tasks succeeded.
06:36:40 | lens | INFO | CloudTaskRunner | Task 'wait_for_flow_run': Finished task run for task with final state: 'Success'
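
To put a number on the delay, here is a small sketch that computes the gap between consecutive state transitions from the timestamps in the log above. It assumes all entries fall on the same day, and the transition list is copied by hand rather than pulled from any Prefect API.

```python
from datetime import datetime

# State transitions copied from the log above (HH:MM:SS, same day assumed).
transitions = [
    ("Scheduled", "06:22:00"),
    ("Submitted", "06:22:00"),
    ("Running", "06:35:33"),
    ("Success", "06:36:38"),
]


def seconds_between(start: str, end: str) -> int:
    """Whole seconds from `start` to `end`, both formatted as HH:MM:SS."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


def state_gaps(events):
    """Return (from_state, to_state, seconds) for each consecutive pair."""
    return [
        (a_state, b_state, seconds_between(a_time, b_time))
        for (a_state, a_time), (b_state, b_time) in zip(events, events[1:])
    ]


for src, dst, secs in state_gaps(transitions):
    print(f"{src:>9} -> {dst:<9} {secs // 60:2d}m {secs % 60:02d}s")
```

Run against these timestamps, Submitted to Running takes 13m 33s (813 seconds), while the flow itself runs for only 1m 05s. Almost all of the wait is spent before the run starts.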

I've also noticed that the run times for these jobs have been trending up over the past few days. The first few runs took about 9 minutes, but now they take roughly 20. This isn't because the jobs themselves run longer. It's the delay between a run being scheduled and the agent picking it up, as shown above.

Hey <@U0242MRB48H>, I chatted with the team and latency on our end has been pretty stable for the last week, so we don't see anything off, but I'll keep this in mind.

About the `ReadTimeoutError`: *does it error out, or do things keep going?*

I'm wondering if this comes down to the size of the agent deployment on our end.

We have it set up with resource limits of 512Mb of memory and 1 CPU. Should we scale that up?
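
For reference, limits like these live in the `resources` field of the agent's container spec. Here is a minimal sketch that builds that field as a Python dict and scales it by a factor, assuming the memory limit is written in Kubernetes notation as `512Mi`. `parse_memory`, `scaled_resources` and the factor of 2 are hypothetical, and only the `Ki`/`Mi`/`Gi` suffixes are handled.

```python
import re

# Binary suffixes accepted in this sketch (Kubernetes supports more).
_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as '512Mi' to bytes."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi)?", quantity)
    if match is None:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    number, suffix = match.groups()
    return int(number) * _SUFFIXES.get(suffix, 1)


def scaled_resources(memory: str, cpu: str, factor: int) -> dict:
    """Build a container `resources` dict with limits scaled by `factor`.

    Memory is emitted in Mi and CPU in millicores ('m').
    """
    mem_mi = parse_memory(memory) * factor // 2**20
    cpu_milli = int(float(cpu) * 1000) * factor
    limits = {"memory": f"{mem_mi}Mi", "cpu": f"{cpu_milli}m"}
    # In a single-container pod, requests equal to limits for both CPU and
    # memory places the pod in the Guaranteed QoS class.
    return {"requests": dict(limits), "limits": limits}
```

For example, `scaled_resources("512Mi", "1", 2)` returns `{'requests': {'memory': '1024Mi', 'cpu': '2000m'}, 'limits': {'memory': '1024Mi', 'cpu': '2000m'}}`, which could then be copied into the agent's deployment manifest.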

It could be related to acquiring resources in Kubernetes? I suppose you could try bumping it up and seeing if that helps, but I'm not too sure.

In most cases it keeps going after a 15 to 18 minute delay, but on one occasion I redeployed after 30 minutes.

I can't tell where that is coming from. Do you know if it's from the agent polling or from the flow run?

From what I can see, it seems to be the agent polling. It's as if the agent isn't aware of any scheduled runs.

When I check the logs, this is what I see, e.g. the SSL error: