# ask-community
m
Hey guys, I'm seeing some pretty substantial delays in my jobs. My agent appears to be having SSL connection issues going from AKS (prefect:python37-latest) to Prefect Cloud, with my flows registered on Azure Storage. I've seen in previous comments that this could be related to rate limiting?
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='10.0.0.1', port=443): Read timed out. (read timeout=None)
Details in thread
062159 lens INFO CloudTaskRunner Task 'wait_for_flow_run': Starting task run...
062200 lens INFO wait_for_flow_run Flow 'eggplant-toad-plant_children_dim': Entered state <Scheduled>: Flow run scheduled.
062200 lens INFO wait_for_flow_run Flow 'eggplant-toad-plant_children_dim': Entered state <Submitted>: Submitted for execution
063533 lens INFO wait_for_flow_run Flow 'eggplant-toad-plant_children_dim': Entered state <Running>: Running flow.
063638 lens INFO wait_for_flow_run Flow 'eggplant-toad-plant_children_dim': Entered state <Success>: All reference tasks succeeded.
063640 lens INFO CloudTaskRunner Task 'wait_for_flow_run': Finished task run for task with final state: 'Success'
I also noticed the run times for these jobs appear to be trending up over the past few days. The first few runs took about 9 minutes, but they now take roughly 20. This isn't due to the jobs themselves running longer; it's the delay before the agent picks up the scheduled jobs, as in the logs above.
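To put a number on the gap in the log excerpt above, the state transitions can be diffed directly. This is a minimal sketch, assuming the six-digit prefix on each log line is an HHMMSS timestamp within a single day; the `LOG` list is a hand-trimmed copy of the lines above and `state_delays` is a hypothetical helper, not part of Prefect.

```python
from datetime import datetime, timedelta

# (timestamp, state) pairs copied by hand from the log excerpt above.
# Assumption: the six-digit prefix is HHMMSS on the same day.
LOG = [
    ("062200", "Scheduled"),
    ("062200", "Submitted"),
    ("063533", "Running"),
    ("063638", "Success"),
]

def state_delays(entries):
    """Return (from_state, to_state, timedelta) for each consecutive pair."""
    parsed = [(datetime.strptime(ts, "%H%M%S"), state) for ts, state in entries]
    return [
        (prev_state, state, t - prev_t)
        for (prev_t, prev_state), (t, state) in zip(parsed, parsed[1:])
    ]

for src, dst, delta in state_delays(LOG):
    print(f"{src} -> {dst}: {delta}")
# Scheduled -> Submitted: 0:00:00
# Submitted -> Running: 0:13:33
# Running -> Success: 0:01:05
```

Under that assumption, almost all of the wall-clock time in this run sits between Submitted and Running, which matches the description of the flow itself finishing in about a minute.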
k
Hey @Michael Law, chatted with the team and the latency on our end has been pretty stable for the last week so we don’t see anything off on our end, but I’ll keep this in mind. About the ReadTimeoutError, does it error out or keep things going?
m
I'm wondering if this is down to the size of the agent node deployment on our end.
We have it set up with resource limits of 512 MB and 1 CPU; should we possibly scale that up?
k
It could be about procuring resources in Kubernetes? I suppose you could try bumping it up and seeing if that helps. I’m not too sure though.
m
Yeah cool, will do. Thanks again.
In most instances it keeps going after a 15-18 min delay, but on one occasion I redeployed after 30 mins.
k
I can’t tell where that is coming from. Do you know if it’s from the agent polling or from the Flow run?
m
It seems to be the agent polling, from what I can see. It's like it isn't aware of any scheduled runs.
When I check the logs, this is what I see, e.g. the SSL error.
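One thing that might help narrow this down: the traceback shows `read timeout=None`, so the call waits with no upper bound, and the host is a private `10.x` address. A quick check from inside the agent pod is to try connecting to that address with an explicit timeout. This is only a rough sketch using the standard library; `probe` is a hypothetical helper, and it tests the TCP connect only, not TLS or the read that actually timed out.

```python
import socket
import time

def probe(host, port=443, timeout=5.0):
    """Attempt a TCP connect with an explicit timeout.

    Returns (ok, elapsed_seconds, error). A timeout or refused
    connection is reported instead of raised.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, None
    except OSError as exc:  # covers timeouts and connection errors
        return False, time.monotonic() - start, exc
```

Calling `probe("10.0.0.1")` a few times from the agent pod would show whether connects are slow or failing at the network level. If they succeed quickly, the stall is more likely in the request itself than in reaching the host.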
k
Gotcha, will continue to ask the team.