
Mac

04/26/2022, 4:17 PM
Hi, I am storing code in GitHub, and I have some scheduled flows erroring out semi-regularly due to a GitHub API timeout. Is there a way to increase the timeout limit and/or add an exponential backoff? It also doesn't seem like the flow tries to rerun. Thanks!
ConnectTimeout(MaxRetryError("HTTPSConnectionPool(host='api.github.com', port=443): Max retries exceeded with url: *redacted* (Caused by ConnectTimeoutError(, 'Connection to api.github.com timed out. (connect timeout=15)'))

Kevin Kho

04/26/2022, 4:19 PM
Is that from the Prefect client? You can set the environment variable:
from prefect.run_configs import KubernetesRun

flow.run_config = KubernetesRun(..., env={"PREFECT__CLOUD__REQUEST_TIMEOUT": "60"})
👍 1
you can edit the config.toml here
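For example, in Prefect 1.x the same timeout can be set in the user config file (a minimal sketch, assuming the default ~/.prefect/config.toml location; adjust to your setup):

[cloud]
request_timeout = 60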

Mac

04/26/2022, 4:33 PM
This is the agent trying to retrieve the flow code from GitHub. Isn't what you sent for the agent sending API calls to Prefect Cloud/Server?

Kevin Kho

04/26/2022, 4:38 PM
Can you give me a longer traceback? I was thinking of that, but it might be the Prefect Client making this call; I'm not sure. This timeout setting is on the Prefect Client

Mac

04/26/2022, 4:42 PM
11:00:38
INFO
agent
Submitted for execution: Task arn:aws:ecs:*****


11:01:24
INFO
GitHub
Downloading flow from GitHub storage - repo: '****', path: '****.py'


11:01:40
ERROR
execute flow-run
Failed to load and execute flow run: ConnectTimeout(MaxRetryError("HTTPSConnectionPool(host='api.github.com', port=443): Max retries exceeded with url: **** (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7ff417584a10>, 'Connection to api.github.com timed out. (connect timeout=15)'))"))

Kevin Kho

04/26/2022, 4:46 PM
Ah that doesn’t help. One second let me read the source
👍 1
Yeah you might be right that this is not the Prefect client. We use the github library under the hood, so I'm checking whether there is a way to increase that timeout

Mac

04/28/2022, 3:38 PM
@Kevin Kho Do you know if there's any way I would be able to retry on an error like this? I know how to retry tasks, but since this is storage, I don't see how I can retry the flow

Kevin Kho

04/28/2022, 4:33 PM
Hey Bo, I will be slow to respond due to PyCon today and tomorrow. I will leave a message with Anna and the team about this.
So I didn’t find this last time I looked, but the Github class under the hood exposes a timeout. Maybe we can surface it so it can be increased here. Also left messages with the team about that
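For context, a rough sketch of the PyGithub parameter being discussed (the token below is just a placeholder, not Prefect's actual wiring):

from github import Github

# PyGithub accepts a connection timeout in seconds (the library default is 15)
client = Github("<access-token>", timeout=60)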

Mac

04/28/2022, 4:51 PM
Ahh good catch, that would be excellent! There is a retry parameter too; that would be just as important to expose, if not more so
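For reference, a sketch of what both knobs look like on the PyGithub client, using a urllib3 Retry object to get exponential backoff (placeholder token, illustrative values only):

from github import Github
from urllib3.util.retry import Retry

# retry can be an int or a urllib3 Retry; a Retry object adds exponential backoff
retries = Retry(total=5, backoff_factor=2, status_forcelist=[500, 502, 503, 504])
client = Github("<access-token>", timeout=60, retry=retries)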

Kevin Kho

04/28/2022, 5:01 PM
You may even be able to edit your own copy of Prefect for now to increase it. I don’t have a timeline, and I can’t say yet whether this will be an accepted change. Would you be interested in making a PR? It seems pretty doable
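As an interim, unofficial workaround (purely a sketch, not a supported Prefect API), one could also monkeypatch PyGithub's constructor defaults so any client Prefect builds gets a longer timeout and a few retries; it would have to run before Prefect pulls the flow code, e.g. via a sitecustomize.py baked into the container image:

import functools
import github

_original_init = github.Github.__init__

@functools.wraps(_original_init)
def _patched_init(self, *args, **kwargs):
    # raise the default 15 s connect timeout and allow a few retries,
    # unless the caller already passed its own values
    kwargs.setdefault("timeout", 60)
    kwargs.setdefault("retry", 3)
    _original_init(self, *args, **kwargs)

github.Github.__init__ = _patched_init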

Anna Geller

04/29/2022, 1:16 AM
Catching up here. I wonder whether we could approach it a bit differently. It seems the problem is that, due to a transient error, your flow run sometimes fails to start because the flow code cannot be pulled from your GitHub storage within your ECS task, correct? Since you are on Prefect Cloud, you could leverage Automations to catch such issues and react to them. For instance, in the image below, you can see how to start a new run of the same flow (effectively a "flow-level restart") if your flow run fails to start within e.g. 120 seconds.

Mac

04/29/2022, 1:36 AM
Thanks! Going to try this out and test over the weekend. If that fails, I will issue a PR with the proposed changes
👍 1
@Anna Geller Yes, that is the correct diagnosis of the problem. And by the way, I don't seem to have "does not start" as an option under automation
Ohh never mind, I see: I have to be on the Standard or Enterprise plan for that
👍 1
💯 1