# prefect-cloud
leonardoperazzini
Hello ... since the last Prefect update I'm receiving errors on different types of flows that have no correlation with each other ... They get stuck in the Pending state, and after N hours they are cancelled (because I have an automation to cancel flows that stay pending too long). It never happened before ... I've had this issue for 1 month. It's very unusual, but about 1 flow in 100 is affected 😕 I'm using EC2 with the agent inside and the infrastructure as a process, a very simple setup, but the queue doesn't pull the flow to the infrastructure ... And I'm on the paid Prefect Cloud plan. @Nate or someone else can help me?
Nate
hi @leonardoperazzini - it's hard to know what you mean without some logs / a stack trace. can you give more info?
leonardoperazzini
@Nate Yes, I understand. But Prefect just doesn't show me a log in that case, neither in Prefect Cloud nor on the agent. The flow run remains Pending... and is never picked up from the queue by the agent, while the next scheduled runs of the same flow work normally. https://app.prefect.cloud/account/dc1ed6a7-a0e5-4127-8ff1-57045f7ef23a/workspace/dd2e34bf-cd45-4f14-82d5-f5a0c0d361b9/flow-runs/flow-run/1c1f094a-d2e3-44ce-9b55-1b0f4306e271 I already sent an email to Prefect's support address, but I'm trying here too.
Nate
do you have agent logs from your EC2?
leonardoperazzini
I can try to save the logs ... I'm not saving them at the moment, because I start the agent as a subprocess inside tmux with `prefect agent start -q production-dbt-server`
Do you know an easy way to save the agent logs?
Nate
I'm most familiar with running the agent/worker as a `systemd` process, in which case you can just check the log for that systemd process, but if you're using tmux I'm not sure off the top of my head; you'd likely have to redirect the output when you start the process, or maybe there's some tool to attach to / read from the existing process
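For example, one way to keep the tmux setup but still persist the agent output would be to pipe it through `tee` when starting the process (a minimal sketch; the queue name comes from this thread and the log path is an arbitrary example):

```bash
# Sketch: start the agent as before, but duplicate its output into a log file.
# The queue name is the one used in this thread; the log path is an arbitrary example.
prefect agent start -q production-dbt-server 2>&1 | tee -a /home/ec2-user/prefect-agent.log
```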
leonardoperazzini
@Nate I don't know if it's correlated with the problem in this thread ... but since the last Prefect update, some of the flows that run inside ECS have started to receive this error (image) ... Is there some way to retry when I receive a 429? Is this rate limit only mine, or is it a global concurrency limit? Thanks for helping ...
Nate
hi @leonardoperazzini that seems like a Docker rate limit
hmm, `failed to fetch anonymous token` is strange
leonardoperazzini
yeah ... but it's Prefect's default Docker image ... not my personal Docker account 😕 and it doesn't happen often either
Nate
as a sanity check, does returning to the previous version resolve the problem? I suspect the prefect version might be unrelated, but you mentioned this started occurring after an upgrade?
leonardoperazzini
Yeah ... it started with the release from exactly March 7 ... https://github.com/PrefectHQ/prefect/releases/tag/2.16.3 OK, I'll try pinning the previous version of the Prefect agent on EC2 ... and check it for us.
But the Prefect agent or worker version doesn't seem to be the problem... the error started to appear after that day, yet we didn't change the version on the queue/agent side. It seems to be related to the version of Prefect running in Prefect Cloud or in the flow infrastructure, and pinning those two is more complicated for me to test here.
Nate
yeah, I wasn't trying to suggest it could be related to the agent/worker version, rather the version of prefect used for your flow runtime
leonardoperazzini
@Nate Apparently ... it's a problem with versions above 2.16.2; after pinning the Prefect version for the containers and the process infrastructure, the errors and the stuck-in-Pending problems stopped. I haven't been able to save the logs on the server yet, but since the problem was solved by pinning the version, I won't run the newer one again just to capture the logs. If you know what could have changed in these new versions... I would appreciate it
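For reference, "pinning the version" in both places could look roughly like the sketch below; the 2.16.2 image tag and pip pin mirror the versions discussed in this thread, and the exact commands are illustrative rather than a confirmed fix:

```bash
# Sketch: pin the Prefect version everywhere the flow runtime is installed.

# ECS flows: point the job / task definition at a fixed image tag instead of a floating one:
#   docker.io/prefecthq/prefect:2.16.2-python3.10

# EC2 process flows: pin the package in the environment the agent starts flows from:
pip install "prefect==2.16.2"
```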
Nate
2.16.1 had a bug that was fixed in 2.16.2, but without logs I'm not sure what would have been wrong with your agent / flow. It's also possible you had a new version of `typer` that broke the CLI, with a version of prefect that did not pin `typer` - they've released a couple of breaking changes in the recent past; we had to pin it recently
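If a `typer` release really is what broke the CLI, a hedged workaround would be to constrain it in the same environment as Prefect; the version bound below is only an assumed example, not an official pin:

```bash
# Sketch: constrain typer alongside Prefect in the agent / flow runtime environment.
# "typer<0.10" is an assumed example bound; check which typer release actually works
# with your Prefect version before applying it.
pip install "prefect==2.16.2" "typer<0.10"
```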
leonardoperazzini
Today ... I received this error again: `prefect_aws.workers.ecs_worker.TaskFailedToStart: CannotPullContainerError: ref pull has been retried 1 time(s): failed to resolve reference "docker.io/prefecthq/prefect:2.16.2-python3.10": failed to authorize: failed to fetch anonymous token: unexpected status: 429 Too Many Requests` Only on ECS; with the process infrastructure everything is working (obviously ... there we don't use a container). But on the EC2 process, things are no longer getting stuck in Pending
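One common way to sidestep Docker Hub's anonymous-pull rate limit in this situation is to mirror the image into a private ECR repository and point the ECS task definition at that copy; in the sketch below the account ID, region, and repository name are placeholders:

```bash
# Sketch: mirror the Prefect image into ECR so ECS pulls no longer rely on
# anonymous Docker Hub requests. Account ID, region, and repo name are placeholders.
aws ecr create-repository --repository-name prefect
aws ecr get-login-password --region us-east-1 \
  | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com

docker pull docker.io/prefecthq/prefect:2.16.2-python3.10
docker tag  docker.io/prefecthq/prefect:2.16.2-python3.10 \
            123456789012.dkr.ecr.us-east-1.amazonaws.com/prefect:2.16.2-python3.10
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/prefect:2.16.2-python3.10

# Then update the ECS work pool / task definition to use the ECR image URI above.
```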
Bianca Hoch
Hey Leo! Just wanted to check in here.
> It's a problem with versions above 2.16.2; after pinning the Prefect version for the containers and the process infrastructure, the errors and the stuck-in-Pending problems stopped.
is this still the case that you're no longer seeing issues with flow runs getting stuck in `Pending`?
leonardoperazzini
@Bianca Hoch Hello! Yeah ... I'm still receiving this error. The error persists... it no longer seems to be a container-version-related error. After I pinned the container (ECS) and the EC2 Prefect version (process) to 2.16.2, the error on the EC2 process stopped, and the error message from ECS changed to the one below: `prefect_aws.workers.ecs_worker.TaskFailedToStart: CannotPullContainerError: pull image manifest has been retried 1 time(s): failed to resolve ref docker.io/prefecthq/prefect:2.16.2-python3.10: failed to authorize: failed to fetch anonymous token: unexpected status: 429 Too Many Requests`
@Bianca Hoch @Nate ... can someone help with this? 😕 I don't know what else to do about the 429 error
Hello, I know it seems strange... but it looks like the same thing that happened back then, when there was an internal error in the Prefect API. It seems to be the same kind of issue: something is hitting Docker Hub more frequently and we're getting 429 errors. This error didn't exist before, and it started after the release of a new version of Prefect Cloud / Prefect... Wasn't it the same thing that happened before on your side? I'm really desperate to know how to fix this; my hands are tied. You followed that process with me and saw how complicated it was for us: we started from nothing, received an error that seemed to be on our side, but in fact it was Prefect's error... Could you ask internally again for the staff to take a look? @Bianca Hoch
Nate
Hi @leonardoperazzini - I don't think we've received enough information from you to know how we can help. It would make it easier for us to help you if you could open an issue that details the steps you took using prefect and the errors you received when it didn't work, or feel free to list that information out clearly here instead. 429s in general mean rate limits, so someone, whether it's Prefect Cloud, the GitHub registry, AWS, or someone else, is saying "you're sending us too many requests". I'm sorry for any frustration!
👍 1
Bianca Hoch
+1 to Nate's suggestion, an issue would be very helpful. Additionally, I think this post may get you closer to finding a root cause.
Ah - and I think this post from one of our engineers may be helpful to you as well. Someone else reported a similar issue, but it wasn't Prefect-specific.
👍 2
leonardoperazzini
@Nate and @Bianca Hoch just so you know ... I'm receiving this error too; it looks like Prefect is hitting the ECS APIs or the Docker image registry more often than normal, like I said the last few times 😕
It seems things are worse today 😕