# prefect-community

Ben Muller

11/18/2022, 12:51 AM
Hey Prefect - I have one agent and one queue. I ran a loop over some parameters to break up some work and called run_deployment to essentially launch 30 concurrent ECS tasks. What I notice is that 5 of the deployment runs start and the other 25 stay pending. How do I make this behave like Prefect 1.0, where I was able to have all of these run at the same time? Is this a limitation of my agent? I was under the impression that all the agent does is orchestrate the ECS tasks in my AWS environment.

Anna Geller

11/18/2022, 12:54 AM
can you share your code example? we can try to reproduce

Ben Muller

11/18/2022, 12:56 AM
from prefect.deployments import run_deployment

for start in range(16000, 76000, 2000):
    end = start + 2000
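    # One flow run per 2,000-unit chunk (30 runs total); timeout=0 submits the
    # run and returns immediately instead of waiting for it to finish.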
    run_deployment(
        name="my-flow/default",
        flow_run_name=f"my-flow{start}:{end}",
        parameters=dict(start=start, end=end),
        timeout=0,
    )
I did this too
prefect work-queue clear-concurrency-limit 'prod'
they are all just sitting there in the Pending state, showing as 20 mins late
note - all the runs that started have finished - and the others are still pending
oooh does it need some idempotency key like in 1.0?

Anna Geller

11/18/2022, 1:09 AM
this looks like a bug - it shouldn't need an idempotency key set manually
does it occur if you switch to a Process block + local agent too?

Ben Muller

11/18/2022, 1:11 AM
I don't have a local agent
damn 😞

Anna Geller

11/18/2022, 1:13 AM
?
I'm reproducing now locally first
then ECS

Ben Muller

11/18/2022, 1:13 AM
ok thanks. I was just saying damn because I have to run it in order lol
ok - so if I put a 30-second sleep in between iterations, it works
🙌 1
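A minimal sketch of the throttled loop Ben describes, assuming the same deployment and parameter ranges as the snippet above; the 30-second pause is the workaround he found, not a documented requirement:

import time

from prefect.deployments import run_deployment

for start in range(16000, 76000, 2000):
    end = start + 2000
    run_deployment(
        name="my-flow/default",
        flow_run_name=f"my-flow{start}:{end}",
        parameters=dict(start=start, end=end),
        timeout=0,
    )
    # Throttle submissions so ECS task definition registration and Fargate
    # provisioning are not all triggered at once.
    time.sleep(30)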

Anna Geller

11/18/2022, 1:21 AM
nice! I tested it just now locally and it works fine

Ben Muller

11/18/2022, 1:21 AM
so it is an ECS thing? I still think it is a bug

Anna Geller

11/18/2022, 1:21 AM
probably a smart choice to put some time in between to avoid rate-limiting issues with the AWS API and serverless provisioning madness, but all of those scheduled and pending runs got executed in the end, as they should
let me try the same with ECS now

Ben Muller

11/18/2022, 1:22 AM
yeah interested to see what you get

Anna Geller

11/18/2022, 1:36 AM
ok so here's my take - I don't think this is something that we should solve on the Prefect end. The only problem I see is still registering so many revisions of the same task definition family concurrently, which honestly makes sense: if you are running things at scale, you should register the task definition once and then trigger a bunch of containers from the same task definition

Ben Muller

11/18/2022, 1:39 AM
ok that is fair enough - why was this not an issue with Prefect 1.0 though? isn't that a step backwards?

Anna Geller

11/18/2022, 1:39 AM
so my recommended approach for the solution is either: 1) add some sleep time in between, as you suggested - this will solve it while still keeping this build step at runtime, or 2) register the task definition once at build time, e.g. from CI, and reuse it, i.e. provide the task definition ARN on the block. Based on what you said earlier, #1 seems more convenient/applicable for your use case
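A minimal sketch of option 2, assuming the prefect-aws ECSTask block; the ARN, region/account, and block name are placeholders for a task definition registered once, e.g. from CI:

from prefect_aws.ecs import ECSTask

# Point the infrastructure block at an existing task definition instead of
# letting each flow run register a new revision.
ecs = ECSTask(
    task_definition_arn="arn:aws:ecs:us-east-1:123456789012:task-definition/my-flow:1",  # placeholder ARN
)
ecs.save("prod", overwrite=True)  # reuse this block in the deployment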

Ben Muller

11/18/2022, 1:40 AM
yeah cool - will do. thanks
🙌 1

Anna Geller

11/18/2022, 1:41 AM
thanks for flagging, we definitely need to keep testing edge cases and do more of this kind of scale testing - thanks so much for the positive push here 🙌

Ben Muller

11/18/2022, 1:42 AM
Thanks for the quick reply and help
🙌 1

Anna Geller

11/18/2022, 1:42 AM
I think one day, when we find the time, we can figure out some middle ground with task definition caching, so that we re-register when needed and skip it otherwise - keeping the convenience without ECS throttling
🙌 1

Ben Muller

11/18/2022, 2:01 AM
Ah great! I will have a read after my lunch 🙂

Anna Geller

11/18/2022, 2:04 AM
it's just a summary, nothing new
👍 1

Ben Muller

11/18/2022, 4:29 AM
Hi @Anna Geller there is actually more to this story and it is quite concerning tbh. When I have multiple deployments that happen to be scheduled at the same time, e.g. 0 * * * *, there appears to be a race condition with the agent/queue, and the flows that don't get in first stay in the Pending state indefinitely. I just logged onto my UI and I have flows that are up to 3 hours late and just never get started.
I don't know where to start with this, but I am going to have to pause my migration, as this is a big deal for an orchestration system (obviously)
I hope this is something on my side

Anna Geller

11/18/2022, 4:32 AM
I couldn't spot any issue for a Process run, so it could be some Fargate weirdness to explore. In that case, I would so much appreciate it if you could open an issue on the prefect-aws repo if this only affects ECSTask runs

Ben Muller

11/18/2022, 4:34 AM
I wouldn't know exactly what to write. I have messaged my account manager, and it's probably best to start off there? Once I have a clearer idea I can do that

Anna Geller

11/18/2022, 4:35 AM
An AE won't help you solve an infra problem; it would help if you described the problem the same way you would to an AE, but on a more technical level, in a GH issue. You know how fast Michael ships fixes - worth opening an issue for that reason alone 😜

Ben Muller

11/18/2022, 4:35 AM
yeah ok

Anna Geller

11/18/2022, 4:35 AM
(Michael wrote that infra block)