# prefect-community

Ben Muller

11/18/2022, 12:51 AM
Hey Prefect - I have one agent and one queue. I ran a loop over some parameters to break up some work and called run_deployment to essentially launch 30 concurrent ECS tasks. What I notice is that 5 of the deployment runs start and the other 25 stay pending. How do I make this behave like Prefect 1.0, where I was able to have all of these run at the same time? Is this a limitation of my agent? I was under the impression that all the agent does is orchestrate the ECS tasks in my AWS environment.

Anna Geller

11/18/2022, 12:54 AM
can you share your code example? we can try to reproduce

Ben Muller

11/18/2022, 12:56 AM
from prefect.deployments import run_deployment

for start in range(16000, 76000, 2000):
    end = start + 2000
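    # One flow run per 2,000-unit chunk (30 runs total); timeout=0 submits the
    # run and returns immediately instead of waiting for it to finish.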
    run_deployment(
        name="my-flow/default",
        flow_run_name=f"my-flow{start}:{end}",
        parameters=dict(start=start, end=end),
        timeout=0,
    )
I did this too
prefect work-queue clear-concurrency-limit 'prod'
they are all just sitting there in the Pending state, showing as 20 mins late
note - all the runs that started have finished - and the others are still pending
oooh does it need some idempotency key like in 1.0?

Anna Geller

11/18/2022, 1:09 AM
this looks like a bug - it shouldn't need an idempotency key set manually
does it occur if you switch to a Process block + local agent too?

Ben Muller

11/18/2022, 1:11 AM
I don't have a local agent
damn 😞

Anna Geller

11/18/2022, 1:13 AM
?
I'm reproducing now locally first
then ECS

Ben Muller

11/18/2022, 1:13 AM
ok thanks. I was just saying damn because I have to run it in order lol
ok - so if I put a 30-second sleep in between iterations, it works
🙌 1
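A minimal sketch of the throttled loop Ben describes, assuming the same deployment and parameter ranges as the snippet above; the 30-second pause is the workaround he found, not a documented requirement:

import time

from prefect.deployments import run_deployment

for start in range(16000, 76000, 2000):
    end = start + 2000
    run_deployment(
        name="my-flow/default",
        flow_run_name=f"my-flow{start}:{end}",
        parameters=dict(start=start, end=end),
        timeout=0,
    )
    # Throttle submissions so ECS task definition registration and Fargate
    # provisioning are not all triggered at once.
    time.sleep(30)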

Anna Geller

11/18/2022, 1:21 AM
nice! I tested it just now locally and it works fine

Ben Muller

11/18/2022, 1:21 AM
so it is an ECS thing? I still think it is a bug

Anna Geller

11/18/2022, 1:21 AM
probably a smart choice to put some time in between to avoid rate-limiting issues with the AWS API and serverless provisioning madness, but all of those scheduled and pending runs got executed in the end, as they should
let me try the same with ECS now

Ben Muller

11/18/2022, 1:22 AM
yeah interested to see what you get

Anna Geller

11/18/2022, 1:36 AM
ok so here's my take - I don't think this is something that we should solve on the Prefect end. The only problem I see is still registering so many revisions of the same task definition family concurrently, which honestly makes sense: if you are running things at scale, you should register the task definition once and then trigger a bunch of containers from the same task definition

Ben Muller

11/18/2022, 1:39 AM
ok that is fair enough - why was this not an issue with Prefect 1.0 though? isn't that a step backwards?

Anna Geller

11/18/2022, 1:39 AM
so my recommended approach for the solution is either: 1) add some sleep time in between, as you suggested - this will solve it while still keeping this build step at runtime, or 2) register the task definition once at build time, e.g. from CI, and reuse it, i.e. provide the task definition ARN on the block. Based on what you said earlier, #1 seems more convenient/applicable for your use case
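A minimal sketch of option 2, assuming the prefect-aws ECSTask block; the ARN, region/account, and block name are placeholders for a task definition registered once, e.g. from CI:

from prefect_aws.ecs import ECSTask

# Point the infrastructure block at an existing task definition instead of
# letting each flow run register a new revision.
ecs = ECSTask(
    task_definition_arn="arn:aws:ecs:us-east-1:123456789012:task-definition/my-flow:1",  # placeholder ARN
)
ecs.save("prod", overwrite=True)  # reuse this block in the deployment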

Ben Muller

11/18/2022, 1:40 AM
yeah cool - will do. thanks
🙌 1

Anna Geller

11/18/2022, 1:41 AM
thanks for flagging, we definitely need to keep testing edge cases and do more of this kind of scale testing - thanks so much for the positive push here 🙌

Ben Muller

11/18/2022, 1:42 AM
Thanks for the quick reply and help
🙌 1

Anna Geller

11/18/2022, 1:42 AM
I think one day, when we find the time, we can figure out some middle ground with task definition caching, so that we re-register when needed and skip it otherwise - keeping the convenience without ECS throttling
🙌 1

Ben Muller

11/18/2022, 2:01 AM
Ah great! I will have a read after my lunch 🙂

Anna Geller

11/18/2022, 2:04 AM
it's just a summary, nothing new
👍 1

Ben Muller

11/18/2022, 4:29 AM
Hi @Anna Geller there is actually more to this story and it is quite concerning tbh. When I have multiple deployments that happen to be scheduled at the same time, e.g. 0 * * * *, there appears to be a race condition with the agent/queue, and the flows that don't get in first stay in the Pending state indefinitely. I just logged onto my UI and I have flows that are up to 3 hours late and just never get started.
I don't know where to start with this, but I am going to have to pause my migration, as this is a big deal for an orchestration system (obviously)
I hope this is something on my side

Anna Geller

11/18/2022, 4:32 AM
I couldn't spot any issue for a Process run, so it could be some Fargate weirdness to explore. In that case, I would so much appreciate it if you could open an issue on the prefect-aws repo if this only affects ECSTask runs

Ben Muller

11/18/2022, 4:34 AM
I wouldn't know exactly what to write. I have messaged my account manager, and it's probably best to start off there? Once I have a clearer idea I can do that

Anna Geller

11/18/2022, 4:35 AM
An AE won't help you solve an infra problem; it would help if you described the problem the same way you would to an AE, but on a more technical level, in a GH issue. You know how fast Michael ships fixes - worth opening an issue for that reason alone 😜

Ben Muller

11/18/2022, 4:35 AM
yeah ok

Anna Geller

11/18/2022, 4:35 AM
(Michael wrote that infra block)