# prefect-community
d
Hey folks, we're seeing very strange flow registration behaviour when trying to register a long-standing flow to Prefect Cloud (1.0). My question: is there a 2,500 max task limit in Prefect Cloud on statically defined DAGs? We've been registering this flow for months (and it's been growing), and it's just surpassed 2,500 nodes and is now throwing an error, so I want to check whether you have an internal limit defined somewhere…
Context
• Prefect 1.2.4, Tenant = Deliveroo
• it's our massive static flow, which has historically had >2,000 nodes, but as of today has just gone over 2,500

The error message is:
```
prefect.exceptions.ClientError: [{'path': ['register_edges'], 'message': 'Edges could not be registered - some edges reference tasks that do not exist within this flow.', 'extensions': {'code': 'INTERNAL_SERVER_ERROR'}}]
```
however that's 100% not true: we've validated this locally, and all the edges and nodes do exist, which makes me think Prefect Cloud is stopping at 2,500 tasks somewhere, and the edge registration then fails
Flow ID: https://cloud.prefect.io/deliveroo/flow/db9bef75-eaf9-4656-9222-b2052b0e1e85
Prefect Cloud Tenant ID: ecf1e74b-ff7b-43f2-b734-c062084a6c3b
in case you can see anything in the registration logs?
z
Are you using the CLI or the Python register method?
d
Python method
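i.e. the standard Python registration call; a minimal sketch (flow and project names here are placeholders, not our real ones):

```python
from prefect import Flow, task

@task
def say_hello():
    print("hello")

with Flow("massive-static-flow") as flow:
    say_hello()

# Registration via the Python method, rather than `prefect register` on the CLI
flow.register(project_name="example-project")  # placeholder project name
```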
We reverted a few additions to get back to 2,498 tasks and it registers fine, but as soon as we go over that, we get the above message, hence our hunch about an internal 2,500 limit in Cloud
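(For what it's worth, we sanity-check the DAG size with something like this before registering:)

```python
# Quick sanity check of the static DAG size before registering (Prefect 1.x)
print(f"{len(flow.tasks)} tasks, {len(flow.edges)} edges")
# 2498 tasks registers fine; anything past ~2500 hits the register_edges error
```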
z
@Zach Angell looks like they're using the batched insertion logic. I can look into the client side this afternoon.
d
I think you (possibly even Zach or Jenny) actually added the batched registration logic specifically for us when our flow got too large, so the API could handle it 🙂
z
:) I don't know of any limits in the implementation but we'll see!
If you could put a breakpoint at https://github.com/PrefectHQ/prefect/blob/1.x/src/prefect/client/client.py#L913 and confirm that len(serialized_tasks) is the number you expect, and that stop at L925 (after the loop completes) is greater than that number, we should be able to eliminate a client-side issue
d
Hey! Sure, so I just did that, and it looks like it's batching as expected. I modified this section as per the script below to add logging, and have attached the log output in the file below as well
Untitled.py
TL;DR: it does appear to be processing the correct number of tasks (2,506) across 6 batches, but then when it gets to the first batch of edges we get the 'task doesn't exist' error
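The patch is roughly this (a paraphrase of the batching loop around client.py#L913, not the exact source; the batch size is assumed for illustration):

```python
import math

# `serialized_tasks` is the local list inside Client.register()
batch_size = 500  # assumed; whatever the client actually uses
n = len(serialized_tasks)
print(f"registering {n} tasks in {math.ceil(n / batch_size)} batches")

stop = 0
while stop < n:
    start, stop = stop, stop + batch_size
    batch = serialized_tasks[start:stop]
    print(f"batch {start}:{min(stop, n)} -> {len(batch)} tasks")
    # ... the register_tasks GraphQL mutation is issued for `batch` here ...

print(f"stop after loop = {stop}, len(serialized_tasks) = {n}")
```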
I also added a step in the above register method to write the serialized_tasks and serialized_edges out to JSON files, and have since read them back into the console and run a simple comparison to ensure all the edge slugs in serialized_edges exist within serialized_tasks, and they do (comparison sketched below), so I'm certain there's something going on with the GraphQL side not actually registering these tasks 🤔 Can you see anything in the cloud logs on your side?
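The comparison was essentially this (a sketch; the field names assume the Prefect 1.x edge serialization, where each edge nests upstream_task/downstream_task slugs):

```python
import json

with open("serialized_tasks.json") as f:
    tasks = json.load(f)
with open("serialized_edges.json") as f:
    edges = json.load(f)

# Every edge must reference task slugs that were actually registered
task_slugs = {t["slug"] for t in tasks}
dangling = [
    e for e in edges
    if e["upstream_task"]["slug"] not in task_slugs
    or e["downstream_task"]["slug"] not in task_slugs
]
print(f"{len(tasks)} tasks, {len(edges)} edges, {len(dangling)} dangling edges")
# prints 0 dangling edges locally, so every edge references a real task
```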
z
Yep, this looks like a bug in Cloud; until it's resolved, flows with over 2,500 tasks will not register correctly. We'll work on getting a fix out!
d
Thank you 🙏
d
Hey Zach, now that you've identified the bug, roughly how long do you think it'll take to fix? Our mitigation plan will depend on how long this blocks our ability to add nodes
z
It’s not necessarily a bug in our code, but rather a limit that’s enforced on queries for task runs. In theory, this means it should be relatively simple to change, but we’ll need to make sure that it doesn’t break anything.
d
Brill, thanks. We are trying to prune any legacy nodes that aren't needed today, but it'd be great to get the limit removed 👍