# ask-community
d
Hey all! I'm hitting an error which I think might be related to scaling the number of static tasks in my flow, and wanted to get your thoughts. I have the following setup:
• Prefect Cloud Server, Kubernetes Agent (k8s run config), DaskExecutor(make_cluster) -> spawns 1 Dask worker with 4 CPUs and 12 threads
• Prefect v0.14.8, Docker storage in ECR

I can run a flow with 196 tasks without a problem (this is ~15% of our whole ETL). The UI even loads the schematic, and the flow runs to completion. All the tasks do is run queries on Snowflake: no data manipulation or results handling, just issuing SQL queries.

When I generate the flow file with all 1192 tasks in it, I'm getting the following error
400 Client Error:
...
"input.states[0].task_run_id"; Expected non-nullable type UUID! not to be null.
on some of the tasks when I run the flow. I'll put the full stack trace in the 🧵. It's happening on roughly 1 in every 20 tasks. The affected tasks get put into state 'ClientFailed' (and the UI can't see them), and all downstream dependents of these tasks then get set to state 'Pending'. I've tried multiple Dask workers, then just 1 Dask worker (for simplicity): same issue. I can't replicate it with the smaller (196-task) flow.

I'm wondering if there's some kind of rate limiting going on, whereby so many tasks are running concurrently (there are a tonne all trying to run at the same time) that some of them are getting a generic error from Cloud? I'd try adding a task concurrency limit to test this hypothesis, but the UI says it's not included in our plan (even though we're an enterprise tenant). Is it possible to set task concurrency at the flow level?

Also, the UI can't load the schematic of the big flow, though that's less of an immediate concern. Thanks in advance for any advice!
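The "1 in 20" rate also says something about the small flow. As a back-of-the-envelope check, assuming each task fails independently at a constant rate of about 1 in 20 (an assumption; the real failures may well not be independent), a 196-task run would expect roughly 10 failures, and the chance of a completely clean run is less than 1 in 20,000. Clean runs of the small flow therefore point towards the error depending on scale rather than on a fixed per-task rate.

```python
# Back-of-the-envelope model: every task fails independently with the same
# probability. This is an assumption, not something measured.
FAILURE_RATE = 1 / 20  # "roughly 1 in every 20 tasks" on the 1192-task flow


def expected_failures(n_tasks: int, rate: float = FAILURE_RATE) -> float:
    """Mean number of failing tasks per run under the constant-rate model."""
    return n_tasks * rate


def prob_zero_failures(n_tasks: int, rate: float = FAILURE_RATE) -> float:
    """Probability that a run of n_tasks sees no failures at all."""
    return (1 - rate) ** n_tasks


for n in (196, 1192):
    print(
        f"{n} tasks: expect ~{expected_failures(n):.1f} failures, "
        f"P(no failures) = {prob_zero_failures(n):.2e}"
    )
```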
j
Hi @David Elliott - thanks for the question. I want to make sure we've got all the info we need to help fix this, so can I check:
1. Has the 1192-task flow run successfully in the past?
2. Are you certain that the flow you've registered fully matches the structure of the flow that gets executed? (This issue had a similar error message.)
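Question 2 boils down to checking that every task in the executed flow has a counterpart in the registered one, and vice versa. Assuming you can dump the task slugs (or names) of both versions to plain lists (how to export them is not shown here), the check is a pair of set differences. The function name and the sample slugs below are illustrative only.

```python
from __future__ import annotations


def diff_task_ids(registered: list[str], executed: list[str]) -> dict[str, set[str]]:
    """Report task identifiers that appear in only one version of a flow.

    Both inputs are assumed to be lists of task slugs (or names) exported
    beforehand; this helper is hypothetical, not a Prefect API.
    """
    reg, exe = set(registered), set(executed)
    return {
        "only_in_registered": reg - exe,
        "only_in_executed": exe - reg,
    }


# Made-up slugs for illustration.
registered = ["run_query-1", "run_query-2", "run_query-3"]
executed = ["run_query-1", "run_query-3", "run_query-4"]

report = diff_task_ids(registered, executed)
print(report)
```

Two empty sets would mean the structures match at the level of task identifiers; anything left over is the kind of mismatch the question is asking about.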
Hi @David Elliott - if you have a minute, could you DM me your team info? Ideally the slug/URL your team has, or the flow or flow group ID. You can find this on the details tile of the flow page (the top-left tile, under the bar chart).
d
Hey, thanks for taking a look!
1. No, not yet; I'm in the process of migrating our old pipeline over to Prefect. That said, the setup in terms of run config/executor/registration etc. is identical to the 196-task run, as I do all of that in a separate build file. The only difference between the two is adding more tasks.
2. Yep, I saw that one too. Yes, pretty certain: I build it into a Docker image locally, and then k8s pulls that image from ECR. I've shelled into the container and can confirm it looks as I expect. It also only gives me this error for a random selection of the tasks (and it's different tasks every time).

Sure, I'll DM you those details now, thanks!