Thread
#prefect-community
    Alex Papanicolaou

    1 year ago
    Hi folks, @Marwan Sarieddine and I have a general question about flow runs, the future without a flow concurrency limit, and imposing compute constraints. More detail in the thread.
    More specifically, we are running Prefect Cloud v0.14.9 with a Kubernetes agent and a dask-kubernetes execution environment, and we have an AWS EC2 limit for certain instance types. Suppose we have two flows that each require more than 50% of the limit and they’re scheduled to run at similar times. We will run into one of two problems for the second flow run:
    1. The capacity is completely used up by the first flow, so the K8s job fails to run and the second flow run fails scheduling. The Lazarus process attempts to reschedule it and fails because the compute capacity is still in use, and eventually we get a flow run failure after Lazarus gives up.
    2. The capacity isn’t completely used up by the first flow, so the second flow reaches a running state but stalls and hangs because the minimum number of workers can’t be fulfilled. Eventually Prefect’s heartbeat kicks in and fails the flow run.
    It seems that the proposed solution as of now is to make use of Prefect Cloud’s flow run concurrency labels, i.e. label these two flow runs with the same label “large-flow” and set a limit of 1, so that these flow runs never run simultaneously.
    This doesn’t seem like an intuitive or scalable solution for users, as you would have to account for all possible combinations of flow runs that can’t run concurrently. You quickly end up with a proliferation of labels, which would be a nightmare to maintain and update ... i.e. flows 1, 2, 3, 4 all use 25% of the capacity, so we give those a label and set a limit of 4. Then we introduce flow 5, which uses 50%, so now we need to label flows (1,2,5), (1,3,5) and (1,4,5) each with a label and set a limit of 3, and so on ...
    An idea we have that could build off the labels is flow weights. Scheduling/submission would look at the weight on each flow (defaulting to 0) and ensure that the total weight of running flows can’t exceed 1.
The point is that Prefect (waving hand at the various components whether it’s cloud or agent) won’t try to submit something that will exceed capacity and knowingly fail. It could hold off and let the flow be queued up until capacity is available.
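To make the weights idea concrete, here is a minimal sketch of the admission logic in plain Python. It is not Prefect code and does not use any Prefect API: `WeightedQueue`, `submit`, and `finish` are hypothetical names, and holding flows in strict FIFO order is an assumption chosen so that a large flow is not starved by a stream of small ones.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WeightedQueue:
    """Hypothetical weight-based admission control (not a Prefect API)."""

    capacity: float = 1.0                            # total weight allowed to run at once
    running: dict = field(default_factory=dict)      # flow name -> weight
    pending: deque = field(default_factory=deque)    # queued (name, weight) pairs

    def submit(self, name, weight=0.0):
        """Queue a flow run; return the names of runs started as a result."""
        self.pending.append((name, weight))
        return self._admit()

    def finish(self, name):
        """Mark a run as finished; return the names of runs started as a result."""
        self.running.pop(name)
        return self._admit()

    def _admit(self):
        started = []
        # Strict FIFO: stop at the first queued run that does not fit.
        while self.pending:
            name, weight = self.pending[0]
            if sum(self.running.values()) + weight > self.capacity:
                break
            self.pending.popleft()
            self.running[name] = weight
            started.append(name)
        return started


# Flows 1-4 each use 25% of capacity; flow 5 uses 50%.
q = WeightedQueue()
first_four = [q.submit(f"flow-{i}", 0.25) for i in range(1, 5)]
held = q.submit("flow-5", 0.5)         # capacity is full, so flow 5 is queued
after_one = q.finish("flow-1")         # 0.75 in use, flow 5 still does not fit
after_two = q.finish("flow-2")         # 0.50 in use, flow 5 can now start
```

With this scheme each flow carries a single number, so adding flow 5 does not require relabeling flows 1 to 4.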
    Jim Crist-Harif

    1 year ago
    That's certainly an interesting idea, and I can see the issue with the current recommendation. Would you mind opening an issue in the prefect repo detailing your issue and suggestion (even copy-pasting the above would be fine): https://github.com/PrefectHQ/prefect/issues.
    Alex Papanicolaou

    1 year ago
    Done