    Samuel Hinton

    1 year ago
    G'day all! Does Prefect Server currently support flow concurrency? Here's the scenario: we have a flow that polls for data every 10 minutes and stores it. Works great. However, we want to backfill the data, and need one flow run executed per day going back a year (365 flow runs). Now, if I simply use the Prefect client to create_flow_run 365 times, the agent schedules them all at once, the API times out, and everything crashes (see fun image below). On top of this, it would be ideal if I could prioritise flows when there's contention, so that the regular polling doesn't get stuck in an execution queue (not an issue if we have per-agent concurrency; I could have a backfill agent and a real-time agent, but I'm unsure if there's support like that). I can see task concurrency in the doco, but have missed flow concurrency itself. Is this supported?
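
For reference, a rough sketch (not anything Prefect provides out of the box) of spacing out a backfill like this so the agent doesn't pick everything up at once. The flow_id, the `date` parameter name, and the one-minute spacing are placeholders; in Prefect 0.x, `Client.create_flow_run` accepts a `scheduled_start_time`:

```python
from datetime import datetime, timedelta

def backfill_schedule(start, days=365, spacing_minutes=1):
    """Return (data_date, scheduled_start_time) pairs, one per day of
    backfill, with run starts staggered so the agent doesn't try to
    submit all 365 runs at the same moment."""
    now = datetime.utcnow()
    return [
        (start - timedelta(days=i), now + timedelta(minutes=i * spacing_minutes))
        for i in range(days)
    ]

# Submitting the runs might then look like this (hypothetical flow_id
# and parameter name; Client.create_flow_run is Prefect 0.x API):
#
# from prefect import Client
# client = Client()
# for data_date, start_at in backfill_schedule(datetime.utcnow()):
#     client.create_flow_run(
#         flow_id="<your-flow-id>",
#         parameters={"date": data_date.isoformat()},
#         scheduled_start_time=start_at,
#     )
```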

    Spencer

    1 year ago
    Not natively. It's a feature of Prefect Cloud
    Amanda Wee

    1 year ago
    As I understand it (I haven't used the task concurrency limit feature myself), Prefect Server supports flow concurrency in the sense that there are no limits on concurrent flow runs other than those inherent in what you're trying to do (e.g., a limited number of connections to the same database across flows). With Prefect Cloud, that's where task concurrency limits come into play: you can limit the number of running tasks across these concurrent flow runs, ensuring you don't exceed those inherent limits. Since you're using Prefect Server, you don't have this, so it seems to me that you have to fall back on the more manual method of scheduling the flow runs yourself to space them out.
    Samuel Hinton

    1 year ago
    Scheduling them myself is definitely possible; I was just trying to avoid unnecessary and somewhat fragile code. Given that (depending on how the public API is feeling) a flow can take anywhere between a second and a minute, simply saying “Have 5 of these running at most” is more efficient than manually setting the start dates for a host of flows to be one minute apart and playing it conservative to ensure there's minimal overlap. Ah well!

    David Clements

    1 year ago
    Would it be possible to do this by limiting the parallelism via a Dask Cluster?
    Zuhair Ikram

    1 year ago
    I’m new to Prefect, so maybe I am missing something. Above it was said:
    "Not natively. It's a feature of Prefect Cloud"
    However, this page says that:
    "The new version of Prefect Server is no longer a fork; it's a shared codebase with Prefect Cloud. This means that improvements our team makes to our flagship product will automatically and immediately benefit our open-source community."
    As I said, maybe I am missing something, but those two statements seem to contradict one another.
    Amanda Wee

    1 year ago
    It certainly doesn't apply to everything, though; e.g., authentication is one that comes up often: available on Prefect Cloud, but DIY on Prefect Server. I looked at the Team tab in the Prefect UI for my Prefect Server setup, and besides the expected greying out of API Tokens, Members, and Secrets, I can see that the Flow Concurrency and Task Concurrency sub-tabs are greyed out too.
    Zuhair Ikram

    1 year ago
    If the community version is lacking features, to me, it's a signal of where the company wants to go. Longer term, I think most people/companies will go the hosted route anyway, so I think it's in everyone's benefit to have parity. I'll be extremely disappointed if the community version continues to be a pared-down version of Cloud.
    Amanda Wee

    1 year ago
    I'm guessing @Jeremiah is the man to ask for this as he presumably sets the direction of where Prefect will go. I'm interested to know too.

    David Clements

    1 year ago
    @Sam Johnson - Would it be possible to do this by limiting the parallelism via a Dask Cluster?
    Michael Adkins

    1 year ago
    Hi! I think it's worth explaining that Prefect Server lacks some of the performant infrastructure necessary for implementing things like flow concurrency. There's a notable open PR for this feature (https://github.com/PrefectHQ/server/pull/90) and you'll notice there are a lot of concerns about race conditions and edge cases that are not trivial to deal with. The intention is not to provide a "pared down version of Cloud"; there is a limit to how much infrastructure we can reasonably expect users to maintain -- there are already 6(!) containers for Prefect Server -- and, in parallel, a limit to the infrastructure we can roll into a single CLI command that will satisfy the full spectrum of our users. Edited to clarify that this is my perspective.
    Samuel Hinton

    1 year ago
    That PR was an interesting read. Here's hoping someone with a bigger brain than mine comes up with an elegant solution to the issues raised in the comments there!
    Zuhair Ikram

    1 year ago
    @Michael Adkins I understand the concern about containers. Would it be possible to remove the Postgres container and use SQLite instead? Or to have the backend DB as a setting, thereby allowing the user to use SQLite (as a default) or Postgres (RDS, container, etc.)?
    Michael Adkins

    1 year ago
    I don't think Hasura supports SQLite (https://github.com/hasura/graphql-engine/issues/1119), so we can't use it, and supporting additional database types would likely increase complexity more than anything 😞
    Samuel Hinton

    1 year ago
    Hey @Michael Adkins - the lack of concurrency brought down my server again (50 late flows all tried to spin up, each in its own container, in one go, causing the box to run out of memory and CPU). You mentioned a limit in the infrastructure; could I ask what infrastructure you've managed to utilise in the cloud that enables concurrency, so that I can at least give this a shot?
    Michael Adkins

    1 year ago
    We use a global cache (Redis) to help manage state, but I can't really say more. It sounds like you may benefit from scaling to K8s, where flows will be submitted as jobs that will then be scheduled onto your cluster safely.
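
For anyone who can go the K8s route, a different but related technique to what's described above is a plain Kubernetes Indexed Job (batch/v1, Kubernetes 1.21+), where `parallelism` caps how many backfill pods run at once. This is a sketch, not the Prefect Kubernetes agent's behaviour; the image, command, and index-to-date mapping are all placeholders:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: backfill
spec:
  completions: 365         # one pod per day of backfill
  parallelism: 5           # at most 5 pods running at any time
  completionMode: Indexed  # each pod gets a JOB_COMPLETION_INDEX env var
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: backfill
          image: my-registry/my-flow:latest   # placeholder image
          # the script would map JOB_COMPLETION_INDEX (0..364) to a date
          command: ["python", "backfill.py"]
```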
    Samuel Hinton

    1 year ago
    Unfortunately I'm constrained by the organisational architecture. I'll keep racking my brain and also hoping that GitHub issue has some movement soon.