Thread
#prefect-community

    BK Lau

    1 year ago
    Q: If I have 100 on-prem worker nodes, say, do I have to install a Prefect Agent on each of them to enable them to be controlled by a Prefect Server?
    Jim Crist-Harif

    1 year ago
    The best thing to do here would be to install some larger cluster-management system to ensure resources are shared properly (k8s, yarn, slurm, etc.). Prefect doesn't think about resource usage, so you run the risk of overcommitting a single node. That said, you have two quick options that would work without a cluster manager:
    • Run one agent per node. Agents grab flows to run on a first-come-first-served basis, so flows will tend to distribute across your cluster. This happens through random dispersion though, so you may still get a node with more flows than another, leading to overcommitted resources, but in practice this may work well. This lets multiple flows run in parallel.
    • Run a single agent on one node, and start a dask cluster across the remaining nodes. Register flows with a DaskExecutor. This would let dask manage dispersing tasks across your cluster, with all flow-runners (lightweight processes managing a flow run) running on the single node with the agent. This works well for task-level parallelism, but less well for flow-level parallelism.
    Which option is best depends on your use case. You could also mix these, with a few nodes running agents and the remaining running a dask cluster.
    For setting up a dask cluster across a cluster of nodes without any cluster-management software, you might find https://docs.dask.org/en/latest/setup/ssh.html useful.
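    The second option above might look roughly like this, assuming the Prefect 1.x API (where DaskExecutor lives under prefect.executors) and a dask scheduler already running somewhere on the cluster. The scheduler address, task, flow, and project names here are all placeholders, not anything from the thread:

    ```python
    # Sketch of option 2: a flow whose tasks fan out over an existing dask cluster.
    # Assumes Prefect 1.x; in older releases DaskExecutor lived under
    # prefect.engine.executors instead of prefect.executors.
    from prefect import Flow, task
    from prefect.executors import DaskExecutor

    @task
    def say_hello():
        # Runs on whichever dask worker the scheduler picks.
        print("hello from a dask worker")

    with Flow("dask-example") as flow:
        say_hello()

    # Point the flow at the existing dask scheduler (placeholder address);
    # dask then handles dispersing the flow's tasks across the worker nodes,
    # while the lightweight flow-runner stays on the agent's node.
    flow.executor = DaskExecutor(address="tcp://10.0.0.1:8786")

    # Register with the backend so the single agent can pick runs up
    # (placeholder project name).
    flow.register(project_name="my-project")
    ```

    With no address argument, DaskExecutor instead spins up a temporary local cluster per flow run, which loses the multi-node dispersion this option is about.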

    BK Lau

    1 year ago
    @jim @Dylan I was thinking: Apache Beam has a unified workflow programming model. Any chance that you could have a Prefect provider for Apache Beam? You see, every workflow framework has its own "model" and tradeoffs.
    Jim Crist-Harif

    1 year ago
    IMO Prefect doesn't really make sense as a runner for Apache Beam (we'd be a consumer of the Beam API, not a "runner" for it). Dask (the thing Prefect runs on) may make sense as a runner, but Prefect is higher level than that.

    BK Lau

    1 year ago
    @Jim Crist-Harif What about driving Prefect via the Apache Beam programming model? Is this a possibility? If so, we could write the interfacing code.
    Jim Crist-Harif

    1 year ago
    Sorry, that's what I meant - the Prefect task & flow model doesn't really fit with the Apache Beam programming model. You could use Prefect to run Beam code, but using Beam to run Prefect code doesn't really make sense.

    BK Lau

    1 year ago
    @Jim Crist-Harif Thanks, that's what I had intended. The reason is that currently Prefect supports only Dask, but I wanted to use Apache Flink or Spark or something else. The attractive feature to me is the hybrid model: I would have Beam installed on-premise and use Prefect Cloud to drive Apache Beam. Unless Prefect has a pluggable "driver" architecture that I can use to interface with Beam? This would naturally broaden third-party integration and contribution. I see this as a huge win for Prefect adoption...
    Marz

    1 year ago
    Hi @BK Lau, I’m in the initial stages of investigating workflow managers/processing engines and I was thinking of integrating Apache Beam with Prefect. I joined Slack to find out whether this is feasible and found your query when searching for Apache Beam. I’m wondering if you’ve had any success in integrating these tools.

    BK Lau

    1 year ago
    @Marz I didn't end up using Prefect; I'm using Argo now, since we have multiple language-runtime requirements besides Python. The feature in Prefect that led me to consider it in the first place was the on-prem/hybrid model. We use containers to encapsulate the different runtimes.