Valentin Baert

    4 months ago
    Hi, I'm evaluating several tools to help build a team that deals with data integration. Our toolbox will probably include Prefect Orion. I have conducted a POC in which I deploy a flow on a Google Kubernetes Engine cluster that uses a Google Cloud bucket as storage and connects to the Prefect Cloud Orion API. The flow then consumes a Kafka topic hosted on Confluent Cloud and, for each Kafka message, starts a @task to process the message (just logging for this POC). I have written up the steps for my POC here in case it helps other people: https://gitlab.com/idkw/prefect-orion-gke-poc
    Anna Geller

    4 months ago
    This is a fantastic resource, can't wait to see how you progress with the real-time streaming use case - excited to assist you along the way! 👏
    Valentin Baert

    4 months ago
    Thanks. In our other thread I was wondering what I should do about this issue: https://prefect-community.slack.com/archives/CL09KU1K7/p1652949937007829?thread_ts=1651566181.776459&cid=CL09KU1K7 I don't know whether I should have a @flow wrapping the infinite Kafka polling loop and then run a task for every individual message (but then how do I cleanly stop the infinite flow when I need to update the code?), or whether there is a better way to handle event streaming. I was under the impression that Prefect 2.0 was bringing new answers to this, but I'm a bit lost here.
    Should I use a sub-flow, or is there a better way?
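For the "how do I cleanly stop the infinite loop" part, here is a minimal, library-agnostic sketch. It is not taken from the POC repository. It assumes the process receives SIGTERM when the pod is replaced (which is what Kubernetes sends on termination), and the names `stop_requested`, `request_stop`, `install_signal_handlers` and `poll_loop` are hypothetical. The `poll` and `handle` callables stand in for a Kafka client poll and for the task or flow that processes one message.

```python
import signal
import threading
from typing import Callable, Optional

# Set when a shutdown has been requested; checked between polls.
stop_requested = threading.Event()


def request_stop(signum=None, frame=None) -> None:
    """Signal handler: ask the loop to exit once the current message is done."""
    stop_requested.set()


def install_signal_handlers() -> None:
    """Register the handler for SIGTERM and SIGINT (call from the main thread)."""
    signal.signal(signal.SIGTERM, request_stop)
    signal.signal(signal.SIGINT, request_stop)


def poll_loop(
    poll: Callable[[], Optional[str]],
    handle: Callable[[str], None],
) -> int:
    """Poll until a stop is requested; return how many messages were handled.

    `poll` is assumed to return None when nothing arrived within its timeout,
    so the stop flag is re-checked regularly even on an idle topic.
    """
    handled = 0
    while not stop_requested.is_set():
        message = poll()
        if message is None:
            continue
        handle(message)  # the in-flight message always finishes before exit
        handled += 1
    return handled
```

In a real consumer one would call `install_signal_handlers()` once at startup and give `poll` a short timeout, so that a code update only has to wait for the message currently being processed.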
    Anna Geller

    4 months ago
    let's continue the discussion in our previous thread, this one is only for sharing blog posts etc - but all great questions
    Valentin Baert

    4 months ago
    ok
    Kevin Kho

    4 months ago
    That’s nice! Good to see Kafka + Prefect, which was awkward in Prefect 1.0
    Valentin Baert

    4 months ago
    I have added an additional sample, prefect_2_kafka_kub_no_deployment.py, which is a long-running Kafka consumer that starts a flow whenever it receives a Kafka message but requires neither a deployment nor an agent. It is based on the advice given by Anna. I think it's more suitable for my streaming use case.
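A rough sketch of that pattern, not the actual contents of prefect_2_kafka_kub_no_deployment.py: in Prefect 2 a function decorated with `@flow` can be called directly from ordinary Python, which is what removes the need for a deployment or an agent. Everything else here is an assumption for illustration: `InMemoryConsumer` is a hypothetical stand-in for a real Kafka client, `process_message`, `consume` and `processed` are made-up names, and the `ImportError` fallback only exists so the snippet runs without Prefect installed.

```python
from typing import Iterable, List, Optional

try:
    from prefect import flow  # Prefect 2.x decorator
except ImportError:
    # Fallback (assumption): a no-op decorator so the sketch runs without Prefect.
    def flow(fn):
        return fn

# Collects payloads so the effect of each flow run is visible in this sketch.
processed: List[str] = []


@flow
def process_message(payload: str) -> None:
    """One flow run per message; here it only records the payload."""
    processed.append(payload)


class InMemoryConsumer:
    """Hypothetical stand-in for a Kafka consumer with a poll() method."""

    def __init__(self, messages: Iterable[str]) -> None:
        self._messages = list(messages)

    def poll(self, timeout: float = 1.0) -> Optional[str]:
        # Returns the next payload, or None when nothing is available.
        return self._messages.pop(0) if self._messages else None


def consume(consumer: InMemoryConsumer, max_idle_polls: int = 1) -> int:
    """Start one flow run per message; stop after `max_idle_polls` empty polls.

    A real streaming consumer would loop until shutdown instead of counting
    idle polls; the limit just keeps this example finite.
    """
    started = 0
    idle = 0
    while idle < max_idle_polls:
        payload = consumer.poll(timeout=1.0)
        if payload is None:
            idle += 1
            continue
        idle = 0
        process_message(payload)  # direct call: no deployment, no agent
        started += 1
    return started
```

With a real client, the consumer object and the decoding of each message would replace `InMemoryConsumer`, while the direct call to the flow function stays the same.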
    Anna Geller

    4 months ago
    nice work! 🙌