05/22/2023, 8:51 AM
Hi all 🙂 Our current system uses FastAPI to handle requests, which trigger tasks in Celery. These tasks are placed in a RabbitMQ queue, and Celery workers pick them up, perform the ML computations, and return the results. We use callbacks and monitoring to track the progress of these ML tasks.

In addition, we have Spark Scala jobs that require a lot of computing power and run on a distributed cluster for data processing. Unlike Celery, Spark has its own mechanisms for scheduling jobs and processing data, so we need to manage resources carefully to avoid conflicts and performance issues between the Spark Scala jobs and the Celery tasks.

One challenge we face is that Spark Scala and Celery have different coordination and orchestration needs: Spark relies on its own built-in mechanisms, while Celery relies on RabbitMQ and worker pools. To address this, we want to try Prefect, a workflow management tool that supports defining task dependencies and scheduling. We hope to use it to orchestrate both the Spark Scala jobs and our other tasks (such as the Celery ML tasks) within a single workflow, which would simplify how we define and manage our workflows.

Do you have any feedback on this goal? I saw this discussion. We would also like to know how difficult it would be to adapt the relevant parts of our code, which currently use our own Celery task definitions, to Prefect's APIs and conventions. For reference, we saw this feedback from Jellyfish.