Bernardo Galvao

04/29/2022, 10:02 AM
Hi, I would like to fish for a conceptual clarification and best practices around CI/CD in ML. It seems to me that there is a functional overlap between GitLab CI/CD and Prefect, and I have to conceptualize some sort of Continuous Integration and Continuous Delivery for machine learning, which I could put into Prefect dataflows.
• As I understand it, data is better passed between Prefect tasks.
  ◦ This would make Prefect a better candidate for running data and model validation tests.
• GitLab CI/CD is designed to test code.
  ◦ I am not sure if I should use it to run data and model validation tests.
  ◦ I think it has its place in integrating and delivering Prefect code.
I am slightly confused about whether
1. GitLab CI/CD would end up testing the same things as Prefect would at some point
2. I can do without GitLab CI/CD
It is not clear to me how to use one or the other specifically.

Anna Geller

04/29/2022, 11:37 AM
Good question! In fact, there are some customers using Prefect to build CI/CD pipelines. If you think about it, CI/CD is also a workflow (even a DAG!). So building a CI DAG to deploy a Prefect flow (which, imagine, triggers a dbt DAG) is super meta -- you build a DAG to deploy a DAG running another DAG! 😁
🥳 1
😆 1
I would approach it this way:
• you can build your actual workflow, including your business logic and your data + model validation tests, using Prefect,
• the part to package code dependencies into a Python package and/or a Docker image can be performed from your CI.
If you need some examples of that, check this Discourse tag. This part of the dbt blog post also shows how you could build a super simple one with CircleCI.
👍 1


04/29/2022, 11:52 AM
I completely agree with Anna here. Typically, you would use GitLab CI/CD to run unit/integration tests, package your code (CI part) and deploy your flow in a staging/prod env (CD part). You would then use Prefect for your data flow (incl. model building and data/model validation tests).
💯 2
But, there is no right or wrong here. Some people also use GitLab CI/CD to trigger an ML training pipeline (which is also a data workflow).
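To make the split described above a bit more concrete, here is a minimal sketch in plain Python. It is not taken from the thread: every name in it (`validate_data`, `train_model`, `validate_model`, `training_flow`, the `max_mae` threshold, the toy one-parameter model) is hypothetical. The idea is that the validation functions run as steps of the data flow on real data (in Prefect they would presumably be registered as tasks, with the exact syntax depending on the Prefect version), while the `test_...` function at the bottom is the kind of code-level unit test a CI job such as GitLab CI/CD would run on every commit.

```python
from statistics import mean


def validate_data(rows):
    """Data validation step: fail the run if the input batch looks wrong."""
    if not rows:
        raise ValueError("empty batch")
    for row in rows:
        if row["x"] is None or row["y"] is None:
            raise ValueError(f"missing value in {row}")
    return rows


def train_model(rows):
    """Toy 'model' (assumption): least-squares slope w for y ~ w * x."""
    num = sum(r["x"] * r["y"] for r in rows)
    den = sum(r["x"] ** 2 for r in rows)
    return num / den


def validate_model(w, rows, max_mae=0.5):
    """Model validation step: fail the run if the error is too high."""
    mae = mean(abs(r["y"] - w * r["x"]) for r in rows)
    if mae > max_mae:
        raise ValueError(f"model MAE {mae:.3f} exceeds {max_mae}")
    return mae


def training_flow(rows):
    """The data flow: these steps would be tasks in the orchestrator."""
    clean = validate_data(rows)
    w = train_model(clean)
    mae = validate_model(w, clean)
    return w, mae


def test_validate_data_rejects_missing_values():
    """Unit test of the code itself: this is what CI would run."""
    try:
        validate_data([{"x": 1.0, "y": None}])
    except ValueError:
        return
    raise AssertionError("expected ValueError for a missing value")
```

In this sketch the two kinds of checks do not overlap: CI verifies that `validate_data` behaves correctly on hand-written inputs before the code is packaged and deployed, and the flow then applies `validate_data` and `validate_model` to each actual batch at run time.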

Bernardo Galvao

04/29/2022, 12:34 PM
thanks both!
👍 2