Evgeniya Sukhodolskaya

03/04/2022, 3:46 PM
Hi, Prefect community! I am a data evangelist from Toloka, a crowdsourcing data labeling platform. I want to share our work on an integration with Prefect that aims to help Big Data and Machine Learning engineers painlessly create data gathering & cleaning pipelines. Our engineering team created the toloka-prefect Python package to orchestrate crowdsourcing pipelines in Prefect. With this integration and Prefect's failure management abilities, if you need to collect large and varied amounts of data, or validate an existing dataset, you can do it without the headache of losing control over the crowd. Let me continue in thread:) P.S. A question from me: are there cases of using Prefect for creating Machine Learning pipelines?
🚀 7
💯 2
:upvote: 4
In Toloka, each labeling pipeline may consist of several projects created by requesters, in which tasks of a particular nature are solved with the help of a diverse crowd from all over the world. Since the barrier to entry is low and the labeling of each task is paid for by the requester, any failure in the pipeline leads to money loss. Hence, Prefect features such as caching and persisting data became key to a vast improvement & budget preservation! We gave a talk, "Launching human-in-the-loop process on Toloka using Prefect", based on the popular example of a data-labeling task and want to share it with you. We are super happy to be part of the Prefect community and look forward to deepening our collaboration:) If you have any questions or feedback regarding the integration, I will be happy to comment on them in the thread here. If you want to share your pain points, ideas, and proposals with our engineering team directly, you're welcome to join our Toloka Global Community.

Anna Geller

03/04/2022, 4:17 PM
Hi @Evgeniya Sukhodolskaya, welcome to the community, great to have you with us! 👋 Thank you so much for contributing and for this excellent notebook explaining how to use this integration with Prefect Cloud! 👏 I will cross-post it on Discourse and I'll make sure to recommend it to any users asking about data labeling use cases for ML. To answer your question: Prefect is a general-purpose workflow orchestration platform that supports basically all data-flow automation use cases you can think of, definitely including ML pipelines! Thanks again for sharing and have a wonderful weekend!
🙌 2
👀 1