Prefect Community

Hi everyone, I am an NLP scientist at <https://www.enterpret.com/|Enterpret>. We are currently looking to scale our processes and are looking for tools that abstract away infra and compute management. We are evaluating them based on the following points:
• There are multiple tasks in a single flow.
• Data needs to be transferred from one task to another task.
• The data to be transferred can be huge, so persistence of data at each step of the flow (and/or) at the end of the flow is needed.
• Should be able to run each task manually on a local machine (for experimentation/debugging) and on the cloud (for scaling).
• Should be able to run tasks in parallel when they are independent of each other in a flow.
• Should be able to run flows in parallel -> could be useful for running different experiments at the same time.
• Ability to monitor the progress of a task in a flow, since tasks can take a long time.
• Compute needed for each task in a flow can be different. Ex: training needs a GPU, data gathering can work on CPU.
• Compute used should scale down when no flow is running.
• Each step in a flow can have different dependencies.
• The learning curve should be gentle so that data scientists don't feel overwhelmed.
I have evaluated Metaflow, but configuring it took a lot of time, and packaging custom-built code is difficult.
I was also evaluating AWS Sagemaker, but the complexity is too high. Can someone help me understand the differences between Prefect and Sagemaker, and the pros and cons of both, so that I can explain them to my team?
Would Prefect cover my use cases?
I also write blogs on MLOps. Interested in contributing as well. <https://ravirajag.dev/>
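
To make a few of the requirements above concrete (independent tasks running in parallel, data handed from one task to the next, results persisted at each step), here is a rough, library-agnostic sketch using only the Python standard library. It is not Prefect, Metaflow, or Sagemaker code; the step names (`gather_data`, `build_vocab`, `train`) and the JSON-on-disk persistence are hypothetical stand-ins chosen for illustration.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def persist(name, data, out_dir):
    """Write a step's result to disk so later steps (or a rerun) can reload it."""
    path = Path(out_dir) / f"{name}.json"
    path.write_text(json.dumps(data))
    return path


def gather_data():
    # Hypothetical stand-in for a CPU-only data gathering step.
    return [1, 2, 3, 4]


def build_vocab():
    # Hypothetical second step that does not depend on gather_data.
    return {"hello": 0, "world": 1}


def train(data_path, vocab_path):
    # Downstream step: reloads the persisted upstream results.
    data = json.loads(Path(data_path).read_text())
    vocab = json.loads(Path(vocab_path).read_text())
    return {"n_examples": len(data), "vocab_size": len(vocab)}


def run_flow(out_dir):
    # The two independent steps are submitted together and run concurrently;
    # train only starts once both results have been persisted.
    with ThreadPoolExecutor(max_workers=2) as pool:
        data_future = pool.submit(gather_data)
        vocab_future = pool.submit(build_vocab)
        data_path = persist("data", data_future.result(), out_dir)
        vocab_path = persist("vocab", vocab_future.result(), out_dir)
    return train(data_path, vocab_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_flow(tmp))
```

An orchestrator takes over the parts this sketch leaves out: scheduling, retries, monitoring, and running each step on different compute.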

Welcome to the community, Raviraja! :wave: Prefect is definitely the right tool for all the use cases you've described.

Sagemaker is primarily a managed service to run ML notebooks and store/serve ML models. In contrast, you can think of Prefect more as a general-purpose workflow orchestrator that allows you to build, run and operationalize your data workflows, regardless of whether those are ML flows, data engineering flows, or simply flows automating some processes. Prefect is also not tied to AWS, whereas Sagemaker is purely focused on ML use cases and has a much narrower scope.

I'd recommend checking the Getting-Started resources <https://discourse.prefect.io/t/getting-started-with-prefect/27|here>. You may also check <https://discourse.prefect.io/t/should-i-start-with-prefect-2-0-orion-skipping-prefect-1-0/544|this topic> about the differences between Prefect 1.0 and 2.0. If you then have any specific questions, feel free to ask those in the <#CL09KU1K7|> channel or on Discourse.

Hey <@U02DCTGLGTG>, Sagemaker is indeed more geared towards ML while Prefect is more general purpose. It may not even be an either-or. I saw a blog post about someone using Prefect to orchestrate Metaflow, and I feel the same can be done for Sagemaker.

Sagemaker is really a managed service that makes it easy to deploy models, because you can also use the same container for training (and distributed training).

You have to “bring your own compute” with Prefect, though, as we don’t host any compute infrastructure for you. So even though Prefect does support execution on GPUs, you would need to provide them yourself. In that sense, you could just use Prefect to trigger Sagemaker jobs.

welcome to the community <@U02DCTGLGTG> :wave:

:wave: Welcome to the community <@U02DCTGLGTG>!