Hi Experts, We are evaluating Prefect and new to P...
# ask-community
s
Hi Experts, we are evaluating Prefect and are new to it. Three questions: 1) Does Prefect have any support for incremental builds and for pipeline testing with Python testing frameworks? 2) We saw that Prefect supports Dask for distributed computing. Is there any support for PySpark (both Databricks and non-Databricks)? 3) Can Prefect be used with ML deployment libraries like Seldon Core to deploy on Kubernetes? Basically, we are looking to deploy trained models on a Kubernetes cluster for prediction. Thanks, Snehotosh
k
Hi @Snehotosh, for 1, what testing framework are you using? For 2, Prefect has native support for Dask, meaning that Prefect tasks can be mapped in parallel by using Dask as an engine. There is no Spark engine at the moment for Prefect to run on, but Prefect can definitely spin up Databricks and Spark jobs.
On three, Prefect can be used to orchestrate that Seldon deployment to Kubernetes but won’t be managing that endpoint after deployment.
I have not used Seldon, but I have seen people push models with MLflow.
s
@Kevin Kho, we mostly use pytest for unit testing. I have seen a blog on Databricks (Databricks RunSubmit), but I am not sure about spinning up a Spark job. Can you provide some pointers so that we can quickly do a POC? Yes, we also use MLflow extensively and are planning to use Seldon primarily for drift monitoring. The client wants to deploy models on top of Kubernetes.
k
So because Prefect is Python code, you should be able to test it with pytest (I’ve seen some users do it). We have the Databricks tasks in the task library like you pointed out. If you’re using the `spark-submit` CLI (I’ve only used Databricks), then you can use the `ShellTask`. Prefect is meant for batch jobs so it would be used to deploy Seldon, but not monitor (I guess that would be on the Seldon side).
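For a quick POC, the `spark-submit` plus `ShellTask` suggestion could look roughly like the sketch below. It assumes Prefect 1.x (where `ShellTask` is importable from `prefect.tasks.shell`). The script path `jobs/etl.py`, the flow name, and the helper names `build_spark_submit_command` and `build_flow` are hypothetical and only for illustration. The command builder is plain Python, so it can be unit tested with pytest without Prefect or Spark installed.

```python
import shlex
from typing import Optional, Sequence


def build_spark_submit_command(
    app_path: str,
    master: str = "local[*]",
    app_args: Optional[Sequence[str]] = None,
) -> str:
    """Return a shell-safe spark-submit command line (no Prefect needed)."""
    parts = ["spark-submit", "--master", master, app_path]
    parts.extend(app_args or [])
    # shlex.join quotes each argument, so spaces and shell
    # metacharacters such as [ and * are passed through safely.
    return shlex.join(parts)


def build_flow(command: str):
    """Wrap the command in a Prefect flow.

    Assumption: Prefect 1.x. The imports are done lazily so the
    command builder above stays testable without Prefect installed.
    """
    from prefect import Flow
    from prefect.tasks.shell import ShellTask

    spark_submit = ShellTask(name="spark-submit")
    with Flow("spark-submit-poc") as flow:
        # The command is supplied at call time and runs in a shell
        # on whatever machine executes the flow.
        spark_submit(command=command)
    return flow


def test_build_spark_submit_command():
    """pytest-style unit test for the pure-Python part."""
    cmd = build_spark_submit_command(
        "jobs/etl.py", app_args=["--date", "2021-01-01"]
    )
    assert cmd == "spark-submit --master 'local[*]' jobs/etl.py --date 2021-01-01"
```

With Prefect 1.x installed and `spark-submit` on the PATH, something like `build_flow(build_spark_submit_command("jobs/etl.py")).run()` should run the job locally. Keeping the command construction separate from the flow wiring is a design choice that lets pytest cover the logic without starting a flow run.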