Walter Gillett

12/09/2019, 10:49 PM
Having read through the Prefect docs, I love the lightweight philosophy, Python-friendliness, and improvements over Airflow. But the documentation is lacking; are there plans to expand it? For our computationally heavy use case, we would want each task to run in its own Docker container and independently on a collection of worker nodes (e.g., as a k8s job). The documentation doesn't address this common use case. There is a barebones k8s API reference but no conceptual material or examples. The closest thing I can find is https://docs.prefect.io/core/tutorials/dask-cluster.html which says "Take this example to the next level by storing your flow in a Docker container and deploying it with Dask on Kubernetes using the excellent dask-kubernetes project! Details are left as an exercise to the reader. 😉" Ideally this exercise would not be left to the reader. But beyond that, we don't want to store the entire flow in a single Docker container. Rather, each task should get its own Docker container, since each task has different computation requirements (CPU-heavy vs. RAM-heavy vs. I/O-heavy vs. needing access to a large reference DB vs. ...). Parallel tasks should also be able to run on different workers. Please advise. Also: Prefect Cloud sounds appealing as a persistence solution, but do we have the safety net of being able to implement our own persistence? Are there API hooks to support that?

Chris White

12/09/2019, 11:19 PM
Hi @Walter Gillett - thanks for the feedback! We are always attempting to expand our documentation. To address your specific question: while running individual tasks within their own Docker containers might be a popular use case for CI systems and pure k8s managers, for Prefect this isn’t as common as you suggest. To achieve this, I’d recommend using one of our many Kubernetes Tasks from our Task Library (and of course you’re welcome to tweak them for your own needs): https://docs.prefect.io/api/unreleased/tasks/kubernetes.html#createnamespacedjob We do have per-task environments on our roadmap for next year though, so you can expect to see further developments on this front in the future. For your question about hooking in your own persistence solution: Prefect Cloud orchestrates Prefect Core through its many hooks, so it’s definitely exposed in a way that you could write a custom persistence layer if you’d like!
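
For illustration only, here is a minimal sketch of the "one Kubernetes Job per task, each with its own image and resources" idea discussed above. It uses only the Python standard library and builds plain `batch/v1` Job manifests. The task names, image URLs, commands, and resource figures are all hypothetical, and the helper `build_job_manifest` is not part of Prefect.

```python
import json


def build_job_manifest(task_name, image, command, cpu, memory):
    """Return a Kubernetes batch/v1 Job manifest (as a dict) for one task."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": f"{task_name}-job"},
        "spec": {
            "backoffLimit": 0,  # do not let Kubernetes retry the pod itself
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": task_name,
                            "image": image,
                            "command": command,
                            # Per-task sizing: CPU-heavy and RAM-heavy tasks differ here.
                            "resources": {
                                "requests": {"cpu": cpu, "memory": memory},
                                "limits": {"cpu": cpu, "memory": memory},
                            },
                        }
                    ],
                }
            },
        },
    }


# Hypothetical tasks with different images and resource profiles.
TASKS = {
    "align": {
        "image": "registry.example.com/align:latest",
        "command": ["python", "align.py"],
        "cpu": "8",
        "memory": "4Gi",
    },
    "index": {
        "image": "registry.example.com/index:latest",
        "command": ["python", "index.py"],
        "cpu": "1",
        "memory": "64Gi",
    },
}

manifests = {name: build_job_manifest(name, **spec) for name, spec in TASKS.items()}

if __name__ == "__main__":
    print(json.dumps(manifests["align"], indent=2))
```

Each resulting dict is an ordinary Job spec. Assuming the `CreateNamespacedJob` task linked above accepts the manifest as its job body (check the API reference for the exact parameter names in your Prefect version), one such task per manifest could be added to a flow so that each step runs in its own container.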

Walter Gillett

12/09/2019, 11:20 PM
Thanks @Chris White, that's helpful.
👍 1