
Sam Werbalowsky

09/13/2021, 3:29 PM
Looking for any resources regarding development lifecycles between local, dev, and prod environments for data engineering projects - specifically using Dask/Prefect, but that’s probably too niche so other resources are welcome!

Kevin Kho

09/13/2021, 3:30 PM
I would love to see more, but the only thing on this topic I’ve seen so far is this
👌 1
👍 1
It is quite niche, though, since it depends on the CI/CD and the storage you use (GitHub? Docker?). Some people use their CI/CD to register flows in different environments. On the Prefect side, you can use different projects to host the different environments. If you are on Enterprise, this separation can be provided at the tenant level.
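As a rough illustration of the "one project per environment" idea, here is a minimal sketch of what a CI/CD job could run. It assumes the Prefect 1.x API (`Flow`, `task`, `flow.register(project_name=...)`); the project names, the flow name, and the helper functions are hypothetical and not from this thread.

```python
# Hypothetical mapping from deployment environment to Prefect project name.
PROJECTS = {"dev": "analytics-dev", "prod": "analytics-prod"}


def project_for(env: str) -> str:
    """Return the Prefect project for an environment; reject unknown names."""
    try:
        return PROJECTS[env]
    except KeyError:
        raise ValueError(f"unknown environment: {env!r}") from None


def register(env: str) -> None:
    """Register a trivial flow into the project mapped to `env`."""
    # Imported lazily so the mapping above can be used without Prefect installed.
    from prefect import Flow, task

    @task
    def say_hello():
        print("hello")

    with Flow("example-flow") as flow:
        say_hello()

    # Assumes the target project already exists on the Prefect backend.
    flow.register(project_name=project_for(env))
```

A pipeline could then call `register("dev")` on merge requests and `register("prod")` on merges to the main branch, so the same flow code lands in separate projects.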

Sam Werbalowsky

09/13/2021, 3:47 PM
Awesome - thank you! Yes - we are planning on Kubernetes deployments of Prefect Server and Dask Gateway, with GitLab storage. We initially thought we would have multiple deployments of Prefect Server rather than projects - one for development environments and one for production - using different IAM roles for each to control what is accessed and where things are run. Something like running Prefect flows locally via Python (python myflow.py) with a dedicated Dask cluster for this purpose, then running in development, then finally pushing to prod. Things are a little tricky with the local step, because we may have a use case for users playing with sensitive data at that stage, which is a whole other animal.
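For illustration only, the local / dev / prod progression could be sketched like this, assuming Prefect 1.x (`prefect.executors.DaskExecutor`, `flow.run(executor=...)`) and plain Dask scheduler addresses. The addresses, environment names, and helper functions are hypothetical, and connecting through Dask Gateway would need different executor settings than shown here.

```python
# Hypothetical Dask scheduler addresses, one per stage of the lifecycle.
DASK_ADDRESSES = {
    "local": "tcp://dask-sandbox.example.internal:8786",  # dedicated cluster for local runs
    "dev": "tcp://dask-dev.example.internal:8786",
    "prod": "tcp://dask-prod.example.internal:8786",
}


def dask_address_for(env: str) -> str:
    """Return the scheduler address for an environment; reject unknown names."""
    if env not in DASK_ADDRESSES:
        raise ValueError(f"unknown environment: {env!r}")
    return DASK_ADDRESSES[env]


def run_flow(flow, env: str = "local"):
    """Run a Prefect flow in-process against the Dask cluster for `env`."""
    # Imported lazily so the address lookup works without Prefect installed.
    from prefect.executors import DaskExecutor

    executor = DaskExecutor(address=dask_address_for(env))
    return flow.run(executor=executor)
```

With something like this in myflow.py, running python myflow.py could call `run_flow(flow)` for the local step, while dev and prod runs would go through registered flows instead.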
👀 2

Kevin Kho

09/13/2021, 3:57 PM
Ah, I see, but even if you had multiple servers, it seems the sensitive data being exposed would still be an issue, right? The setup to push to prod looks good though.
💯 1

Zanie

09/13/2021, 4:31 PM