Jonas Sølvsteen

02/16/2021, 9:13 AM
Dear Prefect people, can you please help me figure out how to make Prefect (Cloud?) work for my use case? 🧑‍💻 We want to process thousands of satellite images 🛰️🌍 on a Kubernetes cluster, scaling to ~100 TB of RAM and thousands of jobs running in parallel. We have already done this successfully with our own small event-based workflow manager (REST APIs, Postgres, Docker, the Python Kubernetes API). However, it lacks many of the bells and whistles that Prefect has and chokes a bit on the volume of messages, and we would much rather join the awesome community you have here! ❤️

Using your nice docs and the open-source code, we have already run a single processing job on our K8s cluster via a Prefect flow 👍. Now we want to scale this up. Intuitively, I would have one flow process a single image (using 3-8 K8s tasks, ~10 min per task), but we want to process thousands of images in parallel. Seeing that the Prefect Cloud Teams plan only allows 2️⃣(!) concurrent flows, I am starting to doubt whether Prefect is built for this number of concurrent flows and all the logs and messages they generate.

Can you please let me know whether Prefect is meant to scale to tens of thousands of tasks running in parallel? And if so, should I run 1000 flows concurrently, or rather have one flow branch out into 1000 tasks, each with its own subtasks (if that is possible)? Please note that this is burst compute about once a month, with lower daily loads (tens of images). I am happy to work with Prefect Core, but Cloud might also be very nice if we can afford it.
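To make the scale in the question concrete: the number of task runs is simply images × steps per image, which is how "thousands of images" with 3-8 steps each reaches tens of thousands of tasks. A minimal back-of-envelope sketch, where the image count of 5,000 and the assumption that one image's steps run sequentially (while different images run in parallel) are illustrative choices, not figures from the question:

```python
# Back-of-envelope sizing of the burst workload described above.
# All inputs are illustrative assumptions, not measured values.


def burst_workload(n_images, steps_per_image, minutes_per_step):
    """Return (total task runs, total compute hours, best-case wall minutes).

    Assumes the steps for one image run one after another, while different
    images run fully in parallel, so the best-case wall time is one image's
    chain of steps.
    """
    total_tasks = n_images * steps_per_image
    compute_hours = total_tasks * minutes_per_step / 60
    min_wall_minutes = steps_per_image * minutes_per_step
    return total_tasks, compute_hours, min_wall_minutes


if __name__ == "__main__":
    # Hypothetical 5,000 images with 3 or 8 steps of ~10 minutes each.
    for steps in (3, 8):
        print(burst_workload(5000, steps, 10))
```

With 8 steps per image this gives 40,000 task runs, which is the "tens of thousands of tasks" regime the question asks about.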

Alex Kerney

02/16/2021, 1:54 PM
You might also want to chat with the Pangeo-Forge project. They are trying to replicate the success of Conda-Forge for producing analysis-ready, cloud-optimized data. While they are still pretty early in their development, I believe they are already using Prefect for some rather large flows (40+ years of daily global sea surface temperature data? Why not!).

Jonas Sølvsteen

02/16/2021, 1:56 PM
Nice hint, thanks! I have heard about that initiative but was not aware they are using Prefect.

Alex Kerney

02/16/2021, 1:58 PM
I think you really have to dig into the forks of their repos or their meetings to know what's currently going on. It seems like they are moving much faster than they are documenting right now.

Jenny

02/16/2021, 2:02 PM
Hi @Jonas Sølvsteen - welcome! We're actually planning some changes to our pricing model that should answer your questions: https://medium.com/the-prefect-blog/liftoff-fe5a117a94ff#2649 😁

Zanie

02/16/2021, 4:19 PM
Also, a note on scale: Prefect Cloud can definitely handle the level of parallelism you've mentioned. Our customers love to map over very large arrays 🙂
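"Mapping" here means fanning a single task out over every element of a list, which corresponds to the second option in the original question: one flow that branches into one mapped task per image rather than 1000 separate flow runs. In Prefect Core of that era this is written as `some_task.map(items)` inside a `Flow` context. The sketch below reproduces only the shape of that pattern with Python's standard `concurrent.futures`, so it runs without Prefect installed; the step names and the `process_image` / `map_over_images` helpers are hypothetical stand-ins for the per-image Kubernetes jobs, not Prefect's API.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-image steps; in the real pipeline each step would be a
# Kubernetes job running for roughly 10 minutes.
STEPS = ("ingest", "calibrate", "mosaic")


def process_image(image_id: str) -> list[str]:
    """Run the chain of (hypothetical) steps for one image, in order."""
    return [f"{image_id}:{step}" for step in STEPS]


def map_over_images(image_ids: list[str], max_workers: int = 8) -> list[list[str]]:
    """Fan out: apply process_image to every image concurrently.

    executor.map preserves input order, so results[i] belongs to image_ids[i].
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(process_image, image_ids))


if __name__ == "__main__":
    ids = [f"scene-{i:04d}" for i in range(1000)]
    results = map_over_images(ids)
    print(len(results), results[0])
```

The thread pool only illustrates the data flow. In Prefect, each mapped child gets its own task run state, so retries and logs are tracked per image rather than per batch.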