Pardon me if this sounds un-educated I am working...
# ask-community
t
Pardon me if this sounds un-educated I am working for a healthcare setup and we are currently using SSIS ( I know it sucks ) along with SQL Server, Oracle and ton of other sources, almost everything is run using SSIS packages with the exception of a few C# and Python pipelines, as of late, we are running into various constraints and would like to move one of our primary pipelines to a different system, I along with a colleague proposed Prefect after learning Prefect through last year's DataTalks's DE Zoomcamp, we are right now in the phase of doing a simple Prefect Pipeline POC ( not much of ETL but testing of one of simpler pipelines involving execution of simple SQL logic ) and would like community/staff's honest opinion as well as help about a couple of things i.e. 1. Most of our system is on-prem and the entirety of data we are dealing with is on those on-prem systems, is Prefect useful for our use case since there would be minimal to no cloud usage and everything would revolve around SQL server, would we be able to leverage Prefect to its fullest? 2. Moreover, since we will not be switching from one of the modern tech stacks and we will be moving from SSIS instead, a few things don't necessarily have a 1-1 correspondence with how SSIS or SQL Server deployments work, and would really appreciate if anyone can guide/explain the concept of Work Pools and a queue, and what best would describe them in our context? 3. We have SQL and Application Servers (running maintained SQL Agent jobs) for each environment i.e. Dev, QA and Prod would have one SQL server which holds data and one application server which provides resources for all ETL, so in the case of Prefect, how would this work? Do we get to run a prefect server on each Application server the way we are currently doing or would it be different? I know all these questions are kind of noob questions, but as I said, I don't have a solid background and would really appreciate the help.
👀 1
k
Hi Tony! Never worry about asking questions here. Curiosity puts you on the path toward building beautiful things 😄 Now, for your questions: 1. Prefect is just as useful for on-prem orchestration as it is in the cloud. In both scenarios, you're the one choosing what runs, and how, when, and where it runs. As long as you have compute resources available and python code to execute, Prefect is a great way to make that happen. In fact, Prefect Cloud is useful for orchestrating on-prem work too, since the worker process that kicks off flow runs in your environment only sends outbound requests checking for work to do. That means one less thing to host on a server. As for interacting with SQL sever, you can use prefect-sqlalchemy in conjunction with a SQLAlchemy Connector block for that. 2. A work pool is a tool for associating the orchestration of your pipeline runs with the execution environment you want to run them in. Work pools provide a default template for starting work on the type of infrastructure they're associated with, and the connected worker listens for work in that pool and starts the infrastructure that will execute the flow when a scheduled run is found. If you had an on-prem kubernetes cluster, you'd start a kubernetes worker in that cluster and connect it to a kubernetes work pool to listen for scheduled flow runs. Work queues within a pool are a way to manage priority and flow run concurrency on a finer level if that's something you need. 3. The best method I can think of for managing multiple environments like you described is setting up multiple workspaces in Prefect Cloud. You could write one flow that utilizes a SQLAlchemy Connector block that exists with the same name in each workspace, and deploy that same flow in each workspace. As a result, you've got the ability to schedule or start ad-hoc pipeline runs against each environment with the exact same code, which can be observed independently in each workspace.
t
Thank you very much Kevin! 🙂 This is exactly the help I needed.
🙌 1