# show-us-what-you-got
r
Hi, I am trying to create a big system. Part of it is that I want the system to work with scriptable Python plugins. What I want is for clients of my system to be able to submit a git repository containing their Python script, and then I want to be able to schedule each of those scripts to run either periodically or manually. I also don't want a case where the execution of one script hurts the execution of another, system-wise or resource-wise. Is there a way to accomplish this with the open source prefect.io library? I know it's a big question, but I would be really grateful if you could help me. Also, if you have any suggestions for the best architecture for a system like this, I will be more than happy to hear from you😁
a
I think this is less of a Prefect-specific question than a "how do I allow safe execution of third-party code in an isolated way" type of question. Since Prefect is just Python, if you accomplish the latter, you coincidentally accomplish the former. As a word of fair warning, I have zero experience with this, but I have some initial thoughts. First, you need to make sure your execution environments are isolated. Even if you have 2 users running small jobs (ones that won't affect each other's performance), they shouldn't have access to the same file system, shouldn't share the same dependencies, etc. You're likely going to accomplish this with a tool that isn't Python-specific, possibly something like Kubernetes, which can enforce isolation (each job in its own container) and also lets you set the amount of resources available to each job. You'll also want a way to track resource utilization of your computing environment by user. The portion that I know I don't know enough about, and don't want to give advice on, is the security of the whole thing. You have to worry about running Python in a safe way so that users can't get root access on your VMs, among other things. You also need to consider the other random things that happen when you provide a place to execute other people's code, like how 3rd-party APIs sometimes block IP addresses if they see too many requests coming in at once (thinking it's DDoS traffic, as an example).
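To make the isolation and resource-limit idea concrete, here is a minimal sketch that builds a Kubernetes Job manifest as a plain Python dict, one Job per user script. The function name `build_job_manifest`, the image `example.com/plugin-runner:latest`, the `PLUGIN_REPO_URL` variable, and the one-namespace-per-user naming are all assumptions made up for illustration; only the manifest field names come from the Kubernetes Job API. It is not a complete security setup, just one possible shape for the per-user limits described above.

```python
import json


def build_job_manifest(user_id, repo_url, cpu_limit="500m", memory_limit="256Mi"):
    """Build a Kubernetes Job manifest (as a dict) that runs one user's script
    in its own container with fixed CPU and memory limits."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {
            "name": f"plugin-run-{user_id}",
            # Assumption: one namespace per user, so quotas and usage
            # can be tracked per user.
            "namespace": f"tenant-{user_id}",
            "labels": {"owner": user_id},
        },
        "spec": {
            "backoffLimit": 0,             # do not retry a failed script
            "activeDeadlineSeconds": 600,  # stop runaway scripts after 10 minutes
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "plugin",
                            # Hypothetical image that clones the repo and runs the script.
                            "image": "example.com/plugin-runner:latest",
                            "env": [{"name": "PLUGIN_REPO_URL", "value": repo_url}],
                            "resources": {
                                "requests": {"cpu": cpu_limit, "memory": memory_limit},
                                "limits": {"cpu": cpu_limit, "memory": memory_limit},
                            },
                            "securityContext": {
                                "runAsNonRoot": True,
                                "allowPrivilegeEscalation": False,
                            },
                        }
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    manifest = build_job_manifest("alice", "https://example.com/alice/plugin.git")
    print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and submitted with `kubectl apply -f`. Because each user's script gets its own container, file systems and dependencies are not shared, and the `owner` label gives you something to group usage by.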
r
Thank you for the reply😁 Would you still use Prefect with Kubernetes to run the scripts themselves?
z
I'm going to second what @Alex Cano said, and add a bit of Prefect context here as well. Boiling down your requirements, it sounds like you have scripts that need to be run on a schedule. Prefect is great at that! And if you want them to be built from specific repos, I think your best bet is including a step in your CI/CD process that builds/registers your flows with Prefect Server at the appropriate moments. Once registered, those flows can be run on a schedule or manually as you'd like. I'll echo that you'll need to be careful, security-wise. Prefect Server doesn't include an auth component, so you'll want to make sure that you're not unintentionally exposing an unauthenticated endpoint to the internet. Depending on your setup, that may not be a concern, but definitely worth keeping an eye out for.
r
Hi, what do you mean by exposing an unauthenticated endpoint to the internet? Also, let's say I want the client to be able to just ask to publish their code, and the system would generate a git repo for them and automatically configure the CI/CD to use Prefect. Is that possible? Also, in the case of scaling, where would you run each script? On the same machine? In a Docker container?
And thanks for the answer😁
z
Hi Roy, good questions. When you spin up Prefect Server, you spin up an API that manages orchestration of your runs. By default you should be fine, but if you do anything to customize this behavior, it's worth keeping an eye out to ensure you do so in a safe manner. 🙂
For the CI/CD case, I think it'd look a little something like this:
• your user pushes their code to git
• your CI/CD setup notices this and registers the flow to Prefect Server
• depending on how the flow is registered, it either runs on a schedule or is available for manual runs
Does that fit the use case you're describing?
As for scaling, it really depends on how demanding your flows are. I can easily run several lightweight flows at the same time on my laptop, but we also have folks who require much heavier resources. I think you mentioned Kubernetes earlier in the thread-- that might strike a good balance for you.
r
Wow, you guys are amazing, thanks for all your answers! Is it possible, when the CI/CD registers the flow, to automatically push it to Kubernetes using Prefect?
z
The pattern I think you're getting at is possible, but it might be worth clarifying some of the architecture here. Prefect Server maintains a queue of scheduled runs for your agent to pick up. The agent (in this case running on Kubernetes) then queries that queue every few seconds, and deploys jobs if it finds any in the queue. So what you'd see is:
1) your flow is registered
2) if it's registered with an active schedule, it'll enter the queue
3) the agent picks up the flow run and deploys it on your Kubernetes cluster
So in this case, your flow would be "pushed" to Server when it's registered and then "pulled" into Kubernetes by the agent.
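The push/pull pattern described above can be sketched as a toy model in plain Python. To be clear, this is not Prefect's API: `ToyServer`, `ToyAgent`, `tick_schedules`, and `trigger_manual_run` are hypothetical names invented for illustration, and a real agent would poll on a timer and launch Kubernetes jobs rather than call a local function.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyServer:
    """Toy stand-in for the orchestration API: registered flows plus a run queue."""
    flows: dict = field(default_factory=dict)    # flow name -> has an active schedule?
    queue: deque = field(default_factory=deque)  # flow runs waiting for an agent

    def register(self, name, has_active_schedule=False):
        # Step 1: CI/CD "pushes" the flow to the server by registering it.
        self.flows[name] = has_active_schedule

    def tick_schedules(self):
        # Step 2: every flow with an active schedule gets a run placed in the queue.
        for name, scheduled in self.flows.items():
            if scheduled:
                self.queue.append(name)

    def trigger_manual_run(self, name):
        # Flows without a schedule stay available for manual runs.
        if name not in self.flows:
            raise KeyError(f"flow {name!r} is not registered")
        self.queue.append(name)


class ToyAgent:
    """Toy agent: "pulls" queued runs from the server and deploys them."""

    def __init__(self, server, deploy):
        self.server = server
        self.deploy = deploy  # callable that would start the job on the cluster

    def poll_once(self):
        # Step 3: drain whatever is queued right now and deploy each run.
        deployed = []
        while self.server.queue:
            name = self.server.queue.popleft()
            self.deploy(name)
            deployed.append(name)
        return deployed


if __name__ == "__main__":
    server = ToyServer()
    server.register("nightly-report", has_active_schedule=True)
    server.register("ad-hoc-cleanup")  # manual runs only
    agent = ToyAgent(server, deploy=lambda name: print(f"deploying {name}"))
    server.tick_schedules()
    agent.poll_once()
    server.trigger_manual_run("ad-hoc-cleanup")
    agent.poll_once()
```

The point of the split is that the server never reaches into the cluster itself. It only holds the queue, and the agent running next to your compute decides when to pull work from it.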
r
Oh I see thanks😁