# prefect-community
t
Hello 🙋 Today I introduced our data scientists to Prefect, and their main question was: what would be the easiest way for them to run their custom code (with all its various dependencies) from the flows they create? Is containerizing their code in a Docker image and pushing it to a remote repository (like ECR) the only reasonable way to achieve that? Or could we conceivably just fetch the code from GitHub and install the necessary dependencies as actual tasks in the flow, before executing it? (Obviously it adds complexity and run-time to the flow, but are there any other reasons this might not work or be a bad idea?) The main reason dockerizing and pushing to ECR is inconvenient for them is that they can't do it themselves, and we don't have a lot of DevOps resources that can easily provide them with a complete CI/CD pipeline for every new project they want to test out.
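To make that second option concrete, roughly what I have in mind is something like this (just a sketch against the Prefect 1.x API; the repo URL, requirements file, and `train.py` entrypoint are all made up):
```python
# Sketch of "fetch code from GitHub and install deps as tasks in the flow".
# The repo URL, requirements path, and train.py entrypoint are placeholders.
import subprocess
import sys

from prefect import task, Flow


@task
def clone_repo() -> str:
    subprocess.run(
        ["git", "clone", "https://github.com/our-org/ds-project.git", "/tmp/ds-project"],
        check=True,
    )
    return "/tmp/ds-project"


@task
def install_deps(repo_path: str) -> str:
    # Installs into whatever environment the flow run's job happens to be using
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "-r", f"{repo_path}/requirements.txt"],
        check=True,
    )
    return repo_path


@task
def run_model(repo_path: str) -> None:
    subprocess.run([sys.executable, f"{repo_path}/train.py"], check=True)


with Flow("ds-project") as flow:
    run_model(install_deps(clone_repo()))
```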
j
Couldn't they run it locally via prefect run -n, with the Docker container and flow built against the local Docker engine (https://docs.prefect.io/orchestration/recipes/configuring_storage.html)? We use ECR, but only once the code is successfully reviewed and merged via CI/CD. All local testing uses Docker against dev and staging warehouses, which are flipped with environment variables that the data scientists and engineers have in their ~/.profile.
We also use `pipenv` to deploy dependencies within the Docker containers, because it facilitates local testing & dev outside of flows with any common modules, which may be a familiar workflow for them.
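The local-build setup is roughly this (a rough sketch; the base image and dependency list are placeholders):
```python
# Sketch of Docker storage built against the local Docker engine (Prefect 1.x);
# with no registry_url set, the image stays local instead of being pushed to ECR.
from prefect import Flow
from prefect.storage import Docker

flow = Flow("local-test")  # tasks omitted for brevity

flow.storage = Docker(
    base_image="python:3.9-slim",    # placeholder base image
    python_dependencies=["pandas"],  # placeholder deps baked into the image
)

flow.register(project_name="dev")
```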
t
@Jason if they run it locally, then how exactly would it run on a schedule?
Sorry, but I'm not fully sure I understand your use-case. I don't want to complicate their work process by telling them to package things into Docker themselves or run Prefect locally. I want them to just focus on writing flows, and have those flows reference their code (in the easiest way possible that requires the least DevOps resources). That's all. Our ECR is tightly controlled by our DevOps team, and we have a CI/CD system that is responsible for taking GitHub repos and turning them into stored Docker images; we just don't want to involve DevOps in every side-project that the data scientists are trying to test out on Prefect.
For example, today they have some models that run as Python scripts on an EC2 machine (which has all the necessary dependencies installed), scheduled by cron. Obviously it would be much better if they could migrate those to Prefect. The question is whether the only viable way is for us to ask DevOps to build a CI/CD pipeline for each one of those GitHub repos (one repo per project, one model per project) every time they want to run something there.
t
Hi Tom, I am trying to solve the same problem. I have a repo for Prefect itself and then repos for the DS projects. Each has a virtual environment set up on the machine. I have simple flows set up that either:
1. Add the other repo to the path and then import functions from the DS repo and run them (you can set the Python environment to use in the deployment), or
2. Use a subprocess call to run the scripts from the command line (specifying the correct Python environment in the call); see the sketch below.
Our repos push to that machine whenever there is a commit into master.
This is just how I have got it working though so open to improvements!
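In rough outline it looks like this (all the paths, module names, and venv locations here are placeholders for whatever lives on the server):
```python
# Sketch of the two options above; every path and module name is a placeholder.
import subprocess
import sys

from prefect import task, Flow


@task
def run_via_import():
    # Option 1: put the DS repo on the path and import its functions directly
    sys.path.insert(0, "/srv/ds-project")
    from ds_project.model import train  # hypothetical module in the DS repo
    train()


@task
def run_via_subprocess():
    # Option 2: call the script with the interpreter from the DS repo's own
    # venv, so its deps stay isolated from the Prefect environment
    subprocess.run(
        ["/srv/ds-project/.venv/bin/python", "/srv/ds-project/scripts/train.py"],
        check=True,
    )


with Flow("ds-project-on-the-box") as flow:
    run_via_subprocess()  # or run_via_import(); they are alternatives
```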
t
Hey other Tom 😁 can you clarify what you mean by "the machine"? What machine?
t
sorry, just a remote server 🙂
t
On which you have a local prefect agent running?
Sounds reasonable as an easy hack, but we want our flows running on k8s, so they can also scale out and run arbitrarily many processes simultaneously.
t
Yeah exactly, I think you can do the same thing with K8s though?
t
@Tom Thurstan in k8s there is no "the machine". There are ad-hoc k8s jobs that Prefect creates per flow run (which then disappear into non-existence once finished). So they must either already have the necessary code & dependencies in their designated Docker image (only possible if using a custom Docker image), or they must get them after the job starts, and this is what I'm trying to ask about.
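For reference, the custom-image route would be roughly this in Prefect 1.x (the registry URL and package names are placeholders), and the push to ECR is exactly the step that currently needs DevOps:
```python
# Sketch of a custom image: Docker storage bakes code + deps into an image
# pushed to a registry, and KubernetesRun points the ad-hoc job at it.
# The registry URL and dependency list are placeholders.
from prefect import Flow
from prefect.storage import Docker
from prefect.run_configs import KubernetesRun

flow = Flow("ds-project")  # tasks omitted for brevity

flow.storage = Docker(
    registry_url="123456789012.dkr.ecr.us-east-1.amazonaws.com",  # placeholder ECR
    image_name="ds-project",
    python_dependencies=["pandas", "scikit-learn"],  # placeholder deps
)
flow.run_config = KubernetesRun()  # the k8s job uses the storage image

flow.register(project_name="ds")
```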
t
Ah I see, sorry my misunderstanding, if you can’t tell I’m working on this from the DS side 😂
🙏 1
t
No worries, these types of complications & confusions are exactly what I'm trying to save my DSs from having to deal with, so they can focus on actual DS work rather than on building Docker images and deploying them to ECR.