# prefect-community
s
hi all, maybe a newbie Orion question - how can I use more than one Python file to describe a flow with multiple tasks (we've got tons of internal ML libraries)? apart from putting all those in a Docker image, I haven't found a way for now
a
We generally encourage incremental adoption, so you can certainly start by e.g. executing those custom scripts via a `ShellTask` or via a Docker task, and as you adopt Prefect more you can slowly migrate those pipelines to Prefect - check out Laura's tutorial on that subject

https://www.youtube.com/watch?v=kH3hPVwFfiA

I've seen many users who have their tasks specified in Python modules that get imported within the flow file, but then you need to make sure to package those modules properly so that Prefect can import them at execution time. Check out this post describing how you can do that: https://discourse.prefect.io/t/the-simple-guide-to-productionizing-data-workflows-with-docker-by-kevin-kho/453 What agent type do you use?
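The key constraint in that pattern is that any custom module the flow imports must be importable on the worker at flow-run time. A quick, runnable sanity check you can drop into your execution environment (container, agent machine, etc.) - `my_ml_lib` below is a hypothetical internal package name, not something from this thread:

```python
import importlib.util

def importable(module_name: str) -> bool:
    """Return True if a top-level module can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None

# stdlib modules are importable everywhere
print(importable("json"))       # True
# a hypothetical internal package - False unless it was installed
# (e.g. pip-installed from an internal PyPI or baked into the image)
print(importable("my_ml_lib"))
```

If the check prints False inside your container, Prefect will hit an ImportError when it loads the flow, which is exactly the packaging problem the post above addresses.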
s
Thanks a lot for the suggestion! In fact, we already have multiple Dockerfiles for the existing pipeline (based on another stack), so that won't be an issue. Our ML libraries are properly packaged and served from an internal PyPI server, so a Docker-based approach is a proven way to put the dependencies inside a container.
we're at a research phase (could take a month or so), would you recommend starting straight from Orion, skipping Prefect 1.0?
a
so a Docker-based approach is a proven way to put the dependencies inside a container
That's correct!
we're at a research phase (could take a month or so), would you recommend starting straight from Orion, skipping Prefect 1.0?
It's totally up to you - you can certainly start directly with Orion, and in fact that would be quite beneficial for us, since you could provide us with (incredibly valuable) feedback as an early adopter. But 2.0 is not yet fully production-ready, and it may take a couple of months to bring it to that stage - check out our latest announcement for more details: https://discourse.prefect.io/t/the-second-launch-week-initiative-announcing-prefect-2-0/499. I would honestly encourage you to still explore Prefect 1.x: sign up for the free tier of Prefect Cloud and put some pipelines into production to get a feel for how it is to work with Prefect in general. The user experience shouldn't change - you can still run your workflows locally first by adding a couple of decorators to your workflow. Then you build/deploy your flows - in 1.0 this step is called flow registration, in 2.0 it's creating a deployment. Having used and understood 1.0 first will likely make you appreciate many Orion features even more 🙂
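The "couple of decorators" idea can be sketched like this. A stand-in `task` decorator is used here purely so the snippet runs without Prefect installed - with Prefect 1.x you would import `task` and `Flow` from `prefect` instead, and the point is the same: your existing functions keep working unchanged once decorated.

```python
import functools

# Stand-in for a Prefect-style @task decorator (illustration only):
# it wraps the function without changing its behaviour.
def task(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        print(f"running task: {fn.__name__}")
        return fn(*args, **kwargs)
    return wrapper

@task
def extract():
    return [1, 2, 3]

@task
def transform(data):
    return [x * 2 for x in data]

@task
def load(data):
    print(f"loaded {len(data)} rows")

# The workflow itself is still plain Python and runs locally as-is.
load(transform(extract()))
```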
👍 2
s
sounds like a plan, thanks! is there a way in 1.0 (or 2.0) to use multiple Docker images for subflows/tasks within a single flow?
a
Yes, there is! Check out this Discourse topic with a deep dive on this: https://discourse.prefect.io/t/can-prefect-run-each-task-in-a-different-docker-container/434
👍 1
s
thanks a lot, seems that's enough for now, so diving into the above
m
Reading this thread leaves me a bit confused about Orion. Is there a way to replicate what is done here: https://github.com/anna-geller/packaging-prefect-flows/blob/master/flows_no_build/docker_script_kubernetes_run_custom_ecr_image.py? That way we could simply package up a flow (using Orion) and create a deployment so that it runs on k8s with a custom container (one that also contains the flow code itself). Reading through the Orion docs, it seems this is not possible because there is no Docker storage yet - or am I missing something?
a
You're right, there is no Docker storage yet. We are working on various flow deployment patterns and ways to package your flows for Prefect 2.0, including packaging your code into Docker images, but nothing concrete yet - you can follow the `release-notes` tag on Discourse to get an email update once we release that 🙂
m
So what is the recommended way to run a flow that requires additional packages on k8s with Orion - for example, to run dbt?
a
You would need to build a Docker image yourself and provide this image on your `KubernetesFlowRunner` or `DockerFlowRunner`
👍 1
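As a rough illustration of "build a Docker image yourself" for the dbt case - this Dockerfile is a hypothetical sketch, not from the thread: the base image tag, the `dbt-postgres` adapter, and the paths are all assumptions to adapt to your own stack.

```dockerfile
# Hypothetical image for a flow that needs dbt (sketch only).
# Base image tag and dbt adapter are assumptions - adjust as needed.
FROM prefecthq/prefect:2.0b2-python3.9

# extra dependencies the flow needs at run time
RUN pip install dbt-postgres

# optionally bake the flow code itself into the image
COPY flows/ /opt/prefect/flows/
```

You would then reference this image from the flow runner on your deployment.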
m
And can you then also specify the location of your flow code inside the image?
a
Good question - right now this doesn't seem to be supported (I think), because we mainly support either local or cloud object storage (S3, GCS, Azure). But maybe you can test whether it works for you when you bake the local flow file into the image and reference the location available within the image on the `DeploymentSpec`
👍 1
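Spelling out that suggestion, the deployment could look roughly like this. This is an unverified sketch against the beta-era Orion API (`DeploymentSpec`, `KubernetesFlowRunner`) as it existed around the time of this thread - parameter names may change between releases, and the image name and in-image path here are made up:

```python
# deployment.py - hypothetical sketch, beta-era Orion API
from prefect.deployments import DeploymentSpec
from prefect.flow_runners import KubernetesFlowRunner

DeploymentSpec(
    name="k8s-custom-image",
    # path to the flow file *inside* the image - untested; this is
    # exactly what the message above suggests trying
    flow_location="/opt/prefect/flows/my_flow.py",
    flow_runner=KubernetesFlowRunner(
        # custom image containing both the dependencies and the flow code
        image="my-registry.example.com/ml-flows:latest",
    ),
)
```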
m
Thanks @Anna Geller, will try that and get back to you once I have some time.
👍 1