ash

06/18/2021, 12:08 PM
Hello everyone , I am a little confused on the below mentioned part
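import pandas
import pymongo
import sklearn
from prefect import Flow, task
from prefect.storage import Docker
from reports.config import mongo_config


@task
def say_hello():
    print("hello world !!!")


with Flow("hello world") as flow:
    say_hello()

# registry_url was left blank in the original post; fill in your own registry.
# Docker image names cannot contain spaces, so hyphens are used here.
flow.storage = Docker(registry_url="<your-registry-url>", image_name="hello-world-flow")
flow.register(project_name="demo")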
In the above code I am importing three external libraries (pandas, sklearn, and pymongo) as well as mongo_config, which contains the configuration for connecting to MongoDB.
When I register a flow, say the code above:

A.) Step 1: A Docker image containing everything (external libraries, the mongo config, and the flow code) is built and pushed to the container registry.
	Step 2: The flow's metadata, including its schedule (if any) and the location of its image in the container registry, is saved in Postgres.
	Step 3: When the Kubernetes agent polls and has to run the flow, it creates a pod, the dependencies are installed, and the pod is terminated after the tasks complete.

	That's my understanding of how things work; please correct me if I am wrong on any of it.
	Now, what if the mongo config changes? When we build reporting pipelines, we want to change the config in one place and have the change picked up by every report. With the approach above, I might need to re-register every flow for it to pick up the updated config. Can you suggest ways to change the config in one place so that all flows see it, without re-registering them all?

B.) One way I think this could be solved is to use GitHub as storage: since the code is read from GitHub, the updated config would presumably be picked up too. But there is one issue with this approach:
when the pod is created to run the script, how will the dependencies be installed on it, since there is no Docker image this time?

Mariia Kerimova

06/18/2021, 1:31 PM
Hello! If you're using the Prefect image, you'll have to specify additional dependencies in your run config. You can find the example here. If you are using Kubernetes, then yes, the flow will run in a pod, which by default is deleted once it terminates. I think you can use Parameters for mongo_config. You can find info about using Parameters here.
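A minimal sketch of the run-config approach described above, assuming Prefect 0.14/1.x and the official `prefecthq/prefect` image, which installs the packages listed in the `EXTRA_PIP_PACKAGES` environment variable when the container starts. The image tag and the package list are illustrative, not taken from the thread.

```python
from prefect.run_configs import KubernetesRun

# Assumption: the official Prefect image is used, so extra packages can be
# installed at container start through EXTRA_PIP_PACKAGES.
run_config = KubernetesRun(
    image="prefecthq/prefect:latest",
    env={"EXTRA_PIP_PACKAGES": "pymongo pandas scikit-learn"},
)

# Attach it to an existing flow object before registering:
# flow.run_config = run_config
```

Installing packages on every pod start slows each run down, so for heavier dependency sets a custom image is usually the more practical choice.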

Kevin Kho

06/18/2021, 1:39 PM
Hey @ash, there is a feature in Cloud called KV Store where you can store key-value pairs. You could store the database connection there (assuming nothing sensitive). Since you're on Server, you can replicate this by storing it in something like Google Firestore and retrieving the key-value pairs at runtime when you need them.

ash

06/18/2021, 1:48 PM
Hey @Mariia Kerimova, if I use Parameters, I will again have to specify defaults for the scheduled flows (I don't want to store sensitive values in scripts), and of course writing a Parameter for each flow would be very tedious. Hey @Kevin Kho, let's assume the credentials are sensitive, since they could be anything, say prod credentials.

Kevin Kho

06/18/2021, 1:50 PM
Prefect Cloud has Secrets, where you can store sensitive values and then retrieve them in your flow. For Server, you can use environment variables and Secrets; otherwise you need to do something more involved, like setting up your own secret store.
Think of something like AWS Secrets Manager.
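As a rough illustration of the environment-variable route, the sketch below reads the Mongo settings when the flow runs instead of baking them into the registered flow. The variable names `MONGO_URI` and `MONGO_DATABASE`, the `MongoConfig` class, and the `"reports"` default are hypothetical; the variables would need to be set on the pod, for example through the run config's `env`.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class MongoConfig:
    uri: str
    database: str


def load_mongo_config(env=None) -> MongoConfig:
    """Build the Mongo settings from environment variables at run time."""
    # Accept a mapping for testing; default to the real process environment.
    env = os.environ if env is None else env
    try:
        uri = env["MONGO_URI"]
    except KeyError as exc:
        raise RuntimeError("MONGO_URI is not set in the flow's environment") from exc
    # The database name is optional and falls back to a hypothetical default.
    return MongoConfig(uri=uri, database=env.get("MONGO_DATABASE", "reports"))
```

Calling `load_mongo_config()` inside a task, rather than at import time, means a changed value is picked up by the next run without re-registering the flow.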

ash

06/18/2021, 1:56 PM
Oh OK, let me read up on Prefect Secrets. This might resolve my first problem.
Also, what's your take on using GitHub? If my script on GitHub imports config from a parent directory, it would pick up the updated config. Will that work?

Kevin Kho

06/18/2021, 1:59 PM
No, it won't work, because GitHub storage currently downloads only the flow file, not the whole repo, so the config file won't exist by default. You need to download it explicitly in your code. This might be easier if it's in something like AWS S3 or an equivalent.
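One way to picture "download it explicitly in your code" is a small helper that fetches a shared JSON config over a URL at run time. This is only a sketch using the standard library; the function name `fetch_config` and the idea of keeping the config as a JSON document are assumptions, and a private location would additionally need authentication, which is not shown.

```python
import json
from urllib.request import urlopen


def fetch_config(url: str, timeout: float = 10.0) -> dict:
    """Download a JSON config document and return it as a dict."""
    # Called inside a task, this runs on every flow run, so an updated
    # config is picked up without re-registering the flow.
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Every flow that calls the same helper with the same URL reads the same config, which gives the "change it in one place" behavior asked about earlier in the thread.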

ash

06/18/2021, 2:03 PM
Can you please explain how AWS S3 will help with this? 😅

Kevin Kho

06/18/2021, 2:06 PM
Actually ignore S3. Really the Google Firestore type of database might be better than S3

ash

06/18/2021, 2:26 PM
OK, one last thing, Kevin. Forget the credentials problem: if I want to use GitHub storage, how do I create an environment for my pods? Docker images contain both the flow and the environment, but there is no such thing for GitHub.

Kevin Kho

06/18/2021, 2:31 PM
Yeah then you need DockerStorage for that. You can’t use GithubStorage if you need to supply an image.

ash

06/18/2021, 2:33 PM
So what you mean is that if I want to run a flow on Kubernetes, with my Prefect Server also on Kubernetes, I cannot do it without using DockerStorage?

Kevin Kho

06/18/2021, 2:34 PM
Oh sorry! I believe you can use Github storage if you supply an image to KubernetesRun
flow.run_config = KubernetesRun(image="example/image-name:with-tag")
And then the flow will be pulled from Github and run on this image.
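Put together, the combination described here might look like the fragment below, assuming Prefect 0.14/1.x. The repository name `my-org/my-flows` and the path `flows/hello_world.py` are hypothetical, and the image is assumed to already contain Prefect plus the libraries the flow imports.

```python
from prefect import Flow, task
from prefect.run_configs import KubernetesRun
from prefect.storage import GitHub


@task
def say_hello():
    print("hello world !!!")


with Flow("hello world") as flow:
    say_hello()

# The flow file is fetched from GitHub at run time (hypothetical repo and path).
flow.storage = GitHub(repo="my-org/my-flows", path="flows/hello_world.py")

# The pod's environment comes from this image, not from the flow storage.
flow.run_config = KubernetesRun(image="example/image-name:with-tag")
```

With this split, changing the flow code only needs a push to the repository, while changing dependencies needs a new image.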

ash

06/18/2021, 2:38 PM
One thing I am confused about: when I run a flow with GitHub storage and a KubernetesRun run config, will it connect to my machine to get the image at run time, or will the image be stored in a container registry?

Kevin Kho

06/18/2021, 2:39 PM
It will look locally for that image; if it doesn't exist, it will try to pull it from the registry, and if it can't find it there, the run won't work.
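For context, "locally" on Kubernetes refers to the image cache of the node that runs the pod, not the machine that registered the flow. The sketch below models the default that Kubernetes documents when a container does not set `imagePullPolicy`; whether the job Prefect creates sets the policy explicitly is not covered in this thread, so treat it as an illustration of the Kubernetes rule only. The function name is hypothetical.

```python
def default_image_pull_policy(image: str) -> str:
    """Return the pull policy Kubernetes applies when none is specified."""
    # A digest pins the exact image, so a cached copy on the node is reused.
    if "@" in image:
        return "IfNotPresent"
    # The tag, if any, follows ":" in the final path segment; a ":" earlier
    # in the reference belongs to a registry port such as localhost:5000.
    last_segment = image.rsplit("/", 1)[-1]
    _, sep, tag = last_segment.partition(":")
    # No tag or the "latest" tag: the node always contacts the registry.
    if not sep or tag == "latest":
        return "Always"
    # Any other tag: use the node's cached copy, pull only if it is missing.
    return "IfNotPresent"
```

Under this rule, an image such as `example/image-name:with-tag` is looked up in the node's cache first and pulled from the registry only if it is missing, which matches the behavior described above.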

ash

06/18/2021, 2:43 PM
Is there a source where I can read more about this behavior?
"It will look locally for that image and if it doesn't exist, it will try to pull from the registry and if it can't find it, it won't work."
Thanks a lot Kevin🙌😊

Kevin Kho

06/18/2021, 2:48 PM
Here it describes that behavior for DockerStorage. I am not 100% sure about this behavior for KubernetesRun, but I think it will be the same.

ash

06/18/2021, 3:10 PM
ok great, I will read this again. Thank you.