
Kha Nguyen

07/21/2022, 3:25 PM
Hi, I am in the process of evaluating Prefect Cloud for my pipeline. I have a setup where there is a flow definition with parameters, and I have about 1000 parameter sets from a database (and a lot more when the pipeline is live). What I can do is deploy the flow with parameters for each parameter set via the API. I want to know how this scenario can be handled: if I update the code of my flow, such as using a different algorithm or adding another step, how can 1000+ deployments be updated?

Kevin Kho

07/21/2022, 3:39 PM
Are you running these all as separate flows?
I am wondering if you can just fire all of these from a parent flow that loads the parameter sets and kicks off the deployments
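A minimal sketch of that parent-flow pattern, with the Prefect-specific pieces stubbed out: `load_parameter_sets` and `trigger_deployment_run` are hypothetical placeholders (the real versions would query the database and call Prefect's deployment-run API), since the exact Prefect 2.0 API was still in beta at the time of this thread.

```python
# Sketch: one parent flow fans out a deployment run per parameter set,
# instead of maintaining 1000+ separate deployments. A code change to
# the forecast flow then only touches one flow definition.

def load_parameter_sets():
    # Hypothetical placeholder: in practice this would query the database
    # that holds the 1000+ parameter sets.
    return [
        {"customer": "acme", "dimensions": ["region"], "metrics": ["sales"]},
        {"customer": "globex", "dimensions": ["sku"], "metrics": ["units"]},
    ]

def trigger_deployment_run(deployment_name, params):
    # Hypothetical placeholder: in practice this would call Prefect's API
    # to start a run of an existing deployment with these parameters.
    return {"deployment": deployment_name, "parameters": params}

def fan_out_forecasts():
    # The parent flow's body: one loop over the parameter sets, kicking
    # off one run of the (single) forecast deployment per set.
    return [
        trigger_deployment_run("forecast/main", params)
        for params in load_parameter_sets()
    ]
```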

Kha Nguyen

07/21/2022, 3:56 PM
I plan to use the Prefect scheduler to run these jobs
Yes, as separate flows

Kevin Kho

07/21/2022, 3:59 PM
What storage are you using? I am thinking you might not even need to update the deployments, depending on your DeploymentSpec, if you just point to the file

Kha Nguyen

07/21/2022, 4:00 PM
I think S3
Oh, I thought Prefect would do some pickling and store the flows somewhere on S3

Kevin Kho

07/21/2022, 4:02 PM
Ah, for S3, yeah, I think it will make a copy, but maybe you only need to update the file once?

Kha Nguyen

07/21/2022, 4:03 PM
I don’t understand the “only need to update the file once” part 🙂
Can I send PM?

Kevin Kho

07/21/2022, 4:05 PM
It would be more helpful for you to keep the discussion here so someone else on our side can pick it up, because tomorrow is my last day here 😅

Kha Nguyen

07/21/2022, 4:05 PM
Ah ok 😄
Well, thank you for your contributions. I have watched several of your videos on YouTube 🙂

Kevin Kho

07/21/2022, 4:06 PM
But I mean, let’s say you re-create 1000 deployments. Does it just fail?

Kha Nguyen

07/21/2022, 4:12 PM
So suppose I have this flow
from typing import List

import pandas as pd
from prefect import flow

@flow
def forecast(customer: str, dimensions: List[str], metrics: List[str]) -> List[pd.DataFrame]:
    data = load_data_task()
    results = [forecast_task(df) for df in data]
    return results
Now, I have a web service; when it receives a new forecast setup, I would invoke
deployment = Deployment(flow=forecast, params=params)
await deployment.create()
The deployment should be saved to S3. I have 1000+ deployments like this. Let’s say I make changes to the forecast flow, or to the tasks; how can this update be propagated, considering that I am using Prefect Cloud?
Version 2.0
Yes, recreating is one option.
I will explore that approach
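A sketch of that recreate-everything approach: after a code change, loop over the stored parameter sets and rebuild each deployment. Here `recreate_deployment` is a hypothetical stand-in for the `Deployment(flow=forecast, params=params)` / `await deployment.create()` calls above; failures are collected rather than raised, so one bad parameter set does not abort the whole refresh.

```python
# Sketch: after updating the forecast flow's code, re-register every
# deployment so each one picks up the new flow version.

def recreate_deployment(params):
    # Hypothetical stand-in for the Deployment(...).create() call shown
    # earlier in the thread.
    return {"params": params, "status": "created"}

def refresh_all_deployments(parameter_sets):
    # Recreate one deployment per parameter set, collecting failures
    # instead of stopping at the first error.
    succeeded, failed = [], []
    for params in parameter_sets:
        try:
            succeeded.append(recreate_deployment(params))
        except Exception as exc:
            failed.append((params, exc))
    return succeeded, failed
```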