
Aiden Price

09/21/2019, 11:32 PM
Hi folks, what is the best way to reference and mutate a variable between `flow.run()`s? Should it be a `Parameter`? My actual use case is a dict that is a copy of one of my database tables; the incoming data needs to refer to it to find its foreign key each time. If I find a new name that I don’t have in my dict, I’ll need to update the table in the database and mutate my dictionary, then reference the mutated version in subsequent `flow.run()`s. I’m only new to Prefect but I have to say I love your work, thank you!
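A minimal sketch of the lookup Aiden describes, assuming a hypothetical `names` table with `id` and `name` columns and using an in-memory SQLite database as a stand-in for the real one. The dict answers repeat lookups; an unseen name is inserted into the table first and then added to the dict, so both stay in sync within a single process.

```python
import sqlite3
from typing import Dict


def resolve_fk(conn: sqlite3.Connection, lookup: Dict[str, int], name: str) -> int:
    """Return the id for `name`, inserting a new row (and updating `lookup`) if unseen."""
    if name in lookup:
        return lookup[name]  # cache hit: no database round trip
    # Unseen name: write it to the (hypothetical) names table first...
    cur = conn.execute("INSERT INTO names (name) VALUES (?)", (name,))
    conn.commit()
    # ...then mutate the in-memory copy so later records reuse the new id
    lookup[name] = cur.lastrowid
    return lookup[name]


# Demo setup: an in-memory stand-in for the real table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL)")
conn.execute("INSERT INTO names (name) VALUES ('alice')")
conn.commit()

lookup = dict(conn.execute("SELECT name, id FROM names"))  # {'alice': 1}
ids = [resolve_fk(conn, lookup, n) for n in ["alice", "bob", "alice", "bob"]]
print(ids)  # [1, 2, 1, 2]
```

The catch, which the rest of the thread works through, is that `lookup` only lives as long as the Python process; nothing here carries it into the next `flow.run()`.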

Jeremiah

09/21/2019, 11:34 PM
Hey @Aiden Price, a `Parameter` is probably the way to go. You can provide a different value for each run of the flow, and then work with it from other tasks.
Alternatively, you could have an actual task that loads your database data at the beginning of each run, and another task at the end that updates the database if necessary. That way you wouldn’t have to provide any data at all. This would be more compatible with a regularly scheduled flow, as flows that run on schedules can’t really accept parameters (for practical reasons: there’s not a good way to provide them).
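Jeremiah's second option can be sketched in plain Python, with ordinary functions standing in for the Prefect tasks (this is not Prefect's API) and an in-memory SQLite `names` table standing in for the real one (also hypothetical). The first step reloads the table on every run and the last step writes back whatever was new, so nothing has to be carried between runs except what is already in the database.

```python
import sqlite3
from typing import Dict, Iterable, List


def load_lookup(conn: sqlite3.Connection) -> Dict[str, int]:
    # First "task": read the current table at the start of every run
    return dict(conn.execute("SELECT name, id FROM names"))


def find_new_names(lookup: Dict[str, int], incoming: Iterable[str]) -> List[str]:
    # Middle "task": names in this batch that the table doesn't know yet
    return sorted(set(incoming).difference(lookup))


def write_new_names(conn: sqlite3.Connection, new_names: List[str]) -> None:
    # Last "task": persist anything this run discovered
    conn.executemany("INSERT INTO names (name) VALUES (?)", [(n,) for n in new_names])
    conn.commit()


def flow_run(conn: sqlite3.Connection, incoming: Iterable[str]) -> List[str]:
    lookup = load_lookup(conn)
    new_names = find_new_names(lookup, incoming)
    write_new_names(conn, new_names)
    return new_names


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL)")

first = flow_run(conn, ["alice", "bob"])   # ['alice', 'bob']
second = flow_run(conn, ["bob", "carol"])  # ['carol']: 'bob' was reloaded from the table
```

Because each run starts from the table, this shape also suits scheduled runs, where there is nobody around to pass in a `Parameter`.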

Aiden Price

09/21/2019, 11:37 PM
And it should be safe if I mutate a `Parameter` within the flow? Say I have a task that changes the dictionary, then a second task that reads it within the same flow?
I’m trying to avoid reading the database table every time, but that’s probably premature optimisation.

Jeremiah

09/21/2019, 11:38 PM
Prefect has a cache mechanism that could be useful here (it will cache a task’s output until the cache is invalidated), but it requires some careful state management.
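Prefect's own caching is configured on tasks (in Prefect Core 0.x, through task options such as `cache_for` and `cache_validator`; check the docs for your version). The sketch below is not that mechanism. It is a plain-Python, in-process cache with a time-to-live, meant only to illustrate the "careful state management" part: you have to decide when the cached table is stale, and invalidate it yourself after writing new rows. `TTLLookup` and the loader are hypothetical names.

```python
import time
from typing import Callable, Dict, Optional


class TTLLookup:
    """Hold a loaded name->id table and reload it once it is older than `ttl` seconds."""

    def __init__(self, loader: Callable[[], Dict[str, int]], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl
        self._clock = clock  # injectable so the expiry logic can be tested
        self._value: Optional[Dict[str, int]] = None
        self._loaded_at = 0.0

    def get(self) -> Dict[str, int]:
        now = self._clock()
        if self._value is None or now - self._loaded_at >= self._ttl:
            self._value = self._loader()  # first use, expired, or invalidated: reload
            self._loaded_at = now
        return self._value

    def invalidate(self) -> None:
        # Call after writing new rows so the next get() reloads the table
        self._value = None


# Usage with a fake loader and a fake clock
loads = []


def fake_loader() -> Dict[str, int]:
    loads.append(1)
    return {"alice": 1}


now = [0.0]
cache = TTLLookup(fake_loader, ttl=60, clock=lambda: now[0])
cache.get()
cache.get()         # still fresh: no reload
now[0] = 61.0
cache.get()         # expired: loads again
cache.invalidate()
cache.get()         # invalidated: loads a third time
print(len(loads))   # 3
```

Like a global dict, this only survives between `flow.run()` calls made from the same Python process.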
Mutating a `Parameter` might not do what you expect: if you change the value of the parameter, other tasks won’t pick it up. (They might if you’re working in a purely local context, but I wouldn’t count on it.)
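One way to see why in-place mutation is unreliable: once a task's inputs cross a serialization boundary (for example, when an executor ships them to another worker process), the task mutates a copy. The sketch below simulates that with a `pickle` round trip; `run_elsewhere` is a hypothetical helper, and this illustrates the hazard rather than describing how Prefect's executors move data internally.

```python
import pickle
from typing import Any, Callable, Dict


def run_elsewhere(fn: Callable[[Any], Any], arg: Any) -> Any:
    # Hypothetical stand-in for a remote worker: it only ever sees a copy of `arg`
    return fn(pickle.loads(pickle.dumps(arg)))


def add_name(lookup: Dict[str, int]) -> Dict[str, int]:
    lookup["carol"] = 3  # mutates the worker's copy, not the caller's dict
    return lookup


lookup = {"alice": 1}
returned = run_elsewhere(add_name, lookup)
print(lookup)    # {'alice': 1}
print(returned)  # {'alice': 1, 'carol': 3}
```

Called directly in the same process, `add_name` would have changed the original dict, which is why mutation can appear to work in a purely local test.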

Chris White

09/21/2019, 11:38 PM
If you need to manipulate data mid-run, you might consider actually exchanging the data between the tasks: one task reads in the `Parameter`, changes it, and then returns the altered dictionary for the next task to ingest.

Jeremiah

09/21/2019, 11:39 PM
However, you could have a task that changes the parameter and then have other tasks depend on THAT one.
^^ what @Chris White said!
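The pattern Chris and Jeremiah describe, sketched with plain functions (not Prefect's task API): the first step returns an updated copy of the dict instead of mutating its input, and the next step consumes that return value, which is what makes it depend on the first. In Prefect Core 0.x, calling `@task`-decorated versions of these functions inside a `with Flow(...)` block and passing one's result to the other should wire up the same data dependency. Allocating ids locally is a simplifying assumption; in Aiden's case the database would assign them.

```python
from typing import Dict, Iterable, List


def add_missing_names(lookup: Dict[str, int], incoming: Iterable[str]) -> Dict[str, int]:
    """Return a new dict with unseen names added; the input dict is left untouched."""
    updated = dict(lookup)  # copy, so the upstream value (e.g. a Parameter) never changes
    next_id = max(updated.values(), default=0) + 1  # assumption: ids allocated locally
    for name in incoming:
        if name not in updated:
            updated[name] = next_id
            next_id += 1
    return updated


def to_foreign_keys(lookup: Dict[str, int], incoming: Iterable[str]) -> List[int]:
    """Downstream step: consumes the *returned* dict, so it sees every new name."""
    return [lookup[name] for name in incoming]


original = {"alice": 1, "bob": 2}
batch = ["bob", "carol", "carol"]
updated = add_missing_names(original, batch)
keys = to_foreign_keys(updated, batch)
print(original)  # {'alice': 1, 'bob': 2}
print(keys)      # [2, 3, 3]
```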

Aiden Price

09/21/2019, 11:45 PM
Ha, thank you both, I think that might be the best approach (because I’ve just realised that there’s an HTTP call to get new data in this mutation workflow too). In order to reference the new version of the dict in subsequent flow runs, should I return the table from the last task and replace my global dict with the `flow.result`?

Chris White

09/21/2019, 11:47 PM
Hmmm, that’s tricky. If you’re maintaining a dictionary that is updated with each run, you might want to persist it somewhere (local filesystem / S3 bucket / etc.) and have your first task read the dictionary in.
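A minimal sketch of Chris's suggestion using the local filesystem: the first task of each run loads the dict from a JSON file, and the last task writes the updated version back. The file name and helper names are hypothetical; an S3 bucket, or the database itself as Aiden goes on to suggest, would play the same role.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict


def load_lookup(path: Path) -> Dict[str, int]:
    # First task of a run: start empty if nothing has been persisted yet
    if not path.exists():
        return {}
    return json.loads(path.read_text())


def save_lookup(path: Path, lookup: Dict[str, int]) -> None:
    # Last task of a run: write a temp file, then rename it over the old one
    # (the rename is atomic on POSIX filesystems, so readers never see a partial file)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(lookup, sort_keys=True))
    tmp.replace(path)


with tempfile.TemporaryDirectory() as workdir:
    path = Path(workdir) / "lookup.json"
    run1 = load_lookup(path)  # {} on the very first run
    run1["alice"] = 1
    save_lookup(path, run1)
    run2 = load_lookup(path)  # {'alice': 1}: a later run sees the update
```

If two runs can overlap, one may overwrite the other's additions; a database table with a unique constraint on the name handles that case more safely.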

Aiden Price

09/21/2019, 11:55 PM
Okay, that’s a good idea. I suppose the best place to persist it in that case, though, would be the database itself... hmmm