# prefect-community
d
I haven’t done much digging on the topic, but I’m sure I’m not the first to ask or think about a data transform registry. Basically there’d be some k:v pairs used to assemble a tree of business stakeholders and the upstream / downstream dependencies, either based on what’s in a properties file or generated from the code, similar to how reflection works. Any thoughts on this? Would be cool to see a table or visualization of it. Maybe some Prefect workflow tags can be used for this too?
k
You mean any metadata tag you can supply right? This was often requested so Prefect 2.0 lets you tag any entity so that you can filter the UI to see the runs with the tags you want
The Orion Dashboard is dynamic and allows you to save filters so you can make your own Dashboard. There is a GIF here.
d
I’m thinking more like a properties file, sorta like you find in CI/CD pipelines that take common attributes like git sha, package name etc to tag resources. Similarly, a properties file could be supplied for each module, and then fed to the registration so all the metadata is attached to the transform. Tags could achieve this, but a file could make it cleaner to either generate programmatically, or to keep the registration command from getting huge
Is Orion == Prefect 2.0?
And as an administrator, if I could look at for example a list of 100 transforms and then sort them by metadata that’d be handy for strategic decisions around either troubleshooting or more logistical business decisions. Treat the transforms like a business asset essentially, like we do the data.
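A minimal sketch of the properties-file idea above, in plain Python with no Prefect calls. The file contents, the keys (`owner`, `git_sha`, `package`) and the transform names are all hypothetical; the point is only to show key:value metadata being parsed per module, flattened into tag-like strings, and used to sort a list of transforms.

```python
from typing import Dict, List


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value (or key: value) lines, skipping blanks and # comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Accept "=" or ":" as the separator, whichever appears first.
        positions = [i for i in (line.find("="), line.find(":")) if i != -1]
        if not positions:
            continue  # malformed lines are ignored in this sketch
        sep = min(positions)
        props[line[:sep].strip()] = line[sep + 1:].strip()
    return props


def to_tags(props: Dict[str, str]) -> List[str]:
    """Flatten metadata into "key:value" strings that could be used as tags."""
    return [f"{key}:{value}" for key, value in sorted(props.items())]


# Hypothetical per-module properties files, inlined here as strings.
TRANSFORMS = {
    "clean_orders": "owner=finance\ngit_sha=abc123\npackage=orders_etl",
    "score_leads": "owner=marketing\ngit_sha=def456\npackage=leads_ml",
    "load_invoices": "owner=finance\ngit_sha=0a1b2c\npackage=billing",
}

# One row per transform: its name plus whatever the properties file declared.
catalogue = [{"name": name, **parse_properties(text)} for name, text in TRANSFORMS.items()]

# Sort by any metadata key; here by owner, then by transform name.
by_owner = sorted(catalogue, key=lambda row: (row.get("owner", ""), row["name"]))

for row in by_owner:
    print(row["owner"], row["name"], to_tags(parse_properties(TRANSFORMS[row["name"]])))
```

Keeping the parsed metadata as plain dictionaries means the same rows could feed a table view, a tag list, or a sort on whichever key an administrator cares about.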
k
Yes Orion is 2.0. I think you want something like Kedro ?
And this is doable in a sense, you can supply a config file to be read in as a Parameter to your Flow
You can use Prefect + Kedro (though with a bit of diminishing returns), but I think Kedro explicitly provides a configuration file for you, and then the administration filtering is addressed by the metadata tags in the UI, since tags are on the Flow Run level and you can filter for the appropriate runs
m
Not sure if kedro can offer anything that Prefect can’t. Can you give a concrete toy example of what you would like to have?
d
Hmm, the tree in Kedro is mechanical and focused on the implementation of the transform (useful). I’m thinking strictly metadata. Sorta like service registration and discovery, but purely for data transforms. I imagine a simple table but I’m sure there are more interesting things to do with data about a transform. Some of the data in a view could be from a properties file, and some could be from the job configuration itself. Maybe “data transform catalogue” is an appropriate term. I like the idea of having as much of this kind of metadata as possible derived from code and tags, so it doesn’t have to be maintained or assembled post-hoc
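One way to picture a "data transform catalogue" whose entries are derived from code rather than assembled afterwards is a small registration decorator. This is only a sketch in plain Python: the `transform` decorator, the `CATALOGUE` dictionary, and the `owner` / `stakeholder` keys are hypothetical names, not part of Prefect or Kedro.

```python
import inspect
from typing import Callable, Dict, List, Sequence

# Hypothetical in-memory catalogue: transform name -> metadata entry.
CATALOGUE: Dict[str, dict] = {}


def transform(upstream: Sequence[str] = (), **metadata: str) -> Callable:
    """Register a function in CATALOGUE, mixing declared and introspected metadata."""

    def decorator(func: Callable) -> Callable:
        doc = inspect.getdoc(func) or ""
        CATALOGUE[func.__name__] = {
            "name": func.__name__,                       # derived from code
            "module": func.__module__,                   # derived from code
            "summary": doc.splitlines()[0] if doc else "",
            "params": list(inspect.signature(func).parameters),
            "upstream": list(upstream),                  # declared dependency names
            **metadata,                                  # declared key:value pairs
        }
        return func  # the function itself is left unchanged

    return decorator


def downstream_of(name: str) -> List[str]:
    """Return the transforms that list `name` as an upstream dependency."""
    return sorted(n for n, entry in CATALOGUE.items() if name in entry["upstream"])


@transform(owner="finance", stakeholder="billing-team")
def load_orders(path):
    """Read raw orders."""
    return []


@transform(upstream=["load_orders"], owner="finance")
def clean_orders(rows, currency="USD"):
    """Normalise currencies."""
    return rows


for entry in CATALOGUE.values():
    print(entry["name"], entry["owner"], entry["upstream"], downstream_of(entry["name"]))
```

Because registration happens at import time, the table stays in sync with the code; only the business-facing keys need to be written by hand.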
m
I see! And where would you like to store that information?
d
It’d be cool if it were part of flow registration, that way CI/CD tooling could just pick up a properties file (however it’s generated doesn’t matter but you could require key:val pairs) and then POST it.
Some of it could be generative though, but there’d need to be some thought around data separation of concerns (compliance, privacy etc) as I don’t know what the metadata schema for a registered flow is
m
@Dylan that was actually not my question (sorry if that wasn't clear). My question was actually on where do you want to make this information visible? But I guess the answer is somewhere in the Prefect UI
d
Oh yeah, in the UI, or queryable from the GraphQL API when describing flows (assuming there’s the GraphQL equivalent of a GET flows RET array of json)
m
I have mixed feelings on such additions. My main concern is that the Prefect UI isn't very useful for non-technical users (who could potentially be interested in this info for e.g. building a report in Tableau or PowerBI)
k
Oh we actually have non-technical users in the Prefect UI triggering flows and looking at artifacts. They treat Prefect like a report builder and then fill in the parameters and run the flow on demand. I think tags will be able to provide this flexibility on the UI side for Orion
m
I also see potential for further extending Artifacts. Just like what can be generated with GE.
d
Yeah I think the ‘data as a primary asset’ crowd in business functions outside of engineering can definitely relate to that point Kevin
Maybe just a ‘--tags-from-file’ flag and a ‘path-to-file’ passed to the registration command would do all of that just fine
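To make the suggestion concrete, here is a rough sketch of how such a flag could behave. It is a standalone, hypothetical CLI built with `argparse`; `register-transform`, `--tag`, and `--tags-from-file` are illustrative names and are not existing Prefect options. The assumed file format is one tag per line, with blank lines and `#` comments ignored.

```python
import argparse
from pathlib import Path
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical command; not the real Prefect registration CLI.
    parser = argparse.ArgumentParser(prog="register-transform")
    parser.add_argument("--tag", action="append", default=None,
                        help="inline tag, may be repeated")
    parser.add_argument("--tags-from-file", type=Path, default=None,
                        help="path to a file with one tag per line")
    return parser


def collect_tags(argv: Optional[List[str]] = None) -> List[str]:
    """Merge inline tags with tags read from the optional file."""
    args = build_parser().parse_args(argv)
    tags: List[str] = list(args.tag or [])
    if args.tags_from_file is not None:
        for raw in args.tags_from_file.read_text().splitlines():
            line = raw.strip()
            if line and not line.startswith("#"):
                tags.append(line)
    # De-duplicate while keeping first-seen order.
    return list(dict.fromkeys(tags))
```

Called as `collect_tags(["--tag", "env:prod", "--tags-from-file", "tags.txt"])`, this returns the inline tags followed by the file's tags, so the command line stays short while CI/CD tooling generates the file.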