TTBOMK there isn't one at the moment. What problem are you trying to solve with OpenLineage? For transparency, I'm not a big fan of this project, but I'm very interested in the underlying problems you would like to address.
Ricardo Gaspar
11/09/2022, 10:57 AM
I’m interested in using Marquez to get dataset lineage and metadata management/discovery.
Would you suggest OpenMetadata instead? I don’t know if it integrates with Spark (Scala).
Anna Geller
11/09/2022, 12:42 PM
Marquez doesn't add anything over the workflow metadata that Prefect already provides in the UI. But if you are interested in metadata about your data rather than your workflow, then there are many great tools you could explore, including OpenMetadata, DataHub, Atlan, Stemma, and tens if not hundreds of other metadata tools.
Ricardo Gaspar
02/10/2023, 5:57 PM
Just revisiting this. I like OpenMetadata; it seems very interesting, though I haven't played with it yet. Once it gets to a major release, it should be at a more mature stage.
Answering your question: what I’d like is a single tool/framework that can get data lineage from Spark (ideally column-level) as well as from the orchestration tool (Prefect in this case; Airflow has some integration with OpenLineage and Marquez).
Anna Geller
02/10/2023, 5:59 PM
there are like millions of lineage tools on the market, but the problem is more that everyone has a different understanding of what lineage really is
Anna Geller
02/10/2023, 6:01 PM
If you are on Prefect Cloud, you'll be able to do a lot with Automations in a more actionable way than most data catalogs offer. But if you need a data catalog, you'd need to do a PoC for that specific tool. AFAIK no lineage tool integrates Prefect workflow metadata yet, but you don't necessarily need that; it all depends on what you're trying to accomplish.
Brad
02/15/2023, 10:59 PM
hey @Ricardo Gaspar - I'm still interested in this space, and OpenMetadata looks pretty interesting. I might have a play around with a Prefect integration - would you be keen to contribute?