Tyler Matteson

05/06/2022, 7:27 PM
Hi folks, I'm new here. I'm looking to use Prefect as my orchestrator and primary development focus, but I'm also interested in leveraging work that has already been done in Singer.io, and the Meltano runner specifically. I'm choosing Meltano because my team has a background in Python and Vue, not in Airbyte's stack of Java and React. My first question is how I might/should pass and retrieve data between Prefect and Meltano without a direct integration (perhaps with a call to `subprocess`). What should I know? What am I not asking? Tasks are a mix of polling, on-demand ETL and scheduled ETL. I think most data professionals would describe the load as "not much", so efficiency is going to take a back seat to maintainability and ease of use.
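Without a direct integration, the simplest data hand-off is the Singer protocol itself: a tap writes one JSON message per line to stdout, each with a `type` of `SCHEMA`, `RECORD` or `STATE`. Below is a minimal sketch that assumes the command writes only Singer messages to stdout (for example `meltano invoke tap-example`, where `tap-example` is a hypothetical plugin name). `read_singer_records` is an illustrative helper, not part of Prefect or Meltano. It buffers the whole output in memory, which should be fine for the small load described here.

```python
import json
import subprocess
from collections import defaultdict
from typing import Any, Dict, List, Optional, Tuple


def read_singer_records(cmd: List[str]) -> Tuple[Dict[str, List[dict]], Optional[Any]]:
    """Run a Singer tap and collect its RECORD messages per stream.

    Returns (records_by_stream, last_state). `cmd` might be, e.g.,
    ["meltano", "invoke", "tap-example"] (hypothetical plugin name).
    """
    # Taps write Singer messages to stdout and their logs to stderr.
    proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
    records: Dict[str, List[dict]] = defaultdict(list)
    state = None
    for line in proc.stdout.splitlines():
        if not line.strip():
            continue
        message = json.loads(line)  # one JSON object per line
        if message.get("type") == "RECORD":
            records[message["stream"]].append(message["record"])
        elif message.get("type") == "STATE":
            state = message["value"]  # bookmark for the next incremental run
        # SCHEMA (and any other) messages are skipped in this sketch.
    return dict(records), state
```

The returned `state` could be persisted between Prefect runs and handed back to the tap (Singer taps conventionally accept a `--state` file) so that polling only picks up new data.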

Anna Geller

05/06/2022, 7:28 PM
Could you explain how you understand Meltano as a product and what exactly you use it for?

Tyler Matteson

05/06/2022, 7:36 PM
Gladly. Meltano is a runner for tasks built on the Singer.io spec abstractions: decoupled and modular ETL (to me, with a de-emphasized transform step, which they delegate to dbt). The idea is that it supports hundreds of sources and ~2 dozen targets and lets you map between them automatically. It integrates with Airflow as an orchestrator, a Dagster integration is in progress, and they're interested in a Prefect integration, but there isn't much demand for one since they already have an Airflow integration. I've experimented enough to know that Airflow is not my jam and that Prefect is the right tool for me.

Anna Geller

05/06/2022, 7:38 PM
Great to hear you like Prefect! And thanks so much for explaining what Meltano does - I never understood what they actually try to do tbh 😅! So it looks like you want to use Meltano in the same way you would use Airbyte or Fivetran - to trigger data ingestion tasks in your flow, correct? When it comes to Airbyte or Fivetran for data replication, you don't actually have to know the underlying system (incl. Java/Python and anything else they might use under the hood), unless you:
• want to build your own connectors
• want to understand the system better in general (not a prerequisite to use it, but I can understand why you may prefer it - Java traces can be verbose and hard to understand)

Tyler Matteson

05/06/2022, 7:42 PM
Regarding language, that's my understanding too: it's not critical unless you want to contribute (and I do). Custom connectors are a requirement; practically and aspirationally, they're a line of business for me.
šŸ‘ 1
a

Anna Geller

05/06/2022, 7:42 PM
So currently, your best bet is to trigger your Meltano/Singer processes via a subprocess, but I'll forward your request to @alex so that they can look at a potential Meltano integration for Prefect 2.0. At which stage are you in Prefect adoption? If you're starting now, it would be easier to start with Prefect 2.0 directly.
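Following the subprocess route, here is a minimal sketch of a Prefect 2.0 flow that shells out to the Meltano CLI. It assumes a Meltano project in `project_dir` and a Meltano version that has `meltano run <tap> <target>` (older versions used `meltano elt <tap> <target>`). The plugin names `tap-example`/`target-example` and the setting variable `TAP_EXAMPLE_START_DATE` are hypothetical; Meltano can read plugin settings from environment variables, but check its docs for the exact naming. `run_meltano` and `el_flow` are illustrative names, not part of either tool, and the try/except exists only so the sketch runs without Prefect installed.

```python
import os
import subprocess
from typing import Dict, List, Optional

try:
    from prefect import flow, task
except ImportError:
    # Fallback no-op decorators so this sketch still imports and runs
    # (undecorated) on a machine without Prefect installed.
    def task(fn):
        return fn

    def flow(fn):
        return fn


def run_meltano(
    args: List[str],
    env: Optional[Dict[str, str]] = None,
    cwd: Optional[str] = None,
    executable: str = "meltano",
) -> str:
    """Run `<executable> <args...>` and return its stdout and stderr combined.

    Extra settings go in through `env`, merged over the current environment.
    check=True makes a non-zero exit raise subprocess.CalledProcessError,
    so a failed Meltano run fails the surrounding Prefect task.
    """
    proc = subprocess.run(
        [executable, *args],
        cwd=cwd,  # the Meltano project directory (where meltano.yml lives)
        env={**os.environ, **(env or {})},
        capture_output=True,
        text=True,
        check=True,
    )
    return proc.stdout + proc.stderr


# Register the plain helper as a Prefect task (identity without Prefect).
meltano_task = task(run_meltano)


@flow
def el_flow(project_dir: str = ".", start_date: str = "2022-01-01"):
    # "tap-example", "target-example" and TAP_EXAMPLE_START_DATE are
    # hypothetical; use the plugin names and settings from your meltano.yml.
    return meltano_task(
        ["run", "tap-example", "target-example"],
        env={"TAP_EXAMPLE_START_DATE": start_date},
        cwd=project_dir,
    )
```

Keeping `run_meltano` a plain function makes it testable without Prefect or Meltano, and the same flow can serve both on-demand runs (call `el_flow()` directly) and scheduled ones.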

Tyler Matteson

05/06/2022, 7:43 PM
Confirming 2.0

Anna Geller

05/06/2022, 7:45 PM
Nice! If Meltano and custom connectors are a line of business for you, you could consider contributing a Meltano Collection to Prefect 2.0 - we would definitely appreciate an OSS contribution. Here are some resources if you want to learn more:
• https://discourse.prefect.io/t/how-to-contribute-to-prefect-collections/593
• https://discourse.prefect.io/tag/prefect-collections

Tyler Matteson

05/06/2022, 7:51 PM
Along those lines, I suspect (without a lot of research) that it will be easier to approach the integration from the Meltano side, e.g. Prefect exists and Meltano submits work to it. They have an "orchestrator" abstraction (Airflow and Dagster) and a "transformer" abstraction; these are flavors of "plugins" in their nomenclature. I think it'd be great to get Prefect to do both of those things, which would offer a mostly-Prefect development experience with the benefit of Singer taps and targets.
Being new, I can't be sure, but it seems it may be possible to write something that allows all of that to happen inside Prefect
... in a reusable/composable way

Anna Geller

05/06/2022, 8:20 PM
They have an "orchestrator" abstraction (Airflow and Dagster)
This is exactly the part I'm most confused about! How is that supposed to work?! 😄 It would assume that Airflow's or Dagster's logic is translatable to Prefect and vice versa, which is not true. Prefect 2.0 is a completely different tool than Airflow or Dagster; it doesn't even require you to build a DAG. I guess Meltano perhaps assumes that a DAG is a requirement for an orchestrator and that they can translate a DAG from one orchestrator to another, but this is not doable IMO - I would love to be proven wrong here.

Tyler Matteson

05/06/2022, 9:00 PM
Let me try to explain it as I understand it; I am not an expert in any of this stuff. Meltano itself doesn't use a DAG; it allows itself to be run as a node in a DAG (by Airflow, currently). Meltano says it is designed to be a CLI tool that happens to have a UI, which I think you can equate to "nice logging", though that's definitely not all it does. That makes the 'node' use case of the DAG more reasonable. It delegates the transform step as well, to dbt, which also fits the node-in-a-DAG explanation. That these steps are chained at all is, I think, a desire for completeness: "run my pipeline(s)". If the original goal was "stitchdata but open source (because we're GitLab and open source is core to our identity)", I think they've gotten a great start.

Anna Geller

05/06/2022, 9:19 PM
Thanks for this great explanation. Would you then say Meltano is an EL tool (with optional T by integrating with dbt), or that it's an orchestrator itself (as in "run my pipeline locally") but without scheduling and a real orchestration backend?

Tyler Matteson

05/06/2022, 9:20 PM
With qualifiers for my inexperience, yes.
without scheduling and real orchestration backend
I would say it delegates orchestration. Airflow is a very real tool but not built in. This is splitting hairs to be sure, but this characterization would matter if you described it to somebody who had deep experience with Meltano and wanted to quibble.
I am going to structure my connection experiment in a way that I think could be contributed
And will ask for additional design input once I've got something functional
Thank you for workshopping this with me

Anna Geller

05/06/2022, 9:25 PM
With qualifiers for my inexperience, yes.
Yes as an EL tool, or as an orchestrator? 😄

Tyler Matteson

05/06/2022, 9:25 PM
EL tool
šŸ‘ 1
a

Anna Geller

05/06/2022, 9:27 PM
Sure, if you want to, you could start small by creating a repo and writing down some bullet points and usage examples in a README. Thanks for all the explanations, too, and have a great weekend!