Manuel Aristarán

03/06/2020, 3:42 PM
Hey everyone. I’m wondering if anybody here has integrated Singer.io’s taps and targets with a Prefect Flow?

Laura Lorenz (she/her)

03/06/2020, 3:56 PM
I haven’t myself, but since the API is generic it might be something we could write a task library abstraction over. Am I understanding right that it would be convenient for you to use singer’s taps, but you need your taps to be dependent on each other and so you want to use Prefect’s flow API? It looks like their hosted product Stitch doesn’t really do the DAG thing if I’m reading this correctly
upvote 1

Manuel Aristarán

03/06/2020, 4:00 PM
thanks for replying! Yeah, taps are essentially data sources, and targets are sinks. By default, they’re executed as command line scripts, and they’re connected through pipes.
So I guess that could be implemented as chained `ShellTask`s?
Also, I got interested in Prefect because of its native support of dataflow between steps of a Flow (Airflow’s main drawback IMHO)
Are there any caveats to be aware of when “piping” data between steps of a Flow? Like, is it acceptable to pass a big data stream from one task to the next?
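
For reference, a minimal sketch of the `tap | target` pipe described above, using only Python's standard library. `TAP_CMD` and `TARGET_CMD` are hypothetical stand-ins (small Python one-liners) rather than real Singer executables, and `run_pipeline` is an illustrative helper, not part of Singer or Prefect.

```python
import subprocess
import sys

# Hypothetical stand-ins for a real tap and target, so the sketch runs
# without Singer installed. The fake tap prints three JSON lines; the
# fake target counts the lines it receives on stdin.
TAP_CMD = [
    sys.executable, "-c",
    "import json; [print(json.dumps({'type': 'RECORD', 'id': i})) for i in range(3)]",
]
TARGET_CMD = [sys.executable, "-c", "import sys; print(sum(1 for _ in sys.stdin))"]


def run_pipeline(tap_cmd, target_cmd):
    """Run the equivalent of `tap | target` and return the target's stdout."""
    tap = subprocess.Popen(tap_cmd, stdout=subprocess.PIPE)
    target = subprocess.Popen(
        target_cmd, stdin=tap.stdout, stdout=subprocess.PIPE, text=True
    )
    # Close our copy of the pipe's read end so the tap sees a broken pipe
    # if the target exits early.
    tap.stdout.close()
    out, _ = target.communicate()
    tap_rc = tap.wait()
    if tap_rc != 0 or target.returncode != 0:
        raise RuntimeError(
            f"pipeline failed: tap={tap_rc}, target={target.returncode}"
        )
    return out


if __name__ == "__main__":
    print(run_pipeline(TAP_CMD, TARGET_CMD))
```

Because the two processes are connected by an OS pipe, the records stream from one to the other without being held in the parent process; only the target's (small) output is captured.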

Laura Lorenz (she/her)

03/06/2020, 4:09 PM
Probably the biggest caveat is that Prefect passes data through the flow in memory, using an in-memory cache. It is acceptable, but bound by resource constraints. Interestingly, we are just starting work on extending this so that result data can be read off disk lazily when needed. I agree chained ShellTasks could work, dataflow included, especially if you or your tap write the data to stdout (see the caveat in the “Return” section of https://docs.prefect.io/api/latest/tasks/shell.html#shelltask). I agree on the dataflow point, that was my big beef too 🙂
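
One way to stay within the memory constraint mentioned above is to hand a file path from one step to the next instead of the data itself. The sketch below is an assumption-laden illustration in plain Python: `extract_to_file`, `iter_lines`, and `FAKE_TAP` are hypothetical names, and none of this is Prefect's API or its planned lazy-result feature.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in for a tap: prints three JSON lines to stdout.
FAKE_TAP = [
    sys.executable, "-c",
    "import json; [print(json.dumps({'id': i})) for i in range(3)]",
]


def extract_to_file(cmd):
    """Run cmd, spool its stdout to a temp file, and return the file path."""
    fd, path = tempfile.mkstemp(suffix=".jsonl")
    with os.fdopen(fd, "wb") as sink:
        # The child writes straight to the file, so the output never
        # passes through this process's memory.
        subprocess.run(cmd, stdout=sink, check=True)
    return path


def iter_lines(path):
    """Lazily yield lines from disk, one at a time."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n")


if __name__ == "__main__":
    path = extract_to_file(FAKE_TAP)
    try:
        for line in iter_lines(path):
            print(line)
    finally:
        os.remove(path)
```

Only the short path string would travel between steps; the downstream step decides when, and how much of, the data to read.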

Manuel Aristarán

03/06/2020, 4:13 PM
Awesome, thanks a lot!
👍 1