I am trying to work out how to work with relationa...
# prefect-community
m
I am trying to work out how to work with relational database sources in Prefect. Does anyone have an example? The usual process is that I have a bunch of tables that needs to be extracted and later loaded somewhere. I have tried the postgres-task. It works, but how would I structure my code. It is not always great to read 100M rows into memory by using
fetch='all'
. If I want to read 5000 rows, store them somewhere and then read another 5000 rows. How would i proceed with prefect?
j
Hi @Mikael, we do a lot of ETL with Prefect where we push data around from various sources to other targets, e.g. retrieve file from SFTP and store in MySQL, transfer from MySQL table to Redshift, etc. My advice is to work out your approach outside of Prefect first (i.e. get Python code or scripts working manually), and then automate it with Prefect. If you don't need to transform data (or only lightly), then I'd suggest using an open source tool like Singer or Embulk to do the transfer. They often take care of lower level details like reading & writing in chunks, etc. We've used Embulk and have had a good experience with it. (Only minor downside to Embulk is that it's Java based so you need a JVM runtime available in whatever environment, docker image, etc. that you run it in.) Hope this helps & feel free to ask more questions.
m
Thanks you for the answer! I’ll try to find out from here.