Hello everyone 👋, I’m having a conceptual issue I’m hoping to get some clarity on. Say, for example, you have a number of tasks you want to execute in Redshift. I’m using psycopg2 to establish a connection and create a cursor. You cannot pass the cursor object from task to task because it is not serializable. How do you execute multiple tasks within a transaction block if you need a new connection per task? Am I thinking about this wrong? Is psycopg not the recommended connection method? Thanks!
2 years ago
Strictly from a data engineering perspective, a transaction should represent one unit of work. Now if we take that concept into the Prefect world, where tasks can fail and be retried, I personally think splitting a transaction across multiple retry-able pieces of logic doesn’t make a ton of sense. Could you refactor your tasks so that each task handles a transaction? You can still get execution order guaranteed by Prefect handling the dependencies between tasks. I’d think that a new connection per task makes a lot of sense, since tasks are assumed to have isolated execution environments. If you want to share the connection across multiple tasks, I believe you should be able to do this while using the
, but that’s more of a side effect of how that Executor is implemented as opposed to a big feature that it supports. EDIT: Fixed grammar
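To make the "one task, one transaction" idea concrete, here is a minimal sketch rather than a definitive implementation. The helper name `run_transaction` and the `connect` argument are hypothetical, not part of Prefect or psycopg2. It assumes only the standard DB-API calls (`cursor`, `execute`, `commit`, `rollback`, `close`), so with Redshift you would pass something like `lambda: psycopg2.connect(dsn)`. The demo uses `sqlite3` as a stand-in so it runs without a cluster.

```python
import os
import sqlite3
import tempfile


def run_transaction(connect, statements):
    """Run all statements as one transaction on a fresh connection.

    `connect` is any zero-argument callable returning a DB-API 2.0
    connection (hypothetical parameter; for Redshift it could be
    `lambda: psycopg2.connect(dsn)`).
    """
    conn = connect()  # new connection per call, nothing shared between tasks
    try:
        cur = conn.cursor()
        for sql in statements:
            cur.execute(sql)
        conn.commit()    # every statement is applied together...
    except Exception:
        conn.rollback()  # ...or none of them are
        raise
    finally:
        conn.close()


def demo():
    """Stand-in demo with sqlite3; returns the final row count."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    connect = lambda: sqlite3.connect(path)

    run_transaction(connect, ["CREATE TABLE events (id INTEGER)"])

    try:
        # The second statement fails, so the first insert is rolled back.
        run_transaction(connect, [
            "INSERT INTO events VALUES (1)",
            "INSERT INTO missing_table VALUES (2)",
        ])
    except sqlite3.OperationalError:
        pass  # a retry would start again from a clean state

    run_transaction(connect, [
        "INSERT INTO events VALUES (1)",
        "INSERT INTO events VALUES (2)",
    ])

    conn = connect()
    try:
        return conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    finally:
        conn.close()
```

Each Prefect task would wrap one call like `run_transaction(...)`, so a failed task leaves nothing half-applied and a retry simply reruns the whole unit of work. Only plain values such as the SQL strings or a DSN would cross task boundaries, never the connection or cursor.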
Thanks Alex! @Jacob (he/him) hello! And let us know if you have any more questions.
2 years ago
I see what you’re saying. I think I was getting a little excited about using the Gantt chart to see how long each particular task would take. But yeah, I agree with you about having one task represent one transaction.