Hello, I'm new to Prefect and I'm wondering if I c...
# ask-community
s
Hello, I'm new to Prefect and I'm wondering if I can use Prefect for my use case. I have a Python script that extracts data from Google Sheets and loads it into Snowflake. A cron job runs it daily. Similarly, there are Singer taps and targets orchestrated by pipelinewise that load data from databases into Snowflake. They are also scheduled with cron. The loaded data is then transformed using dbt, which is scheduled much later than the cron jobs above. Most of the examples I see on the Internet are workflows running Python functions with the task decorator.
k
Hi @Sumit Kumar Rai! Yes, this sounds like a use case that Prefect can handle well. We have a task library that a lot of community members have contributed to. The Snowflake task and dbt task are commonly used. There is also a Google Sheets task. This can all be tied together with Prefect. Yes, you will have to write Python code, but there is definitely a lot to start with.
s
@Kevin Kho Does that mean I have to ditch my Python script that loads the Google Sheet and rewrite it as a Prefect task?
k
If you already have a Python script, all you have to do is wrap it in a function and apply the @task decorator to use it in Prefect. Prefect by design is non-invasive and cleanly wraps around your existing code.
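A minimal sketch of the wrapping described above, assuming the Prefect 1.x API (`task`, `Flow`), which is the generation of Prefect that ships the task library mentioned in this thread. The function name `load_google_sheet`, its body, the sample rows, and the flow name are hypothetical stand-ins for the existing script.

```python
def load_google_sheet(rows):
    """Stand-in for the existing script logic (hypothetical).

    Drops rows whose cells are all blank, as a placeholder for the real
    extract-and-load code.
    """
    return [row for row in rows if any(cell.strip() for cell in row)]


def build_flow():
    """Wrap the unchanged function as a Prefect task (assumes Prefect 1.x)."""
    # Imported lazily so the plain function stays usable without Prefect.
    from prefect import Flow, task

    with Flow("google-sheet-to-snowflake") as flow:
        # task(...) is the same decorator as @task, applied without
        # modifying the original function definition.
        task(load_google_sheet)([["a", "1"], ["", ""]])
    return flow
```

The original function stays callable on its own, so the existing script can keep working while the flow is being set up. With Prefect 1.x installed, `build_flow().run()` would execute it locally.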
s
What about the singer taps and targets that I'm using in pipelinewise? How can I use them in Prefect?
k
I am not familiar with this. Let me look for a bit.
Ok, so for pipelinewise, if you are invoking it from the command line, you can use Prefect to run it via a ShellTask, as long as you install the dependencies.
Our dbt task inherits from the ShellTask. I think the approach would be to orchestrate the pipelinewise stuff through Prefect.
s
This is helpful. But how can I make the pipelinewise CLI available?
k
How are you running them at the moment? Through a UI?
At the very least though, taps and targets can be invoked like
my-packaged-tap | target-postgresql --config conf.json
s
What I meant was, in the case of dbt, I can pass a dbt command and the ShellTask runs the command. But the dbt CLI has to be somewhere, maybe it is already installed in the Prefect agent (I don't know).
k
I’m seeing this in the pipelinewise docs
pipelinewise run_tap --tap mysql_sample --target snowflake
If you use it like this, Prefect can just call it with the shell task.
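A rough sketch of calling that command through the shell task, assuming Prefect 1.x and its `prefect.tasks.shell.ShellTask` (with the `command`, `return_all`, and `log_stderr` arguments). The helper `pipelinewise_command`, the task name, and the flow name are hypothetical; the tap and target names are the ones from the docs snippet above.

```python
import shlex


def pipelinewise_command(tap, target):
    """Build the pipelinewise CLI call as a safely quoted shell string."""
    return shlex.join(
        ["pipelinewise", "run_tap", "--tap", tap, "--target", target]
    )


def build_flow():
    """Run the command via ShellTask (assumes Prefect 1.x is installed)."""
    # Imported lazily so the command helper works without Prefect.
    from prefect import Flow
    from prefect.tasks.shell import ShellTask

    run_tap = ShellTask(
        name="pipelinewise-run-tap",  # hypothetical task name
        return_all=True,              # return every output line, not just the last
        log_stderr=True,              # surface CLI errors in the Prefect logs
    )
    with Flow("pipelinewise-batch") as flow:
        run_tap(command=pipelinewise_command("mysql_sample", "snowflake"))
    return flow
```

The flow only works where the `pipelinewise` executable is on the PATH of the environment that runs it.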
Ahh, yes, you need to install it on the agent. Ideally this is done through Docker. The agent will be responsible for grabbing that container and running your flow.
I just want to mention that running pipelinewise for batch ETL jobs makes sense with Prefect, but the combination probably makes less sense if you want to use pipelinewise as a stream.
s
Currently we are OK with batch jobs for our use cases.
So, do I create a workflow to install pipelinewise, or should I build a custom image based on the Prefect agent Docker image and create agents from it?
k
Yes, you can make your own image and pass it to an agent. The only requirement is that Prefect can be installed on it.
s
I noticed that the GitHub tasks don't include a clone task. How can I retrieve and use my dbt project that exists in the repo?
k
You can probably do it through the Docker container that you pass to run things on. Would that work for you?
s
If I pass the Docker container, will Prefect's DbtTask still work?
k
It will, but you would need to install dbt itself in the container.
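Since the shell-based tasks only work when the CLIs exist in the container, a small preflight check can make a missing install fail early with a clear message. This is a sketch using only the Python standard library; the helper name `missing_clis` and the idea of running it at the start of a flow are assumptions, not part of Prefect.

```python
import shutil


def missing_clis(names):
    """Return the executables from `names` that are not on the PATH."""
    return [name for name in names if shutil.which(name) is None]


def require_clis(names):
    """Raise a clear error if any required executable is missing."""
    missing = missing_clis(names)
    if missing:
        raise RuntimeError(
            "Missing command-line tools in this image: " + ", ".join(missing)
        )
```

For the setup discussed here, something like `require_clis(["dbt", "pipelinewise"])` could run inside the custom image before the actual work starts.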