# ask-community
t
Hi again, i’m trying to get the snowflake query task to work but getting an error 😕
Beginning health checks...
System Version check: OK
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/prefect/tasks/snowflake/__init__.py", line 7, in <module>
    from prefect.tasks.snowflake.snowflake import (
  File "/usr/local/lib/python3.7/site-packages/prefect/tasks/snowflake/snowflake.py", line 2, in <module>
    import snowflake.connector as sf
ModuleNotFoundError: No module named 'snowflake'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/prefect/healthcheck.py", line 152, in <module>
    flows = cloudpickle_deserialization_check(flow_file_paths)
  File "/opt/prefect/healthcheck.py", line 44, in cloudpickle_deserialization_check
    flows.append(cloudpickle.loads(flow_bytes))
  File "/usr/local/lib/python3.7/site-packages/prefect/tasks/snowflake/__init__.py", line 14, in <module>
    ) from err
NameError: name 'err' is not defined
k
Looks like you don’t have snowflake installed though?
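The first traceback is the real failure here. The NameError in the second one looks like a side effect of an except block that re-raises with `from err` without ever binding `err`. A minimal sketch of that pattern, using a hypothetical module name and hypothetical function names (this is not Prefect's actual source):

```python
def load_connector_unbound():
    """Re-raise with `from err`, but forget to bind the caught exception."""
    try:
        import module_that_is_not_installed_xyz  # hypothetical missing package
    except ImportError:
        # `err` was never bound, so this line raises NameError, not ImportError
        raise ImportError("install the matching extra first") from err  # noqa: F821


def load_connector_bound():
    """Same re-raise, with the caught exception bound via `as err`."""
    try:
        import module_that_is_not_installed_xyz  # hypothetical missing package
    except ImportError as err:
        # the original ModuleNotFoundError is kept as __cause__
        raise ImportError("install the matching extra first") from err
```

With the bound version, the original ModuleNotFoundError stays attached as the cause instead of being hidden behind a NameError.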
t
i did:
pip install "prefect[snowflake]"
to add it (and it seemed to work)
k
Is this on running a flow or registering?
t
registering. i tried restarting the agent too, to no avail
k
You’re using Docker storage right? Can you show me how you wrote it?
t
sure
k
what!? what is the agent error? that sounds like a separate issue?
t
there is no agent error i just meant i restarted the agent just to be on the safe side
from prefect.storage import Docker
from flows import xkcd, hello, seg_pred_backfill

storage = Docker(image_name='poc_multi_flow', image_tag='0.1')

storage.add_flow(xkcd.flow)
storage.add_flow(hello.flow)
storage.add_flow(seg_pred_backfill.flow)

storage = storage.build()

xkcd.flow.storage = storage
hello.flow.storage = storage
seg_pred_backfill.flow = storage

xkcd.flow.register(project_name='poc', build=False, labels=['poc'])
hello.flow.register(project_name='poc', build=False, labels=['poc'])
seg_pred_backfill.flow.register(project_name='poc', build=False, labels=['poc'])
k
Maybe try adding the dependency
storage = Docker(..., python_dependencies=["snowflake-connector-python"])
t
hmm -
• that seems to work, but feels a bit weird/wrong. feels like the snowflake task was designed to “work out of the box”, but here i’m basically required to look into its source code and know its dependencies?
• had a small bug in the code above: seg_pred_backfill.flow = storage is wrong and is missing a .storage on the left-hand side. Fixing this didn’t resolve the snowflake dependency however.
k
The task library’s dependencies are not installed by default. We shouldn’t install every library in the base Prefect image, because with all of the cloud providers that would be a lot. Also, you installed prefect[snowflake], I presume on your local machine, but that doesn’t go into the container build. So even if this healthcheck had succeeded, you would hit the same error when running the flow, because snowflake isn’t installed in the image.
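One way to see which environment is actually missing the package is to run the same import check on the local machine and inside the built image. A minimal sketch; `is_importable` is a hypothetical helper name, and only the standard library is used:

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if `module_name` can be found in the current environment."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # raised when a parent package (e.g. "snowflake") is itself missing
        return False


# Run this both locally and inside the container to compare the two environments.
print("snowflake.connector importable:", is_importable("snowflake.connector"))
```

If this reports True locally and False inside the image, the dependency has to be declared on the storage rather than installed on the local machine.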
t
so what’s the “best practice” here? i’m pretty sure it can’t be going into each of the task library’s source codes and listing their dependencies, right? let’s say i wanna add a package to the docker image, but with the same “syntax” as i add it to prefect itself (i.e. not the names of the python packages but the name of the addon/extra/whatever)? am i expecting too much here of this library?
k
The dependencies of the tasks that you use in your Flow, you mean? I guess so, because Prefect provides a simplified interface, but without that interface you would list your requirements and build a Dockerfile, which requires you to be explicit about things either way. On using the extras tag, let me check if you can use it in python_dependencies. I’m not sure if you can or can’t. One sec
t
no, the dependencies of the tasks provided by the task library. obviously if i “bring my own beer” i need to make sure to include it. but the whole point of a task library is that it saves me the trouble of knowing how to operate a snowflake connector (for that matter)
k
Do you really want to install the libraries used by all tasks in the task library?
t
not in the task library, in the tasks i chose to use (or the tasks whose “extras” i installed)
k
Ah yeah that’s what I meant with my question. The ones used in the Flow.
t
well, yes. i don’t care specifically where or how i declare that “choice”, i don’t care if i need to “redeclare” it when creating the docker storage, but it can’t be by listing internal dependencies of task library tasks… that makes no sense (to me)
e.g. if you told me i had to do Docker(…, extras=['snowflake']) that would make sense, cause it speaks the same “language” as the task library itself
(though of course i’d prefer if it could be inferred from me adding a flow that uses such a task. After all, the build only happens after all flows are added, anyway).
k
So you can with this syntax:
STORAGE = Docker(python_dependencies=["pandas", '"prefect[snowflake, aws]"'])
I don’t think it’s a bad suggestion though to do extras. I can look into it tomorrow
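Until something like an extras argument exists, the quoted entry can be generated from a list of extras names. A minimal sketch; `extras_requirement` is a hypothetical helper, not part of Prefect. It assumes, based on the example above, that each python_dependencies entry ends up in a shell pip install command, which is why the inner double quotes are kept:

```python
from typing import Iterable


def extras_requirement(extras: Iterable[str], package: str = "prefect") -> str:
    """Build a shell-quoted requirement such as "prefect[aws,snowflake]"."""
    # de-duplicate, drop blanks, and sort so the result is deterministic
    names = sorted({name.strip() for name in extras if name.strip()})
    if not names:
        return package
    # inner double quotes protect the square brackets from the shell (assumption)
    return '"{}[{}]"'.format(package, ",".join(names))


# Entries that could be passed as Docker(python_dependencies=...)
python_dependencies = ["pandas", extras_requirement(["snowflake", "aws"])]
```

The extras names are the same ones used with pip install locally, so nothing about a task's internal dependencies has to be spelled out.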
t
^ that’s good news 🙂 just tested and it works. and ya, extras would make sense (just good syntactic sugar), but the ultimate feature would be inferring them automatically 🙂 i have a day off tomorrow so i might peek at your code to see how hard it might be
k
oh yeah for sure if you wanna make a PR that would be good.
t
no promises 😆
k
Of course! And I don’t know what the core team will think of it, so the safer thing is to maybe make an issue with your proposal