# prefect-community
j
Hello folks. Trying to get a Prefect Cloud Flow running in GKE. Just got this error in StackDriver Logs after registering the flow, and manually triggering it from Cloud UI:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/prefect/engine/runner.py", line 48, in inner
    new_state = method(self, state, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/prefect/engine/task_runner.py", line 986, in get_task_run_state
    result = self.result.write(value, filename="output", **prefect.context)
  File "/usr/local/lib/python3.7/site-packages/prefect/engine/results/gcs_result.py", line 73, in write
    self.gcs_bucket.blob(new.location).upload_from_string(binary_data)
  File "/usr/local/lib/python3.7/site-packages/prefect/engine/results/gcs_result.py", line 35, in gcs_bucket
    from prefect.utilities.gcp import get_storage_client
  File "/usr/local/lib/python3.7/site-packages/prefect/utilities/gcp.py", line 6, in <module>
    from google.cloud import bigquery, storage
ImportError: cannot import name 'bigquery' from 'google.cloud' (unknown location)
It seems the process cannot find a prefect core bigquery module inside of google.cloud. I thought perhaps it had something to do with setting PYTHONPATH to my own application & lib directories in my Flow's Dockerfile:
ENV PYTHONPATH="/app:/app/lib"
And now Prefect core cannot find its own packages? But experimenting with the alternative "extension" of PYTHONPATH instead of overwrite:
ENV PYTHONPATH="/app:/app/lib:${PYTHONPATH}"
simply yields a PYTHONPATH with a colon on the end (/app:/app/lib:), so I opted not to extend, since there is no default. My suspicions about PYTHONPATH could very well be a red herring... Any ideas?
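The trailing colon appears because ${PYTHONPATH} expands to an empty string when the variable is unset. A minimal Python sketch of joining the entries while skipping empty ones, assuming the path is assembled in a hypothetical entrypoint script rather than in the Dockerfile (`extend_pythonpath` and `/opt/x` are illustrative names, not part of Prefect or Docker):

```python
def extend_pythonpath(new_entries, existing="", sep=":"):
    """Prepend new_entries to an existing PYTHONPATH value, dropping empty parts."""
    parts = list(new_entries) + existing.split(sep)
    # Filtering out empty strings avoids a leading or trailing separator.
    return sep.join(p for p in parts if p)


print(extend_pythonpath(["/app", "/app/lib"]))            # /app:/app/lib
print(extend_pythonpath(["/app", "/app/lib"], "/opt/x"))  # /app:/app/lib:/opt/x
```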
c
Hi jars - are you sure that you installed the appropriate google-cloud related packages into your docker image? In this case I’d recommend installing prefect[gcp] into your image to ensure all google related packages are available
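One way to check this from inside the container is to ask Python whether the modules named in the traceback can be located at all. This is a rough sketch using only the standard library; `missing_modules` is a hypothetical helper name, and the module list is taken from the import line in the traceback above:

```python
import importlib.util


def missing_modules(names):
    """Return the dotted module names that cannot be located on sys.path."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised when a parent package is absent or a module has no spec.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # An empty list means both imports from prefect/utilities/gcp.py should resolve.
    print(missing_modules(["google.cloud.bigquery", "google.cloud.storage"]))
```

Running this in the image before and after the pip install step should show whether the packages are the problem, independent of PYTHONPATH.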
j
@Chris White, thank you for such a quick response. My Flow's Dockerfile inherits from prefecthq's image:
FROM prefecthq/prefect:0.11.4-python3.7
Does this image have prefect[gcp] preinstalled?
c
No, that image only has the base prefect package installed; if you are writing your own Dockerfile you can add the following line:
RUN pip install google-cloud-bigquery google-cloud-storage
and you should be good to go
j
Wonderful, I will give that a shot.
c
Awesome - let me know!
j
Will chime back in.
Appreciated.
You got it @Chris White. That fixed the problem. I got new ones now, but have ideas how to fix. This close!
c
Awesome!! Yea, getting the docker image setup correct the first time can be a little frustrating but once you get it going the first time it gets much easier to debug!
j
@Chris White, my next issue is about an unpickleable client object (it's a Firestore Collection Reference). I have it returning from a @task at the moment, and I suspect that's the issue. I can put boilerplate Client Reference creation into each task that needs it, but is there a more Prefect suggested way to hand around complex client objects or inject them into tasks?
c
so in general we recommend returning pickleable objects so that they can be stored and retrieved if necessary (e.g., if you need to rerun your flow from failure in the future). In this case I’d recommend extracting the data you need from firestore within this task and returning that instead. However, there are ways of turning off this requirement (but note that you won’t be able to recover the state of your flow in the future):
- use a LocalExecutor (the default)
- set checkpoint=False on the task that returns the firestore reference
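The difference between returning a client-backed object and returning extracted data can be illustrated with plain pickle. This is only a sketch: `StubCollectionRef` is a hypothetical stand-in (not the Firestore class), and it holds a lock to mimic the kind of live resource that makes real client objects unpickleable:

```python
import pickle
import threading


class StubCollectionRef:
    """Hypothetical stand-in for a client-backed reference (not the Firestore class)."""

    def __init__(self, path):
        self.path = path
        self._lock = threading.Lock()  # live resources like this cannot be pickled

    def fetch(self):
        # A real reference would query the database; this returns canned rows.
        return [{"id": "a", "path": self.path}]


def is_pickleable(obj):
    """Report whether pickle can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


ref = StubCollectionRef("users")
print(is_pickleable(ref))          # False: the reference itself
print(is_pickleable(ref.fetch()))  # True: plain data extracted from it
```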
j
Okay, just checking. Thank you. I will go with the recommended best practice. It will lead to a bit of duplication (the creation of the reference itself in each task that needs it), but small price to pay.
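The duplication can probably be kept small by passing only the collection path (a plain string) between tasks and rebuilding the reference through one shared helper. A minimal sketch under stated assumptions: `StubClient`, `get_client`, and `get_collection` are hypothetical names, and the stub stands in for a real Firestore client so the example runs without credentials:

```python
from functools import lru_cache


class StubClient:
    """Hypothetical stand-in; in a real flow this would be a Firestore client."""

    def collection(self, path):
        return ("collection", path)


@lru_cache(maxsize=None)
def get_client():
    # Created once per process and reused by every caller in that process.
    return StubClient()


def get_collection(path):
    """Rebuild the reference from a pickleable path inside whichever task needs it."""
    return get_client().collection(path)


print(get_collection("users"))  # ('collection', 'users')
```

Each task then takes the path as its input and calls the helper as its first line, so only the string crosses task boundaries.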
c
yea - we’ve considered ways to relax this but haven’t prioritized them yet; I’ll flag the team on your use case as another data point though!
j
thanks as always!
c
you got it 👍