jars

06/18/2020, 3:02 AM
Hello folks. Trying to get a Prefect Cloud Flow running in GKE. Just got this error in the Stackdriver Logs after registering the flow and manually triggering it from the Cloud UI:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/prefect/engine/runner.py", line 48, in inner
    new_state = method(self, state, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/prefect/engine/task_runner.py", line 986, in get_task_run_state
    result = self.result.write(value, filename="output", **prefect.context)
  File "/usr/local/lib/python3.7/site-packages/prefect/engine/results/gcs_result.py", line 73, in write
    self.gcs_bucket.blob(new.location).upload_from_string(binary_data)
  File "/usr/local/lib/python3.7/site-packages/prefect/engine/results/gcs_result.py", line 35, in gcs_bucket
    from prefect.utilities.gcp import get_storage_client
  File "/usr/local/lib/python3.7/site-packages/prefect/utilities/gcp.py", line 6, in <module>
    from google.cloud import bigquery, storage
ImportError: cannot import name 'bigquery' from 'google.cloud' (unknown location)
It seems the process cannot find the bigquery module inside of google.cloud. I thought perhaps it had something to do with setting PYTHONPATH to my own application & lib directories in my Flow's Dockerfile:
ENV PYTHONPATH="/app:/app/lib"
and now Prefect core cannot find its own packages? But experimenting with extending PYTHONPATH instead of overwriting it:
ENV PYTHONPATH="/app:/app/lib:${PYTHONPATH}"
simply yields a PYTHONPATH with a trailing colon (/app:/app/lib:), so I opted not to extend, since there is no default. My suspicions about PYTHONPATH could very well be a red herring... Any ideas?
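As an aside, if the trailing colon ever matters, Dockerfile variable expansion has a conditional form that avoids it; a sketch, assuming Docker's ${variable:+word} modifier is available in your builder:

```dockerfile
# Only append ":${PYTHONPATH}" when PYTHONPATH is already set in the
# build environment, so an unset default leaves no trailing colon.
ENV PYTHONPATH="/app:/app/lib${PYTHONPATH:+:${PYTHONPATH}}"
```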
Chris White

06/18/2020, 3:03 AM
Hi jars - are you sure that you installed the appropriate google-cloud related packages into your Docker image? In this case I'd recommend installing prefect[gcp] into your image to ensure all Google-related packages are available
jars

06/18/2020, 3:05 AM
@Chris White, thank you for such quick response. My Flow's Dockerfile inherits from prefecthq's image:
FROM prefecthq/prefect:0.11.4-python3.7
Does this image have prefect[gcp] preinstalled?
Chris White

06/18/2020, 3:06 AM
No, that image only has the base prefect package installed; if you are writing your own Dockerfile you can add the following line:
RUN pip install google-cloud-bigquery google-cloud-storage
and you should be good to go
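Putting the thread's pieces together, a minimal Dockerfile along these lines might work; the COPY paths and directory layout here are illustrative, not from the original messages:

```dockerfile
# Base image with the base prefect package only (no GCP extras)
FROM prefecthq/prefect:0.11.4-python3.7

# Add the Google Cloud client libraries that prefect.utilities.gcp imports
RUN pip install google-cloud-bigquery google-cloud-storage

# Application code and libs (paths are hypothetical)
COPY . /app
ENV PYTHONPATH="/app:/app/lib"
```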
jars

06/18/2020, 3:08 AM
Wonderful, I will give that a shot.
Chris White

06/18/2020, 3:08 AM
Awesome - let me know!
jars

06/18/2020, 3:08 AM
Will chime back in.
Appreciated.
You got it @Chris White. That fixed the problem. I've got new ones now, but I have ideas for how to fix them. This close!
Chris White

06/18/2020, 3:23 AM
Awesome!! Yea, getting the Docker image set up correctly the first time can be a little frustrating, but once you get it going it gets much easier to debug!
jars

06/18/2020, 3:26 AM
@Chris White, my next issue is about an unpickleable client object (it's a Firestore CollectionReference). I have it returning from a @task at the moment, and I suspect that's the issue. I can put boilerplate client-reference creation into each task that needs it, but is there a more Prefect-suggested way to hand around complex client objects or inject them into tasks?
Chris White

06/18/2020, 3:28 AM
so in general we recommend returning pickleable objects so that they can be stored and retrieved if necessary (e.g., if you need to rerun your flow from failure in the future). In this case I'd recommend extracting the data you need from Firestore within this task and returning that instead. However, there are ways of turning off this requirement (but note that you won't be able to recover the state of your flow in the future):
- use a LocalExecutor (the default)
- set checkpoint=False on the task that returns the Firestore reference
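The constraint being described can be illustrated with the standard library alone; the snippet below uses a threading.Lock as a stand-in for a client object holding OS-level resources (the real case in the thread is a Firestore reference, which is not importable here):

```python
import pickle
import threading

# A client-like object wrapping an OS-level resource cannot be pickled,
# which is why Prefect's checkpointing rejects it as a task result.
client_like = threading.Lock()
try:
    pickle.dumps(client_like)
    picklable = True
except TypeError:
    picklable = False

# Plain data extracted from the client round-trips through pickle fine,
# which is why returning the extracted data is the recommended route.
extracted = {"doc_id": "abc123", "fields": {"n": 1}}
roundtrip = pickle.loads(pickle.dumps(extracted))
```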
jars

06/18/2020, 3:29 AM
Okay, just checking. Thank you. I will go with the recommended best practice. It will lead to a bit of duplication (creating the reference itself in each task that needs it), but that's a small price to pay.
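One way to keep that duplication to a single line per task is a small factory helper that each task calls; a hedged sketch, with a stand-in class in place of the real Firestore client (neither get_collection nor FakeCollectionRef comes from Prefect or google-cloud):

```python
class FakeCollectionRef:
    """Stand-in for a Firestore CollectionReference."""
    def __init__(self, name: str):
        self.name = name

def get_collection(name: str = "my-collection") -> FakeCollectionRef:
    """Each task calls this instead of receiving the client as input."""
    return FakeCollectionRef(name)

def read_task():
    coll = get_collection()           # one line of "duplication" per task
    return {"collection": coll.name}  # return plain, pickleable data

result = read_task()
```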
Chris White

06/18/2020, 3:32 AM
yea - we’ve considered ways to relax this but haven’t prioritized them yet; I’ll flag your use case to the team as another data point though!
jars

06/18/2020, 3:33 AM
thanks as always!
Chris White

06/18/2020, 3:33 AM
you got it 👍