# prefect-community
d
Hey everyone, I have a Prefect (1) flow that uses a GPU (Nvidia T4), and I get this really odd behavior. I register the flow using Python 3.9 and run it in a Python 3.8 container (tensorflow/tensorflow:latest-gpu). When running on a machine without a GPU, everything works. When running on a machine with a GPU, I get this error:
Task 'upload_data_to_bq_task': Exception encountered during task execution!
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/prefect/engine/task_runner.py", line 880, in get_task_run_state
    value = prefect.utilities.executors.run_task_with_timeout(
  File "/usr/local/lib/python3.8/dist-packages/prefect/utilities/executors.py", line 468, in run_task_with_timeout
    return task.run(*args, **kwargs)  # type: ignore
  File "/Users/dekelr/PycharmProjects/similarity-filter-layer/prefect_tasks/upload_data_to_bq.py", line 24, in upload_data_to_bq_task
SystemError: unknown opcode
The code fails when running this specific line (batch_size is either an int or None):
if batch_size is None:
Everything works fine when I change this line to “if not batch_size:”. After some troubleshooting, I found this thread: https://github.com/PrefectHQ/prefect/issues/3635. Running the same flow with the original line:
if batch_size is None:
still doesn’t work when registering with Python 3.8 (the same Python version as in the container). Can you please explain this really odd behavior? Thanks
j
Glad to hear that it sounds like it’s working. I think the most likely cause of the behavior is that it’s still a Python version mismatch issue.
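To illustrate why a version mismatch could single out that one line: CPython 3.9 added a dedicated `IS_OP` bytecode instruction for `is` comparisons, which 3.8 does not have. If the task function is shipped as pickled bytecode (an assumption about how this flow is stored), code compiled on 3.9 would hit an opcode the 3.8 interpreter cannot decode, while `if not batch_size:` compiles to instructions both versions share. A minimal sketch with two hypothetical stand-in functions shows which instructions differ on whatever interpreter runs it:

```python
import dis
import sys


def uses_is_none(batch_size):
    # Stand-in for the failing line in the task (hypothetical function).
    if batch_size is None:
        return "all"
    return batch_size


def uses_not(batch_size):
    # Stand-in for the workaround that avoided the error.
    if not batch_size:
        return "all"
    return batch_size


def opnames(func):
    """Return the set of bytecode instruction names used by func."""
    return {ins.opname for ins in dis.get_instructions(func)}


# The result depends on the interpreter: run this under each Python
# version involved and compare the printed instruction names.
print("Python", ".".join(map(str, sys.version_info[:2])))
print("only in 'is None' version:", sorted(opnames(uses_is_none) - opnames(uses_not)))
```

Note that the two variants are not equivalent: `if not batch_size:` also treats a batch size of 0 as missing.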
d
Hey, the Python versions are now equal and I still get this, so there is still no solution. I also wonder why this specific line is problematic here… All of the tasks before the task that contains this line run perfectly. Any ideas? This is a problem mostly because this code runs great in a local Prefect run but not after registration.
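One way to double-check that the versions really match at run time is to record the interpreter version at registration and compare it inside the first task. This is only a sketch: `minor_version` and `assert_same_python` are hypothetical helpers, not part of Prefect, and how the registered value gets carried into the run (a parameter, a label, an environment variable) is left open.

```python
import sys


def minor_version(info=None):
    """Return 'major.minor' for a version tuple (default: this interpreter)."""
    info = sys.version_info if info is None else info
    return f"{info[0]}.{info[1]}"


def assert_same_python(registered, running=None):
    """Raise if the registration and runtime interpreters differ in major.minor."""
    running = minor_version() if running is None else running
    if registered != running:
        raise RuntimeError(
            f"Flow was registered with Python {registered} "
            f"but is running on Python {running}"
        )
    return running


# Hypothetical usage: compute this at registration time, pass it to the run,
# then call assert_same_python(...) at the top of the first task.
registered_with = minor_version()
assert_same_python(registered_with)
```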
j
It might be because your TensorFlow model has a mismatch between the Python version it was trained with and the Python version it is being run with. See here: https://stackoverflow.com/a/68455418/4590385
d
Hey, I saw this one, but I get this error two tasks after loading the model and using it. I get this error specifically when running this line:
if batch_size is None:
So I think this time, the model version mismatch isn’t the cause of this error.
j
You might not be seeing the error earlier due to lazy evaluation in tf? If you train the tf model in the same Python version as used here, does it work?
d
I’m using a pre-trained BERT model to transform some textual data into vectors. The task in which I use the model works just fine. It’s not “lazy”, since the transformation actually happens there: I can see the transformed data in my logs before the next task fails (while running “if batch_size is None”). I would love to check this, but I cannot train the model again currently (it’s not yet an automated flow/process).