I’m using `GCSResult` in my flow and getting error...
# prefect-community
a
I’m using `GCSResult` in my flow and getting errors from GCS (caused by the `GCSResult` object trying to store task results):
```
Unexpected error: SSLError(MaxRetryError("HTTPSConnectionPool(host='storage.googleapis.com', port=443): Max retries exceeded with url: /upload/storage/v1/b/prefect-bucket/o?uploadType=multipart (Caused by SSLError(OSError(24, 'Too many open files')))"))
```
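The `24` inside the traceback is an errno code. The short stdlib sketch below confirms what it means and prints the per-process limit the tasks are running into. It assumes a POSIX machine, because the `resource` module is not available on Windows. On POSIX systems every open socket uses a file descriptor, and so does every open file. Many parallel uploads combined with clients that are never closed can therefore exhaust the limit.

```python
import errno
import os
import resource  # POSIX-only: not available on Windows

# The traceback contains OSError(24, 'Too many open files').
# On Linux and macOS, errno 24 is EMFILE: the process hit its descriptor limit.
print(errno.EMFILE, errno.errorcode[errno.EMFILE], os.strerror(errno.EMFILE))

# Per-process cap on open file descriptors. Regular files and sockets
# (including HTTPS connections to storage.googleapis.com) both count against it.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"RLIMIT_NOFILE soft={soft} hard={hard}")
```

The soft value is the same number `ulimit -n` reports in a shell. Raising it only buys headroom. If descriptors are leaking, the error will come back at a larger scale.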
The reason is (probably) that I have many mapped tasks running in parallel. It’s a 32-core machine, but I can’t figure out how Prefect/Dask decides how many run in parallel; sometimes it’s more than 32. The bigger problem is that I think this caused the flow to get stuck, so I didn’t get any indication of the failure and couldn’t go and restart it. Anyhow, any suggestions on how to overcome this? Or at the very least, how to make my flow fail on errors like this so that I can restart it? Does it have anything to do with these tasks having a retry option?
I forgot to mention that the tasks themselves also write results to GCS. Could that be the problem?
j
In your task code, are you properly closing your GCS client in each task? That'd be the first thing I'd check.
It's also possible that Prefect isn't properly cleaning up the GCS clients it uses in `GCSResult`. If you don't see an obvious issue in your code, would you mind creating an issue at https://github.com/PrefectHQ/prefect/issues? Prefect should be able to handle this use case just fine, so if it can't, that's a bug.
a
hmmm… it’s possible. I’m using `fs-gcsfs`, which is a plugin for pyfilesystem2 that lets me open GCS and local files interchangeably. I have things like this in the code:
```python
return fs.open_fs(dir_path, create=False).exists(SUCCESS)
```
where `dir_path` is a GCS path, so I’m probably not closing it correctly. In my defense, to the best of my knowledge CPython’s garbage collector should close these objects automatically (unlike the PyPy implementation). I’ll try that and report back.
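Below is a minimal sketch of the fix, assuming the problem is that the FS object returned by `open_fs` is never closed. The thread doesn't show what the actual fix was. pyfilesystem2 FS objects have a `close()` method and can also serve directly as context managers (`with fs.open_fs(...) as gcs:`). The sketch uses `contextlib.closing` instead and takes the opener as a parameter, which lets it run without GCS credentials. `marker_exists`, `FakeFS`, `fake_open_fs`, the `gs://example-bucket/output` path, and the `_SUCCESS` marker name are all hypothetical. The real value of `SUCCESS` isn't given.

```python
from contextlib import closing


def marker_exists(open_fs, dir_path, marker):
    """Return True if `marker` exists under `dir_path`, closing the FS afterwards.

    `open_fs` is any callable shaped like pyfilesystem2's `fs.open_fs`
    (hypothetical helper; passed in only so the sketch is testable offline).
    """
    # closing() calls .close() when the block exits, even if .exists() raises,
    # so whatever the FS object holds is released now, not at GC time.
    with closing(open_fs(dir_path, create=False)) as filesystem:
        return filesystem.exists(marker)


# Tiny in-memory stand-in for a filesystem object (hypothetical).
class FakeFS:
    def __init__(self, files):
        self.files = set(files)
        self.closed = False

    def exists(self, path):
        return path in self.files

    def close(self):
        self.closed = True


opened = []


def fake_open_fs(url, create=False):
    filesystem = FakeFS({"_SUCCESS"})
    opened.append(filesystem)  # remember it so we can check it was closed
    return filesystem


print(marker_exists(fake_open_fs, "gs://example-bucket/output", "_SUCCESS"))  # True
print(all(f.closed for f in opened))  # True
```

CPython usually frees an object as soon as its last reference disappears. However, objects caught in reference cycles must wait for the cycle collector, and objects that are still referenced somewhere else are never freed. Closing the object explicitly means cleanup no longer depends on when, or whether, the object is reclaimed.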
@Jim Crist-Harif thanks for the tip. I fixed my broken implementation and now the flow runs great. There’s no bug on your end as far as I can tell 🙂
j
Hooray! Glad you figured it out.
a
The funny thing is that this is a caching mechanism I wrote before you guys published the new `Result` architecture. I guess it’s time to decommission it, but I need to find time for that 😕