I'm working on a Selenium web scraping project, ev...
# ask-community
j
I'm working on a Selenium web scraping project. Everything works fine when running locally. However, when I register the flow with Prefect Cloud, I'm not able to launch a Chrome Driver. It appears to be a serialization problem, since I'm getting `TypeError: cannot pickle '_thread.lock' object`. Has anyone run into a similar problem? Any suggestions would be appreciated!
👀 2
d
Hi @Jimmy Le! What executor are you using? It sounds like a Task is trying to share an unpicklable object with another Task. This can happen if you’re trying to achieve some parallelism while sharing a `Client` you created for something that’s not thread safe.
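For readers following along, the error itself can be reproduced without Prefect or Selenium. This is only a minimal sketch using the standard library: the dictionary below is a hypothetical stand-in for any client or driver object that holds a thread lock internally, not the actual Selenium driver.

```python
import pickle
import threading

# Hypothetical stand-in for a client/driver: any object graph that
# contains a thread lock somewhere inside it.
fake_client = {"name": "chrome", "lock": threading.Lock()}


def try_pickle(obj):
    """Return None if obj pickles cleanly, otherwise the error message."""
    try:
        pickle.dumps(obj)
        return None
    except TypeError as exc:
        return str(exc)


error = try_pickle(fake_client)
print(error)
```

The exact wording of the message varies a little between Python versions, but a lock buried anywhere in the object is enough to make pickling fail.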
j
I’m using the default LocalExecutor. I’ll give it another go today with the DaskExecutor.
d
hmmm
Would you be comfortable sharing your flow code here?
m
@Dylan - I have faced this in a slightly different context: trying to use attrs (https://www.attrs.org/en/stable/examples.html) constructed classes and then register a flow.
flow.py
import attr
from prefect import Flow, task


@attr.s(auto_attribs=True, kw_only=True)
class A:
    size: int


@task
def get_size():
    a = A(size=2)
    return a.size


with Flow("test-flow") as flow:
    get_size()

flow.register("test-flows")
traceback:
Traceback (most recent call last):
  File "attr_flow.py", line 19, in <module>
    flow.register("test-flows")
  File "/Users/marwansarieddine/.pyenv/versions/etl-embs/lib/python3.8/site-packages/prefect/core/flow.py", line 1608, in register
    registered_flow = client.register(
  File "/Users/marwansarieddine/.pyenv/versions/etl-embs/lib/python3.8/site-packages/prefect/client/client.py", line 734, in register
    serialized_flow = flow.serialize(build=build)  # type: Any
  File "/Users/marwansarieddine/.pyenv/versions/etl-embs/lib/python3.8/site-packages/prefect/core/flow.py", line 1451, in serialize
    self.storage.add_flow(self)
  File "/Users/marwansarieddine/.pyenv/versions/etl-embs/lib/python3.8/site-packages/prefect/environments/storage/local.py", line 140, in add_flow
    flow_location = flow.save(flow_location)
  File "/Users/marwansarieddine/.pyenv/versions/etl-embs/lib/python3.8/site-packages/prefect/core/flow.py", line 1520, in save
    cloudpickle.dump(self, f)
  File "/Users/marwansarieddine/.pyenv/versions/etl-embs/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 55, in dump
    CloudPickler(
  File "/Users/marwansarieddine/.pyenv/versions/etl-embs/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 563, in dump
    return Pickler.dump(self, obj)
TypeError: cannot pickle '_thread._local' object
Given that attrs is quite a popular library for building classes, this is a bit of a disappointment, to be honest.
But I realize this comes down to the choice of cloudpickle, and cloudpickle not being able to pickle attrs-constructed classes: https://github.com/cloudpipe/cloudpickle/issues/320
d
Using Local storage with the `stored_as_script=True` option might solve this issue
🎉 2
👀 1
It bypasses almost all of the pickle logic
🎉 2
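A rough illustration of why loading from a script sidesteps the error: the file is executed again to rebuild the object, so nothing has to survive a pickle round trip. This is a standard-library sketch of the general idea only, not Prefect's actual loading code; the `flow` dictionary and the `flow_script.py` file name are hypothetical.

```python
import os
import pickle
import runpy
import tempfile

# Hypothetical script whose module-level object holds an unpicklable lock.
SCRIPT = '''
import threading

flow = {"name": "test-flow", "lock": threading.Lock()}
'''

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "flow_script.py")
    with open(path, "w") as handle:
        handle.write(SCRIPT)

    # Script-style loading: execute the file and pick the object out by name.
    namespace = runpy.run_path(path)
    loaded = namespace["flow"]

# Pickle-style storage of the very same object fails.
try:
    pickle.dumps(loaded)
    pickled_ok = True
except TypeError:
    pickled_ok = False

print(loaded["name"], pickled_ok)
```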
m
thanks for the tip - my current workaround is to place the class in a utils file and add it to a custom Dockerfile using Docker or S3 storage - this way it doesn't have to be pickled ...
👍 2
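The reason moving a class into an importable module helps can be sketched with plain `pickle`: a class that can be imported is stored as a reference (module name plus class name), not as its full definition. `OrderedDict` is used here purely as an example of an importable class. cloudpickle, by contrast, serializes classes defined in `__main__` by value, which is likely where a class defined inside the flow file itself gets pulled into the pickle.

```python
import pickle
from collections import OrderedDict

# An importable class is pickled as a reference: the payload names the
# module and the class, it does not contain the class body.
payload = pickle.dumps(OrderedDict)
print(payload)

# Unpickling just re-imports the class, so the very same object comes back.
restored = pickle.loads(payload)
print(restored is OrderedDict)
```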
j
@Dylan You genius you. It worked like a charm.
from prefect.environments.storage import Local  # module path as seen in the traceback above
f.storage = Local(path="path/to/flow.py", stored_as_script=True)
Registered my flow and ran my agent afterwards. Screenshot of victory attached.
🙌 2
d
Awesome! Glad I could help 😄
🙌 1
f
@Dylan first of all thank you so much for helping us! 🙂 Thanks to your help, I was able to resolve my Selenium issues by using `stored_as_script=True`. However, suppose I have an `authenticate.py` module with some basic common Selenium tasks such as:
• create_driver -> returns a Selenium driver
• login_xyz -> returns a Selenium driver in a logged-in state
If I import those into another script, I again get `TypeError: cannot pickle '_thread.lock' object`. So as long as everything is in one file it’s fine, but importing reintroduces the old issue. Any idea for a fix? Otherwise I’ll just pile all the code together into one big flow 🙂 Thanks!
💯 1
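No fix for this follow-up was confirmed in the thread. One pattern that might be worth trying, sketched here under assumptions, is to keep the shared helpers as plain functions and let a single task own the driver, so that only plain data ever leaves the task. `FakeDriver` and `scrape_titles` are hypothetical; `create_driver` and `login_xyz` mirror the names in the question but are stand-ins, not real Selenium code.

```python
import pickle
import threading


class FakeDriver:
    """Hypothetical stand-in for a Selenium driver that holds a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.logged_in = False

    def quit(self):
        pass


# Plain helper functions, as they might live in authenticate.py.
def create_driver():
    return FakeDriver()


def login_xyz(driver):
    driver.logged_in = True
    return driver


def scrape_titles():
    """Would be the body of one task: the driver never leaves this function."""
    driver = login_xyz(create_driver())
    try:
        # Return only plain, picklable data.
        return {"logged_in": driver.logged_in, "titles": ["example"]}
    finally:
        driver.quit()


result = scrape_titles()
payload = pickle.dumps(result)  # succeeds: no driver or lock in the result
```

Whether this resolves the import case depends on where the pickling actually happens in that flow, which the thread does not show.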
c
@Marvin archive “How to integrate Selenium with Prefect?”