Jimmy Le

10/21/2020, 10:20 PM
I'm working on a Selenium web scraping project. Everything works fine when running locally; however, when I register the flow with Prefect Cloud, I'm not able to launch a Chrome Driver. It appears to be a serialization problem, since I'm getting a TypeError: cannot pickle '_thread.lock' object. Has anyone run into a similar problem? Any suggestions would be appreciated!
👀 2

Dylan

10/22/2020, 1:24 PM
Hi @Jimmy Le! What executor are you using? It sounds like a Task is trying to share an un-pickleable object with another Task. This can happen if you’re trying to achieve some parallelism and also sharing a created Client for something that’s not thread safe.
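To make the error concrete, here is a minimal stdlib-only sketch of why an object that holds a thread lock cannot be pickled. FakeDriver and can_pickle are hypothetical names invented for this illustration; FakeDriver only stands in for a real Selenium driver and does not reflect Selenium's API.

```python
import pickle
import threading


class FakeDriver:
    """Hypothetical stand-in for a browser driver; not Selenium's API."""

    def __init__(self):
        # Assumption: like a real driver, this object owns a lock.
        # A single lock is enough to make the whole object unpicklable.
        self._lock = threading.Lock()
        self.title = "example page"


def can_pickle(obj) -> bool:
    """Return True if obj survives pickle.dumps, False if pickle raises TypeError."""
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        return False


driver = FakeDriver()
print(can_pickle(driver))        # the object that holds the lock
print(can_pickle(driver.title))  # plain data extracted from it
```

The practical consequence is that plain values (strings, dicts, lists) can be handed from one Task to another, while the driver itself is better kept inside the Task that created it.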

Jimmy Le

10/22/2020, 2:41 PM
I’m using the default LocalExecutor. I’ll give it another go today with the DaskExecutor.

Dylan

10/22/2020, 2:42 PM
hmmm
Would you be comfortable sharing your flow code here?

Marwan Sarieddine

10/22/2020, 5:01 PM
@Dylan - I have faced this in a slightly different context: using attrs (https://www.attrs.org/en/stable/examples.html) constructed classes and then registering a flow.
flow.py:
import attr
from prefect import Flow, task


@attr.s(auto_attribs=True, kw_only=True)
class A:
    size: int


@task
def get_size():
    a = A(size=2)
    return a.size


with Flow("test-flow") as flow:
    get_size()

flow.register("test-flows")
traceback:
Traceback (most recent call last):
  File "attr_flow.py", line 19, in <module>
    flow.register("test-flows")
  File "/Users/marwansarieddine/.pyenv/versions/etl-embs/lib/python3.8/site-packages/prefect/core/flow.py", line 1608, in register
    registered_flow = client.register(
  File "/Users/marwansarieddine/.pyenv/versions/etl-embs/lib/python3.8/site-packages/prefect/client/client.py", line 734, in register
    serialized_flow = flow.serialize(build=build)  # type: Any
  File "/Users/marwansarieddine/.pyenv/versions/etl-embs/lib/python3.8/site-packages/prefect/core/flow.py", line 1451, in serialize
    self.storage.add_flow(self)
  File "/Users/marwansarieddine/.pyenv/versions/etl-embs/lib/python3.8/site-packages/prefect/environments/storage/local.py", line 140, in add_flow
    flow_location = flow.save(flow_location)
  File "/Users/marwansarieddine/.pyenv/versions/etl-embs/lib/python3.8/site-packages/prefect/core/flow.py", line 1520, in save
    cloudpickle.dump(self, f)
  File "/Users/marwansarieddine/.pyenv/versions/etl-embs/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 55, in dump
    CloudPickler(
  File "/Users/marwansarieddine/.pyenv/versions/etl-embs/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 563, in dump
    return Pickler.dump(self, obj)
TypeError: cannot pickle '_thread._local' object
given attrs is quite the popular library when it comes to building classes - this is a bit of a disappointment to be honest
but I realize this has to do with the choice of cloudpickle - and cloudpickle not being able to pickle attrs constructed classes https://github.com/cloudpipe/cloudpickle/issues/320

Dylan

10/22/2020, 5:08 PM
Using Local storage and the stored_as_script=True option might solve this issue
🎉 2
👀 1
It bypasses almost all of the pickle logic
🎉 2
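As a rough analogy for why script-based storage sidesteps pickling, the stdlib-only sketch below loads a "flow file" by re-executing its source instead of pickling its objects. This is not Prefect's loading code; SCRIPT, load_from_script and the names inside the script are all hypothetical, and the module-level lock simply stands in for anything that cannot be pickled.

```python
import pickle
import runpy
import tempfile
import textwrap
from pathlib import Path

# Hypothetical flow file: the module-level lock stands in for any unpicklable object.
SCRIPT = textwrap.dedent(
    """
    import threading

    resource = threading.Lock()
    flow_name = "test-flow"
    """
)


def load_from_script(path: str) -> dict:
    """Re-execute the file and return its module globals; no pickle is involved."""
    return runpy.run_path(path)


with tempfile.TemporaryDirectory() as tmp:
    script_path = Path(tmp) / "flow.py"
    script_path.write_text(SCRIPT)
    loaded = load_from_script(str(script_path))

# The same object that loads fine from source still cannot be pickled.
try:
    pickle.dumps(loaded["resource"])
    pickled_ok = True
except TypeError:
    pickled_ok = False

print(loaded["flow_name"], pickled_ok)
```

The trade-off is that the script has to be readable at the stored path wherever the flow is loaded, since the objects are rebuilt from source each time.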

Marwan Sarieddine

10/22/2020, 5:13 PM
thanks for the tip - my current workaround is to place the class in a utils file and add it to a custom dockerfile using Docker or S3 storage - this way it doesn't have to be pickled ...
👍 2

Jimmy Le

10/25/2020, 7:52 PM
@Dylan You genius you. It worked like a charm.
f.storage = Local(path="path/to/flow.py", stored_as_script=True)
Registered my flow and ran my agent afterwards. Screenshot of victory attached.
🙌 2

Dylan

10/25/2020, 7:53 PM
Awesome! Glad I could help 😄
🙌 1

Felix Vemmer

11/07/2020, 5:17 PM
@Dylan first of all thank you so much for helping us! 🙂 Thanks to your help, I was able to resolve my Selenium issues by using stored_as_script=True. However, if I have, for example, an authenticate.py module with some basic common Selenium tasks such as:
• create_driver -> returns selenium driver
• login_xyz -> returns selenium driver in logged in state
and I want to import those into another script, I again get: TypeError: cannot pickle '_thread.lock' object
So as long as they are all in one file it’s fine, but importing reintroduces the old issue. Any idea for a fix? Otherwise I’ll just pile all the code together into one big flow 🙂 Thanks!
💯 1
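The thread does not record an answer to this question. One pattern that may help, assuming the error comes from a driver being returned by one task and handed to another, is to keep the helpers in authenticate.py as plain functions and call them from a single task that returns only plain data. The sketch below uses the stdlib only; FakeDriver and scrape are hypothetical, and the bodies of create_driver and login_xyz are placeholders, not real Selenium calls.

```python
import pickle
import threading


class FakeDriver:
    """Hypothetical stand-in for a Selenium driver; the lock makes it unpicklable."""

    def __init__(self):
        self._lock = threading.Lock()
        self.logged_in = False
        self.closed = False

    def quit(self):
        self.closed = True


# In this sketch, these two helpers would live in authenticate.py
# as plain functions rather than as tasks.
def create_driver() -> FakeDriver:
    return FakeDriver()


def login_xyz(driver: FakeDriver) -> FakeDriver:
    driver.logged_in = True  # placeholder for the real login steps
    return driver


def scrape() -> dict:
    """Body of a single task: the driver is created, used, and closed here."""
    driver = login_xyz(create_driver())
    try:
        # Only plain data leaves the function; the driver never does.
        return {"logged_in": driver.logged_in}
    finally:
        driver.quit()


result = scrape()
pickle.dumps(result)  # a plain dict serializes without error
print(result)
```

With this layout the helpers can still be shared across scripts through a normal import, and only the dict returned by scrape ever needs to be serialized.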

Chris White

11/15/2020, 4:46 PM
@Marvin archive “How to integrate Selenium with Prefect?”