def mongo_connect():
    db = get_db()  # wrapper on top of the pymongo client to connect to the specified db
    collection = get_collection(db)  # wrapper on top of the pymongo client to get a specified collection from the db
    return collection
I understand that Prefect needs to pickle all the tasks, but this code works fine when run as an independent script (without the Prefect decorators).
How can I make the connection thread-lock safe, or rather, pickle-safe?
Kevin Kho
09/01/2021, 4:58 PM
Hey @Abhas P, Prefect pickles all task outputs because that's a requirement for sending them between Dask workers. collection here is still a connection, so it can't be pickled. If you don't use that, the workaround would be to store your Flow as a script so it doesn't get serialized.
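A minimal stdlib sketch of why the returned collection trips this requirement. The thread doesn't show the traceback, but pymongo's client keeps thread locks internally, and pickle refuses to serialize lock objects, which is the likely source of the "thread lock" error. FakeCollection and is_picklable are hypothetical names for illustration only; no MongoDB or Prefect is involved.

```python
import pickle
import threading


class FakeCollection:
    """Hypothetical stand-in for a pymongo collection: like a real client,
    it holds a threading.Lock, which pickle cannot serialize."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()


def is_picklable(value):
    """Return True if value survives pickle.dumps, i.e. it could be
    shipped between Dask workers as a task output."""
    try:
        pickle.dumps(value)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


connection_output = FakeCollection("orders")     # like what mongo_connect() returns
data_output = [{"_id": 1, "status": "shipped"}]  # plain query results

print(is_picklable(connection_output))  # False: the lock cannot be pickled
print(is_picklable(data_output))        # True
```

The same check explains why the code works as a plain script: without Dask nothing forces the collection through pickle, so the lock is never an issue.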
Abhas P
09/01/2021, 5:02 PM
Could you elaborate on what you mean by "if you don't use that"?
Also, how can I store the flow as a script?
Kevin Kho
09/01/2021, 5:09 PM
Sorry, I meant if you don't use Dask. The script-based storage docs are here
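To make the difference concrete without depending on Prefect itself, here is a stdlib analogy (not Prefect's storage API): pickle-style storage serializes the flow object, while script-style storage keeps only the source file and re-executes it when the flow is loaded, so objects built at import time never go through pickle. The file name my_flow.py, the FLOW variable, and the helper functions are all hypothetical.

```python
import pathlib
import pickle
import runpy
import tempfile

# Hypothetical flow script: it builds an object holding a thread lock at
# import time, the way a module-level database connection would.
FLOW_SOURCE = """
import threading
import types

FLOW = types.SimpleNamespace(name="mongo-flow", lock=threading.Lock())
"""


def store_as_pickle(flow):
    """Pickle-style storage: serialize the flow object itself."""
    return pickle.dumps(flow)


def store_as_script(directory):
    """Script-style storage: keep only the source file on disk."""
    path = pathlib.Path(directory) / "my_flow.py"
    path.write_text(FLOW_SOURCE)
    return path


def load_from_script(path):
    """Re-run the source and pick the flow object out of its globals."""
    return runpy.run_path(str(path))["FLOW"]


with tempfile.TemporaryDirectory() as tmp:
    flow = load_from_script(store_as_script(tmp))
    print(flow.name)  # mongo-flow (rebuilt from source, never unpickled)

    try:
        store_as_pickle(flow)
        pickled_ok = True
    except (TypeError, pickle.PicklingError):
        pickled_ok = False
    print(pickled_ok)  # False: the lock inside the flow cannot be pickled
```

Note that this only sidesteps serializing the flow itself; as discussed in the thread, task outputs still have to be picklable when Dask moves them between workers.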
Abhas P
09/01/2021, 5:33 PM
Thanks!
So just to reiterate:
1. If I want to use Dask, there is no workaround for the db connection pickle issue.
2. If I use any other executor, I can store the flow as a script and make the connection work.
Kevin Kho
09/01/2021, 5:52 PM
Yeah, exactly. That is a Dask requirement.
Kevin Kho
09/01/2021, 5:55 PM
Not 100% sure about LocalDaskExecutor, but I think so.
Abhas P
09/01/2021, 7:34 PM
Would you be able to point me to a more descriptive example for Bitbucket storage? (This resource specifies the basic args, but not a complete example.) Again, thanks for helping me with this.
Kevin Kho
09/01/2021, 7:37 PM
Nothing beyond the docs here. If you try it and run into an error, I can help with that.