# ask-community
j
Hi! I'm trying to use the Great Expectations task in our Prefect workflow, running on Prefect Server in Kubernetes. I'm struggling with providing a data context. Since the ge.data_context.DataContext object cannot be serialized with cloudpickle, I cannot initialize it in one task and pass it on to the RunGreatExpectationsValidation task. Does anyone know of a solution for passing such an object between tasks?
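The serialization problem above can be sketched without Great Expectations at all: any object that holds live, process-local state (locks, connections, file handles) fails to pickle the same way. `DataContextLike` below is a hypothetical stand-in for `ge.data_context.DataContext`, not the real class:

```python
import pickle
import threading

class DataContextLike:
    """Hypothetical stand-in for ge.data_context.DataContext: it holds
    live process-local state (here a lock) that pickle/cloudpickle
    cannot serialize, so it can't travel between Prefect tasks."""
    def __init__(self):
        self._lock = threading.Lock()  # unpicklable resource

ctx = DataContextLike()
try:
    pickle.dumps(ctx)  # raises TypeError: cannot pickle '_thread.lock' object
except TypeError as exc:
    print(f"cannot pass between tasks: {exc}")
```

This is why the usual workaround is to pass only plain, picklable data between tasks and rebuild the heavy object inside each task that needs it.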
a
Hi @Johan Wåhlin, what Prefect version do you use? I remember some older versions had issues with this task. And you're absolutely right that this object can't be serialized. Can you try initializing this context separately in both tasks?
@Johan Wåhlin I'm not a GE expert, but you could probably skip the DataContext initialization as long as you have a pre-configured checkpoint; the Prefect task accepts that. Do you use their v2 or v3 API?
j
Thanks @Anna Geller. Using Prefect 0.15.6 and GE API v3. I'd have to rewrite the RunGreatExpectationsValidation class in order to initialize the data context using the DataContextConfig, which worked locally. I'll give it a try using a checkpoint as you suggested instead.
👍 1
k
Yeah, the current task is not up to date, since v2 used the `batch_kwargs` API and v3 uses `batch_request`. We are currently looking for people who could contribute an updated task for v3. Would you be interested? 😄 You are definitely right, though, that the `DataContext()` would have to be instantiated inside, but you might also be able to provide the `context_root_dir`?
😬 1
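The `context_root_dir` idea works because a plain path string is trivially picklable, so tasks can exchange the *location* of the GE project and each task can rebuild its own context from it. A minimal sketch of that pattern, with `load_context` as a placeholder for the real `DataContext(context_root_dir=...)` constructor and a made-up path:

```python
import pickle

# Hypothetical project location; a plain string round-trips through
# pickle/cloudpickle cleanly, unlike the DataContext object itself.
context_root_dir = "/opt/ge/great_expectations"
restored = pickle.loads(pickle.dumps(context_root_dir))
assert restored == context_root_dir

def validation_task(context_root_dir: str) -> str:
    """Sketch of a task body: receive only the path, then build the
    heavy context *inside* the task, never across task boundaries."""
    def load_context(root: str) -> dict:
        # Placeholder for ge.data_context.DataContext(context_root_dir=root)
        return {"root": root}

    context = load_context(context_root_dir)
    return context["root"]

print(validation_task(restored))  # → /opt/ge/great_expectations
```

The same shape applies to the Prefect task: hand it `context_root_dir` as a parameter instead of a pre-built context object.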
j
I might not be the right person for this, at least not just yet. I ended up writing my own class, which initializes a context for testing in-memory data based on a GE DataContextConfig object. We hope to continue using the combination of Prefect and GE, and as we learn more I might be more comfortable contributing :)
upvote 1
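The custom-class approach described above can be sketched as follows. All names here are hypothetical (the real class would build a `BaseDataContext` from a `DataContextConfig` and run actual expectations); the point is the shape: the task holds only a plain, picklable config and constructs the heavy context lazily inside `run()`:

```python
import pickle

class InMemoryValidationTask:
    """Sketch of a custom validation task (hypothetical names): store
    only a picklable config dict, and build the unpicklable context
    fresh inside run() instead of passing it between tasks."""

    def __init__(self, context_config: dict):
        self.context_config = context_config  # plain data, pickles fine

    def _build_context(self) -> dict:
        # Placeholder for BaseDataContext(project_config=DataContextConfig(...))
        return {"datasources": self.context_config.get("datasources", [])}

    def run(self, records: list[dict]) -> bool:
        context = self._build_context()  # created fresh on every run
        # Placeholder validation: every record has the required columns
        required = set(self.context_config.get("required_columns", []))
        return all(required <= set(record) for record in records)

task = InMemoryValidationTask(
    {"datasources": ["pandas"], "required_columns": ["id"]}
)
# The task itself survives pickling, because it only holds plain data.
pickle.loads(pickle.dumps(task))
print(task.run([{"id": 1}, {"id": 2}]))  # → True
```

Because the task object is picklable, it can be registered and shipped by the orchestrator like any other task, while the context stays strictly local to each run.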
And, of course. Thanks for your help and suggestions, really appreciate it!
👍 1