Hi! I'm trying to use the Great Expectations task in our Prefect workflow, running on Prefect Server in Kubernetes. I'm struggling with providing a data context: since the ge.data_context.DataContext object cannot be serialized with cloudpickle, I can't initialize it in one task and pass it on to the RunGreatExpectationsValidation task. Does anyone know of a way to pass such an object between tasks?
Anna Geller
10/21/2021, 8:55 AM
Hi @Johan Wåhlin, what Prefect version do you use? I remember some older version had issues with this task.
And you’re absolutely right that this object can’t be serialized. Can you try initializing the context separately in each task?
Anna Geller
10/21/2021, 9:01 AM
@Johan Wåhlin I’m not a GE expert, but you could probably skip the DataContext initialization, as long as you have a pre-configured checkpoint. The Prefect task accepts that. Do you use their v2 or v3 API?
Johan Wåhlin
10/21/2021, 12:42 PM
Thanks @Anna Geller. We're using Prefect 0.15.6 and the GE v3 API. I'd have to rewrite the RunGreatExpectationsValidation class in order to initialize the data context from a DataContextConfig, which worked locally. I'll give it a try using a checkpoint as you suggested instead.
👍 1
Kevin Kho
10/21/2021, 2:10 PM
Yeah, the current task is not up to date, since v2 was the `batch_kwargs` API and v3 is the `batch_request` API. We're currently looking for people who could contribute an updated task for v3. Would you be interested? 😄
You're definitely right, though, that the `DataContext()` would have to be instantiated inside, but you might also be able to provide the `context_root_dir`?
😬 1
Johan Wåhlin
10/22/2021, 12:47 PM
I might not be the right person for this, at least not just yet. I ended up writing my own class, which initializes a context for testing in-memory data based on a GE DataContextConfig object. We hope to continue using the combination of Prefect and GE, and as we learn more I might be more comfortable contributing :)
upvote 1
Johan Wåhlin
10/22/2021, 3:17 PM
And, of course. Thanks for your help and suggestions, really appreciate it!