# ask-community
j
Hey all, does anyone have experience using Great Expectations? We’re having trouble running checkpoints with the Prefect tasks and keep hitting ‘object is not subscriptable’ errors. We’ve tried both the V2 and V3 APIs, but haven’t tried downgrading to version 0.12. I noticed someone else hit a different error in https://github.com/PrefectHQ/prefect/issues/4411 but got it working. Any ideas?
k
Hey @Justin Liu, there are probably people in the community who know more than I do. I’ve only used the notebook interface of Great Expectations myself, but I can take a stab at it. Do you have a code snippet I can start with?
j
Thanks, I appreciate any attempt!
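```python
from prefect import Flow, Parameter, task
from prefect.tasks.great_expectations import RunGreatExpectationsValidation
from prefect.tasks.snowflake import SnowflakeQuery

# Placeholder connection settings; the original snippet did not show
# how `snowflake` was configured.
snowflake = SnowflakeQuery(account="<account>", user="<user>")

validation_task = RunGreatExpectationsValidation()


@task
def insert_rows(title):
    # Insert a single test row into the table named `title`.
    insert = f"""
        INSERT INTO {title} VALUES (1, '{title}');
    """
    snowflake.run(query=insert)


with Flow("snow-query-test") as flow:
    # Note: these keys follow the V3 BatchRequest naming
    # (data_connector_name, data_asset_name), not V2 batch_kwargs.
    batch_kwargs = {
        "datasource_name": "snowflake_db",
        "data_connector_name": "whole_table",
        "data_asset_name": "testing__whole_table",
    }

    checkpoint_name = Parameter("checkpoint_name", default="my_checkpoint")
    validation_task(
        checkpoint_name=checkpoint_name,
        # batch_kwargs=batch_kwargs,
        # expectation_suite_name="bad",
    )
```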
I tried passing just the checkpoint name, since that’s the third option for running the task, but it raises the ‘object is not subscriptable’ error. Passing batch_kwargs together with expectation_suite_name gives a different error, probably because my batch_kwargs are wrong, but I had trouble finding a good reference for them.
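A minimal sketch of why the batch_kwargs route might fail, assuming (not confirmed in this thread) that V2 SQL `batch_kwargs` use keys such as `datasource` and `table`, while the V3 BatchRequest uses `datasource_name`, `data_connector_name` and `data_asset_name`. The helper `guess_api_style` and both key sets are hypothetical, written only to show that the dict in the snippet follows V3 naming even though it is passed as a V2-style `batch_kwargs` argument.

```python
# Hypothetical key sets, for illustration only: V3 BatchRequest field names
# versus keys commonly seen in V2 SQL batch_kwargs (assumed, not exhaustive).
V3_BATCH_REQUEST_KEYS = {"datasource_name", "data_connector_name", "data_asset_name"}
V2_SQL_BATCH_KWARGS_KEYS = {"datasource", "table", "schema", "query"}


def guess_api_style(batch_spec: dict) -> str:
    """Classify a batch dict by which naming convention its keys follow."""
    keys = set(batch_spec)
    if keys & V3_BATCH_REQUEST_KEYS and not keys & V2_SQL_BATCH_KWARGS_KEYS:
        return "v3_batch_request"
    if keys & V2_SQL_BATCH_KWARGS_KEYS and not keys & V3_BATCH_REQUEST_KEYS:
        return "v2_batch_kwargs"
    return "mixed_or_unknown"


# The dict from the snippet uses V3 names, so passing it as `batch_kwargs`
# (a V2 argument) is a plausible source of the second error.
snippet_batch = {
    "datasource_name": "snowflake_db",
    "data_connector_name": "whole_table",
    "data_asset_name": "testing__whole_table",
}
print(guess_api_style(snippet_batch))  # v3_batch_request

# A hypothetical V2-style equivalent; the table name is a placeholder.
print(guess_api_style({"datasource": "snowflake_db", "table": "testing"}))  # v2_batch_kwargs
```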
k
Would you have a traceback from running flow.run()?
j
Oh right, there’s a flow.run() at the bottom that I forgot to include.
k
Oh, the error would help me more, since I don’t have Snowflake readily set up.
j
```
[2021-06-22 17:51:41-0400] ERROR - prefect.TaskRunner | Unexpected error: TypeError("'Checkpoint' object is not subscriptable")
Traceback (most recent call last):
  File "/Users/jliu/anaconda3/envs/prefect-test/lib/python3.7/site-packages/prefect/engine/runner.py", line 48, in inner
    new_state = method(self, state, *args, **kwargs)
  File "/Users/jliu/anaconda3/envs/prefect-test/lib/python3.7/site-packages/prefect/engine/task_runner.py", line 869, in get_task_run_state
    logger=self.logger,
  File "/Users/jliu/anaconda3/envs/prefect-test/lib/python3.7/site-packages/prefect/utilities/executors.py", line 323, in run_task_with_timeout
    return task.run(*args, **kwargs)  # type: ignore
  File "/Users/jliu/anaconda3/envs/prefect-test/lib/python3.7/site-packages/prefect/utilities/tasks.py", line 454, in method
    return run_method(self, *args, **kwargs)
  File "/Users/jliu/anaconda3/envs/prefect-test/lib/python3.7/site-packages/prefect/tasks/great_expectations/checkpoints.py", line 233, in run
    for batch in ge_checkpoint["batches"]:
TypeError: 'Checkpoint' object is not subscriptable
[2021-06-22 17:51:41-0400] INFO - prefect.TaskRunner | Task 'RunGreatExpectationsValidation': Finished task run for task with final state: 'Failed'
[2021-06-22 17:51:41-0400] INFO - prefect.FlowRunner | Flow run FAILED: some reference tasks failed.
```
Sorry, I misunderstood. It’s also a problem when using pandas instead of Snowflake.
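The traceback stops at `for batch in ge_checkpoint["batches"]` in Prefect’s `checkpoints.py`. One plausible reading (an assumption, not confirmed in this thread) is that the task expects a dict-like legacy checkpoint config, while the installed GE version returns a class-based `Checkpoint` object that does not support `[]` access. The sketch below reproduces that failure mode in plain Python; the `Checkpoint` class, the `legacy_checkpoint` dict and the `describe_access` helper are hypothetical stand-ins, not GE or Prefect code.

```python
class Checkpoint:
    """Hypothetical stand-in for a class-based checkpoint object.

    Settings live in attributes and there is no __getitem__, so
    indexing it with ["batches"] raises TypeError.
    """

    def __init__(self, name, validations):
        self.name = name
        self.validations = validations


# Hypothetical legacy-style config: a plain dict with a "batches" list.
legacy_checkpoint = {
    "batches": [
        {
            "batch_kwargs": {"datasource": "snowflake_db", "table": "testing"},
            "expectation_suite_names": ["my_suite"],
        },
    ]
}
class_based_checkpoint = Checkpoint("my_checkpoint", validations=[])


def describe_access(ge_checkpoint):
    """Mimic the failing line: iterate over ge_checkpoint["batches"]."""
    try:
        batches = [batch for batch in ge_checkpoint["batches"]]
        return f"{len(batches)} batch(es)"
    except TypeError as exc:
        return f"TypeError: {exc}"


print(describe_access(legacy_checkpoint))       # 1 batch(es)
print(describe_access(class_based_checkpoint))  # TypeError: 'Checkpoint' object is not subscriptable
```

If this reading holds, it would fit with pandas hitting the same error as Snowflake: the failure happens while reading the checkpoint, before any datasource is touched.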
k
Yep, I see it now. I’ll look into it. GE is tough sometimes 😅
j
Right! If it doesn’t work out, do you know of any other data validation tools that work well with Prefect?
k
GE lets you validate data from sources like databases, but if you’re only working with pandas DataFrames, there’s pandera. For Spark, there’s only GE.
j
Oh, we’re using Snowflake.