Hey all, does anyone have experience using Great Expectations? | Prefect Community #ask-community

Justin Liu

06/22/2021, 9:15 PM

Hey all, does anyone have experience using Great Expectations? We’re having trouble running checkpoints with the Prefect tasks and keep running into ‘object is not subscriptable’ errors. We’ve tried V2 and V3, but have not tried going down to version 0.12. I noticed someone else was getting a different error here but got it working: https://github.com/PrefectHQ/prefect/issues/4411. Any ideas?

Kevin Kho

06/22/2021, 9:40 PM

Hey @Justin Liu, there are probably people in the community who know more than me. I’ve only used the notebook interface of Great Expectations myself. I can take a stab at it though. Do you have a code snippet for me to start with?

Justin Liu

06/22/2021, 9:43 PM

Thanks, I appreciate any attempt!

Justin Liu

06/22/2021, 9:44 PM

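```python
from prefect import Flow, Parameter, task
from prefect.tasks.great_expectations import RunGreatExpectationsValidation

# `snowflake` is not defined in this snippet; presumably a configured
# Snowflake query task created elsewhere in the original script.

validation_task = RunGreatExpectationsValidation()


@task
def insert_rows(title):
    # Insert one test row into the table named `title`
    insert = f"""
        INSERT INTO {title} values(1, '{title}');
    """
    snowflake.run(query=insert)


with Flow("snow-query-test") as flow:
    # Only used by the commented-out batch_kwargs option below
    batch_kwargs = {
        "datasource_name": "snowflake_db",
        "data_connector_name": "whole_table",
        "data_asset_name": "testing__whole_table",
    }

    checkpoint_name = Parameter("checkpoint_name", default="my_checkpoint")
    validation_task(
        checkpoint_name=checkpoint_name,
        # batch_kwargs=batch_kwargs,
        # expectation_suite_name="bad",
    )

flow.run()
```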
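A possible workaround, untested against this setup: skip `RunGreatExpectationsValidation` and let the data context run the checkpoint itself. This sketch assumes your Great Expectations version provides `DataContext.run_checkpoint(checkpoint_name=...)` returning a result with a `success` attribute (as far as we know, the 0.13.x V3 API does; check your version). `FakeDataContext`, `FakeCheckpointResult`, and `run_checkpoint_directly` are hypothetical names so the sketch runs without GE or Prefect installed; in the real flow you would pass an actual `DataContext()` and decorate the function with Prefect's `@task`.

```python
from typing import Any


class FakeCheckpointResult:
    """Hypothetical stand-in for the result object a checkpoint run returns."""

    def __init__(self, success: bool) -> None:
        self.success = success


class FakeDataContext:
    """Hypothetical stand-in for a GE DataContext, so this runs without GE."""

    def __init__(self, outcomes: dict) -> None:
        self.outcomes = outcomes  # checkpoint name -> whether validation passes
        self.calls = []

    def run_checkpoint(self, checkpoint_name: str) -> FakeCheckpointResult:
        self.calls.append(checkpoint_name)
        return FakeCheckpointResult(self.outcomes[checkpoint_name])


def run_checkpoint_directly(context: Any, checkpoint_name: str) -> Any:
    # Ask the context to run the whole checkpoint instead of reading
    # ge_checkpoint["batches"], which is where the Prefect task fails.
    result = context.run_checkpoint(checkpoint_name=checkpoint_name)
    if not result.success:
        # An exception inside a Prefect task leaves the task in a Failed state.
        raise ValueError(f"Checkpoint {checkpoint_name!r} failed validation")
    return result


context = FakeDataContext({"my_checkpoint": True, "bad": False})
print(run_checkpoint_directly(context, "my_checkpoint").success)  # True
```

Raising on a failed validation keeps the behaviour close to a task that fails the flow run, rather than returning a result that downstream tasks would have to inspect.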

Justin Liu

06/22/2021, 9:45 PM

I tried using just the checkpoint, since that was the third option for running the task, but it gives the ‘object is not subscriptable’ error. Using batch_kwargs with expectation_suite_name gives a different error, which is likely because the batch_kwargs are wrong, but I had trouble finding a good reference for them.
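One possible cause of the batch_kwargs error, offered as an assumption rather than a confirmed diagnosis: the dictionary in the snippet uses V3 batch request fields (`datasource_name`, `data_connector_name`, `data_asset_name`), while the task's `batch_kwargs` argument, as far as we know, goes to the older V2 batch API, where SQL batch kwargs name a `datasource` plus a `table` or `query`. The hypothetical helper `batch_params_style` below only inspects dictionary keys to report which shape a dict resembles; it does not call Great Expectations.

```python
# Field names of a V3-style batch request
V3_BATCH_REQUEST_KEYS = {"datasource_name", "data_connector_name", "data_asset_name"}
# V2-style SQL batch kwargs locate data with a table or a query (assumption)
V2_SQL_LOCATOR_KEYS = {"table", "query"}


def batch_params_style(params: dict) -> str:
    """Guess which GE batch API a parameter dict was written for (keys only)."""
    keys = set(params)
    if V3_BATCH_REQUEST_KEYS <= keys:
        return "v3_batch_request"
    if "datasource" in keys and keys & V2_SQL_LOCATOR_KEYS:
        return "v2_batch_kwargs"
    return "unknown"


batch_kwargs = {
    "datasource_name": "snowflake_db",
    "data_connector_name": "whole_table",
    "data_asset_name": "testing__whole_table",
}
print(batch_params_style(batch_kwargs))  # v3_batch_request

# Hypothetical V2-style equivalent; "my_table" is a placeholder table name.
print(batch_params_style({"datasource": "snowflake_db", "table": "my_table"}))  # v2_batch_kwargs
```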

Kevin Kho

06/22/2021, 9:49 PM

Do you have a traceback from flow.run()?

Justin Liu

06/22/2021, 9:49 PM

Oh right, there’s a flow.run() at the bottom that I forgot to include.

Kevin Kho

06/22/2021, 9:50 PM

Oh, I meant the error would help me more, since I don’t have Snowflake readily set up.

Justin Liu

06/22/2021, 9:53 PM

```
[2021-06-22 17:51:41-0400] ERROR - prefect.TaskRunner | Unexpected error: TypeError("'Checkpoint' object is not subscriptable")
Traceback (most recent call last):
  File "/Users/jliu/anaconda3/envs/prefect-test/lib/python3.7/site-packages/prefect/engine/runner.py", line 48, in inner
    new_state = method(self, state, *args, **kwargs)
  File "/Users/jliu/anaconda3/envs/prefect-test/lib/python3.7/site-packages/prefect/engine/task_runner.py", line 869, in get_task_run_state
    logger=self.logger,
  File "/Users/jliu/anaconda3/envs/prefect-test/lib/python3.7/site-packages/prefect/utilities/executors.py", line 323, in run_task_with_timeout
    return task.run(*args, **kwargs)  # type: ignore
  File "/Users/jliu/anaconda3/envs/prefect-test/lib/python3.7/site-packages/prefect/utilities/tasks.py", line 454, in method
    return run_method(self, *args, **kwargs)
  File "/Users/jliu/anaconda3/envs/prefect-test/lib/python3.7/site-packages/prefect/tasks/great_expectations/checkpoints.py", line 233, in run
    for batch in ge_checkpoint["batches"]:
TypeError: 'Checkpoint' object is not subscriptable
[2021-06-22 17:51:41-0400] INFO - prefect.TaskRunner | Task 'RunGreatExpectationsValidation': Finished task run for task with final state: 'Failed'
[2021-06-22 17:51:41-0400] INFO - prefect.FlowRunner | Flow run FAILED: some reference tasks failed.
```
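The last frame points at the cause: Prefect's task code does `ge_checkpoint["batches"]`, which only works if the checkpoint is a dict-like configuration, but the task received a `Checkpoint` object that does not support `[...]` indexing. This suggests (not confirmed against the Prefect source here) that the task was written for the older dict-style checkpoint config. A minimal stdlib reproduction, using a hypothetical stand-in class rather than the real Great Expectations `Checkpoint`:

```python
from typing import Any


class Checkpoint:
    """Hypothetical stand-in for a class-based checkpoint object.

    It keeps its settings as attributes and does not define __getitem__,
    so checkpoint["..."] raises TypeError, as in the traceback.
    """

    def __init__(self, name: str) -> None:
        self.name = name


def legacy_batches(ge_checkpoint: Any) -> Any:
    # Mirrors the failing line: assumes a dict-like checkpoint config.
    return ge_checkpoint["batches"]


# A dict-style config is subscriptable, so this works (placeholder contents).
legacy_config = {"name": "my_checkpoint", "batches": ["placeholder batch"]}
print(legacy_batches(legacy_config))  # ['placeholder batch']

# An object without __getitem__ reproduces the error from the traceback.
try:
    legacy_batches(Checkpoint("my_checkpoint"))
    error_message = ""
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # 'Checkpoint' object is not subscriptable
```

Because the same line fails regardless of where the data lives, this also fits the later report that the error shows up with pandas as well as Snowflake.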

Justin Liu

06/22/2021, 9:53 PM

Sorry, I misunderstood. It’s also a problem when using pandas instead of Snowflake.

Kevin Kho

06/22/2021, 10:02 PM

Yep, I see it now. I’ll look into it. GE is tough sometimes 😅

Justin Liu

06/22/2021, 10:09 PM

Right! If it doesn’t work out, do you know of any other data validation tools that could work well with Prefect?

Kevin Kho

06/22/2021, 10:11 PM

GE lets you validate data straight from sources like databases, but if you only need to validate pandas DataFrames, there’s pandera. For Spark, there’s only GE.

Justin Liu

06/22/2021, 10:12 PM

Oh, we’re using Snowflake.
