Laura Lorenz (she/her)
04/30/2020, 6:25 PM
Joe Schmid
04/30/2020, 7:14 PM
Steve Taylor
04/30/2020, 7:32 PM
Jie Lou
04/30/2020, 9:22 PM
Steve Taylor
06/19/2020, 7:14 PM
@task()
def validate_roster(df):
    """
    Validate the dataframe using great_expectations file.
    This may throw a warning, "Pandas doesn't allow columns to be created via a new attribute name"
    which may be ignored. Working on this.
    Returns the dataframe given
    """
    # Create a ge "batch"
    df_ge = ge.from_pandas(pandas_df=df)
    validation_result = df_ge.validate(
        expectation_suite="resources/expectations.json",
        result_format="SUMMARY",
    )
    if not validation_result.success:
        logger.info(validation_result)
        raise Exception("Dataframe did not validate correctly.")
    return df
I want to like it, but the result_format and such is just a little chaotic, especially with things that require SUMMARY for stats and floats. We're finding Pandera to be easier to live with.