Hey community! I’m trying to set up Prefect with Great Expectations, using the official task. I want to run a validation on an in-memory DataFrame (the result of a previous task) but am having trouble setting it up correctly. I’m using the v3 API of GE and have set up the expectation suite and checkpoint according to the guide (up to _context.run_checkpoint_), but I’m struggling to pass the DataFrame in. Would anyone be able to offer some guidance?
Kevin Kho
01/19/2022, 3:42 PM
I don’t think you can pass an in-memory DataFrame to that task. You point it at a DataFrame in storage and run the checkpoint against it.
Kevin Kho
01/19/2022, 3:44 PM
I think you would need to modify the task to support that.
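If you did modify the task, one route in the GE v3 API is a runtime batch request, where the DataFrame object itself is handed over under `runtime_parameters={"batch_data": df}` instead of a file path. The sketch below only assembles those keyword arguments; it is an illustration, not the Prefect task's API. Assumptions: the datasource has a RuntimeDataConnector, and the helper name, datasource name, asset name and identifier value are hypothetical placeholders for your own project config.

```python
from typing import Any, Dict


def build_runtime_batch_request(
    batch_data: Any,
    datasource_name: str,
    data_asset_name: str,
    data_connector_name: str = "default_runtime_data_connector_name",
    identifier_name: str = "default_identifier_name",
    identifier_value: str = "prefect_run",
) -> Dict[str, Any]:
    """Collect keyword arguments for a Great Expectations RuntimeBatchRequest.

    Hypothetical helper. Assumes the datasource has a RuntimeDataConnector
    named `data_connector_name` whose batch_identifiers include
    `identifier_name`. A modified task might then do, roughly:
        RuntimeBatchRequest(**kwargs)
    and pass the result to context.run_checkpoint(..., batch_request=...).
    """
    return {
        "datasource_name": datasource_name,
        "data_connector_name": data_connector_name,
        "data_asset_name": data_asset_name,
        # The in-memory DataFrame goes here instead of a path to a file.
        "runtime_parameters": {"batch_data": batch_data},
        # Labels this batch so validation results can be told apart.
        "batch_identifiers": {identifier_name: identifier_value},
    }
```

Whether the official Prefect task forwards such a batch request is not established in this thread, which is why Kevin suggests modifying the task.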
Tomek Florek
01/19/2022, 3:44 PM
Thanks for getting back to me, Kevin! What do you mean by DataFrame in storage? A saved csv/parquet file?
Kevin Kho
01/19/2022, 3:45 PM
Yeah exactly
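A minimal sketch of the "save it first" route: the upstream result is written to a CSV file, and that file is what the checkpoint then validates. To stay dependency-free it assumes the previous task returns rows as a list of dicts; with pandas, `df.to_csv(path, index=False)` does the same job. The helper name and the idea that `base_dir` is the folder your GE filesystem datasource reads from are assumptions, not part of the thread.

```python
import csv
import os
from typing import Dict, List


def persist_records_for_validation(
    records: List[Dict[str, object]], base_dir: str, file_name: str
) -> str:
    """Write rows to a CSV file under base_dir and return its full path.

    Hypothetical helper. Assumes base_dir is the directory a filesystem
    datasource in the Great Expectations project reads from, so the
    checkpoint can pick the file up afterwards.
    """
    if not records:
        raise ValueError("nothing to persist")
    os.makedirs(base_dir, exist_ok=True)
    path = os.path.join(base_dir, file_name)
    # Column order follows the keys of the first row.
    fieldnames = list(records[0].keys())
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
    return path
```

In a flow, this would sit between the task that produces the data and the task that runs the checkpoint, so the checkpoint only ever sees a file in storage.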
Tomek Florek
01/19/2022, 3:47 PM
Alright, that makes things much easier; I guess I was complicating my life for no good reason. Thanks Kevin 👌, I’ll run with that and let you know how it went.