dammy arinde

12/10/2021, 5:43 PM
Happy Friday! Is there an example I can follow for using Great Expectations validation in Prefect? I have it set up, but when I run the flow I get the error "ConfigNotFoundError('Error: No great_expectations directory was found here!". I'm not sure what config I'm missing or where to add it. Thank you

Anna Geller

12/10/2021, 5:52 PM
Did you run "great_expectations init"? That generates the great_expectations directory.

alex

12/10/2021, 5:52 PM
Hey Dammy! Great Expectations needs to know which directory your great_expectations.yml is located in. You can configure where great_expectations looks for great_expectations.yml by setting context_root_dir equal to the path to the directory where great_expectations.yml is located.
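A minimal sketch of how that path could be resolved before handing it to the validation step. The helper name `find_context_root_dir` and the upward search are illustrative assumptions, not part of Great Expectations or Prefect; the only thing taken from the thread is that `context_root_dir` should be the directory containing `great_expectations.yml`.

```python
from pathlib import Path

CONFIG_NAME = "great_expectations.yml"


def find_context_root_dir(start: str) -> str:
    """Hypothetical helper: return the directory that contains great_expectations.yml.

    Looks in `start` and each of its parents, checking both the directory
    itself and a `great_expectations/` subdirectory of it.
    """
    start_path = Path(start).resolve()
    for base in (start_path, *start_path.parents):
        for candidate in (base, base / "great_expectations"):
            if (candidate / CONFIG_NAME).is_file():
                return str(candidate)
    raise FileNotFoundError(f"No {CONFIG_NAME} found at or above {start_path}")
```

The returned string is the value you would pass as `context_root_dir`. If the lookup raises on the machine that executes the flow, the config directory is not present there, which matches the error in the question.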

dammy arinde

12/10/2021, 5:54 PM
Ok, thank you. I'm trying to set it up for S3.
Yes, I used init and there's a great_expectations directory, but Prefect is not finding it.

alex

12/10/2021, 6:35 PM
What type of storage are you using for your flow?

dammy arinde

12/10/2021, 6:36 PM
S3 bucket

alex

12/10/2021, 6:37 PM
To make sure that I understand, the code for your flow is stored in S3, correct? Is your Great Expectations config stored somewhere else?

dammy arinde

12/10/2021, 6:41 PM
Yes
The Great Expectations folder is on my computer.

alex

12/10/2021, 6:51 PM
You’ll need to make sure that your Great Expectations folder is on the same machine that is executing the flow. One way to do that would be to commit and push your Great Expectations config to a remote git repository so that it can be pulled down and used as part of your flow.

dammy arinde

12/10/2021, 6:54 PM
Ok, thank you! Let me try this
Hi Alex! I have committed and pushed my Great Expectations config to a remote git repository. Where do I add it in the Kubernetes run config so it can be used? Thanks

Anna Geller

12/15/2021, 3:37 PM
@dammy arinde I would expect that you need to explicitly clone your GE repository in a separate task that runs before the GE task, similar to this:
import pygit2
from prefect import task

@task
def pull_ge_repo(repo_url: str, branch: str = None):
    # Clone the repo holding the GE config; checkout_branch=None uses the remote's default branch
    pygit2.clone_repository(url=repo_url, path="your_path_to_clone_the_repo_into", checkout_branch=branch)
Then, once it’s cloned, you could point the GE task to the path specified above. But maybe @alex can confirm.
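A rough sketch of the "clone first, then point GE at the clone" step. To stay standard-library only, it swaps pygit2 for the `git` CLI, which is a substitution and assumes `git` is installed where the flow runs. The function names, the injectable `clone_fn`, and the assumption that the repository keeps its config in a top-level `great_expectations/` folder are all hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Callable, Optional


def git_clone(repo_url: str, dest: str, branch: Optional[str] = None) -> None:
    """Shallow-clone repo_url into dest using the git CLI (assumed to be installed)."""
    cmd = ["git", "clone", "--depth", "1"]
    if branch:
        cmd += ["--branch", branch]
    subprocess.run(cmd + [repo_url, dest], check=True)


def clone_and_locate_ge(
    repo_url: str,
    dest: str,
    branch: Optional[str] = None,
    clone_fn: Callable[[str, str, Optional[str]], None] = git_clone,
) -> str:
    """Clone the config repo, then return the path to use as context_root_dir.

    Assumes the repo has great_expectations/great_expectations.yml at its top level.
    """
    clone_fn(repo_url, dest, branch)
    root = Path(dest) / "great_expectations"
    if not (root / "great_expectations.yml").is_file():
        raise FileNotFoundError(f"great_expectations.yml not found under {root}")
    return str(root)
```

Wrapped in a task that runs upstream of the GE task, the returned path can be passed on as `context_root_dir`. Failing early when the file is missing gives a clearer message than the later ConfigNotFoundError.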

alex

12/15/2021, 3:39 PM
Yes, that’s exactly what I was thinking

dammy arinde

12/15/2021, 3:50 PM
Thank you! I will try using pygit2 now
Seems I have to install pygit2 in the Docker image first.

Anna Geller

12/15/2021, 4:23 PM
That’s correct: it must be installed on the agent / in the environment you run your flow in.
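One way to catch a missing package early is a small preflight check at the start of the flow. This is only a sketch: `missing_modules` and `require_modules` are hypothetical helper names, and the check covers top-level module names only.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def require_modules(names: Iterable[str]) -> None:
    """Raise a readable error if any required module is not installed."""
    missing = missing_modules(names)
    if missing:
        raise RuntimeError(
            "Missing packages in the flow's execution environment: "
            + ", ".join(missing)
        )
```

Calling `require_modules(["pygit2", "great_expectations"])` as the first step would fail with an explicit message if the image was built without them.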