Hello all, we’re trying to migrate to Git storage ...
# ask-community
j
Hello all, we’re trying to migrate to Git storage from GitHub in order to load multiple files in a single flow, but can’t get it to work with even just one file. This error pops up every time.
Failed to load and execute Flow's environment: ValidationError({'type': ['Unsupported value: Git']})
This is our code, which works with GitHub storage:
import prefect
from prefect.storage import Git
from prefect.run_configs import ECSRun, LocalRun
from prefect import task, Flow, Parameter
from prefect.client import Secret

RUN_CONFIG = ECSRun(image='image/image',
                    cpu='1 vcpu', memory='2 GB')
                  
STORAGE = Git(repo='name/repo', flow_path='path_to_this_file', git_token_secret_name='token')

@task
def say_hello():
    logger = prefect.context.get("logger")
    logger.info("Hi")
    
with Flow("git-storage", storage=STORAGE, run_config=RUN_CONFIG) as flow:
    say_hello()
k
Hey @Justin Liu, what version of Prefect are you on?
j
typing prefect version in the CLI gives me 0.14.21
is that the right one?
k
Yep anything above 0.14.17 should work. What about on your agent?
j
how do you check that?
k
What kind of agent are you using?
j
it's on ecs fargate
it uses this image "image": "prefecthq/prefect:latest-python3.7"
k
When was that pulled down?
j
about a month ago?
k
Actually you can see versions from the new agents tab in the UI. Could you check there? Agents -> More -> Core Version
j
0.14.22
k
So this might be coming from the flow run container. What is the container attached to your ECSRun?
Or I mean, what is the Prefect version in there?
j
sorry, are you talking about the docker image we pull?
k
Yes exactly. The image/image there
j
prefecthq/prefect:0.14.10
oh
😄
k
Ok glad we found the culprit!
j
so can i just say like FROM prefecthq/prefect:0.14.22 and it should be ok?
k
yes. ideally though Flow version and agent version match. i think this will work though
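A minimal sketch of the version check that resolved this, using the version numbers quoted in the thread. The helper names are hypothetical, and treating 0.14.17 as an inclusive minimum is an assumption based on the "anything above 0.14.17" remark above.

```python
def parse_version(version: str) -> tuple:
    """Turn a plain 'X.Y.Z' string into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.split("."))


# Threshold quoted in this thread; treated as inclusive here (an assumption).
MIN_GIT_STORAGE_VERSION = "0.14.17"


def supports_git_storage(version: str) -> bool:
    """Return True if the given Prefect version meets the quoted threshold."""
    return parse_version(version) >= parse_version(MIN_GIT_STORAGE_VERSION)


# The three places that were checked in this thread.
versions = {
    "local CLI": "0.14.21",
    "agent": "0.14.22",
    "flow run image": "0.14.10",
}

# Collect every component that is too old to deserialize Git storage.
too_old = [name for name, v in versions.items() if not supports_git_storage(v)]
print(too_old)
```

All three components (where the flow is registered, the agent, and the flow run image) need to be checked, since the error here came from the one that lagged behind.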
j
it worked thanks!!
do you know of any existing documentation on loading the different files btw?
k
i don’t think we have any. are you trying to load .py files or stuff like .sql files?
j
.py
maybe yml too if possible
k
If it’s .py, it would be better if you packaged it in the Docker container as a Python module. You might be able to get it to work with Git storage through relative imports, but it is very likely to break, because packaging it as a module handles the paths and imports for you. Git storage is meant more for stuff like .yml files. In this case I think Git storage downloads your repo and moves into it, so if the file is referenced with a relative path, it should work.
j
hey so we tried opening a yaml file in a task using something like stream = open('test.yml', 'r'). it said file not found basically, do you have any ideas why?
k
Yeah. Maybe you can log “ls” or “pwd” to understand at what path the flow is running, and where it is relative to the .yml. This is the second time I’ve heard of this difficulty, so I will take note of it. If test.yml is in the same folder as the script, I think we can open an issue.
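A minimal sketch of the "log ls or pwd" suggestion above, using only the standard library. The function name is hypothetical; inside a task, the returned string would be passed to the task logger instead of print.

```python
import os


def describe_working_directory() -> str:
    """Summarize the current working directory and its entries in one line."""
    cwd = os.getcwd()
    entries = sorted(os.listdir(cwd))
    return f"cwd={cwd} entries={entries}"


# In a task this string would go to logger.info(...); print is used here
# so the sketch runs without Prefect installed.
message = describe_working_directory()
print(message)
```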
j
Does Git storage actually download the folder to the local machine? I thought it just looked at Git or something. But this is what happens when we run os.listdir()
['lib', 'proc', 'etc', 'media', 'srv', 'usr', 'opt', 'var', 'sbin', 'dev', 'sys', 'bin', 'root', 'lib64', 'mnt', 'tmp', 'run', 'boot', 'home']
k
It does but deletes it after is my understanding. I’ll chat with the team about this more and get back to you tomorrow.
j
hey, so i tried running find on some of the files in the github folder on the ecs instance, and it didn’t return any paths. Are we sure it downloads the repo to the local machine?
k
Will ask someone now
j
thanks ~
k
So there was a bug that was fixed in 0.15.2 and the docs here can help with that.
j
oh this looks good! do i need 0.15.2 on everything for this to work?
k
Yes because there was a bug before that release
j
alright thanks! you mentioned docker storage for running external python files, but can I also use Git storage to run those using import?
k
That might be doable, but not recommended. You’d probably have to fiddle around with the import paths to work. Like they would have to be done from the root. PyCharm and other IDEs do some magic for you with the paths to get it to work even if the project is not a Python module.
j
Path works for me, thanks!! Although the path that is generated doesn’t seem to exist afaik from running a couple ls statements on the instance. Is the git repository really just cloned onto the local machine? Because I can’t seem to locate the files without using Path
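A minimal sketch of the Path approach mentioned above: resolve the data file relative to the script rather than the working directory. The helper name and the throwaway directory are hypothetical stand-ins; in a real flow file the call would be path_beside(__file__, "test.yml").

```python
import tempfile
from pathlib import Path


def path_beside(script_file: str, name: str) -> Path:
    """Return the path of `name` in the same directory as `script_file`."""
    return Path(script_file).resolve().parent / name


# Throwaway directory standing in for the cloned repository (hypothetical).
with tempfile.TemporaryDirectory() as repo_dir:
    fake_flow_file = Path(repo_dir) / "flow.py"
    fake_flow_file.write_text("# flow definition would live here\n")
    (Path(repo_dir) / "test.yml").write_text("greeting: Hi\n")

    # Look up test.yml next to the flow file, independent of os.getcwd().
    config_path = path_beside(str(fake_flow_file), "test.yml")
    config_text = config_path.read_text()

print(config_text)
```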
k
Chatted with the team. The repository is cloned temporarily when the Flow is loaded from Storage. It is cloned into a temporary directory and the files are removed immediately after the Flow is loaded.
j
It gets removed after the script finishes running then? Also does being a temporary directory make it so that I can’t see it when running ls commands, but I can access it with the file path given from using Path(__file__)?
k
The repo gets removed after the Flow is loaded, not the execution of tasks. So the file gets read in, and then it can retrieve the contents of those files if execution happens immediately (it can’t do this if they live in a task). So by the time the repo is deleted, those are in memory already
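A minimal sketch of the timing described above, with a temporary directory standing in for the clone. This only imitates the behavior as explained in this thread and is not Prefect's actual implementation; all names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path

# Stand-in for the temporary clone created when the flow is loaded.
clone_dir = Path(tempfile.mkdtemp())
config_path = clone_dir / "test.yml"
config_path.write_text("greeting: Hi\n")

# Read eagerly, at "load time": the contents end up in memory.
LOADED_CONFIG = config_path.read_text()


def read_lazily() -> str:
    """Stand-in for a task body that opens the file only when the task runs."""
    return config_path.read_text()


# Stand-in for the cleanup that happens right after the flow is loaded.
shutil.rmtree(clone_dir)

# The eagerly read contents survive the cleanup.
print(LOADED_CONFIG)

# The lazy read happens after cleanup, so the file is already gone.
try:
    read_lazily()
    lazy_read_failed = False
except FileNotFoundError:
    lazy_read_failed = True
print(lazy_read_failed)
```

Under this model, file reads belong at module level in the flow file, with the resulting values passed into tasks, rather than inside the task bodies.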
We have this code here that is used to clone the repo, but this is not public facing and can change. It may help you get an idea if you want to clone a repo though.
j
Ok i understand now, thanks!