Milton

03/18/2022, 10:01 PM
Hi, in Prefect 0.14.11, `prefect execute flow-run` downloads the necessary local dependencies from the repo when using GitLab storage, but this behavior seems to have changed by 1.0.0 at the latest. Could you remind me when this was changed? Also, does this mean we have to build a Docker image for each repo we have? :(

Kevin Kho

03/18/2022, 11:35 PM
You mean if you have custom modules defined? GitLab storage never downloaded the local dependencies along with the Flow. Did you have some other way of bringing them to your execution environment?

Milton

03/19/2022, 12:03 AM
No, we don't. I was able to verify it in 0.14.11: the local files from that repo weren't there before I ran `prefect execute flow-run`, and they appeared after that command.
Any thoughts?

Kevin Kho

03/21/2022, 2:12 PM
Will look into this in a bit
I am looking at the history of GitLab storage here, and there is literally no change around this:
• Feb 5 is 0.14.7
• Aug 17 is 0.15.4, but this change seems insignificant
Let me check whether there were changes in the utility functions GitLab storage uses.
We just use the GitLab client, so let me take a look at its changelog.
I'm not seeing anything in its changelog that would change behavior. Let me try making a Flow on 0.14.11.
It doesn't work for me on 0.14.11 with pygitlab 0.3.2. My test code is here. Could you check whether it replicates your setup and whether it clones `dependency.py` for you?

Milton

03/21/2022, 3:57 PM
Yes, it does work for me. By the way, we use `KubernetesRun()` and also specify the image version as 0.14.11 in the `run_config`.

Kevin Kho

03/21/2022, 3:58 PM
Can I see your RunConfig definition? And can you check your `pygitlab` version for me?

Milton

03/21/2022, 4:27 PM
```python
from prefect.run_configs import KubernetesRun

run_config = KubernetesRun(
    image="path/to/prefect:0.14.11",
    labels=["your", "label"],
    cpu_limit="1",
    cpu_request="0.5",
    memory_limit="200Mi",
    memory_request="100Mi",
)
```
Our python-gitlab version is 2.10.0.

Kevin Kho

03/21/2022, 4:28 PM
Do you have your own image or is that the one from Prefect’s DockerHub?

Milton

03/21/2022, 4:30 PM
Our own. We added a few dependencies to the official Prefect image.
And I couldn't find any code related to cloning the repo in the codebase either.

Kevin Kho

03/21/2022, 4:31 PM
Would you be able to show me the Dockerfile? Even via DM is fine.

Milton

03/21/2022, 4:33 PM
Nothing fancy in it.
The key line is `pip install prefect[orchestration_extras]==0.14.11`. Other than that, there's nothing special.

Anna Geller

03/21/2022, 4:36 PM
@Milton can you share a small hello-world example flow that we could use to reproduce your issue? Storage and run config would be especially important. Sharing your Dockerfile also helps, even if there's nothing fancy in there 🙂 Finally, sharing the output of `prefect diagnostics` would give us even more info to check.
m

Milton

03/21/2022, 4:38 PM
Where should I run `prefect diagnostics`?

Anna Geller

03/21/2022, 4:39 PM
In your terminal, ideally on the machine from which you register your flow.

Milton

03/21/2022, 4:43 PM
Would it make more sense to run it in the container where the flow gets executed?

Anna Geller

03/21/2022, 4:49 PM
No, from the registration environment.

Milton

03/21/2022, 4:55 PM
```dockerfile
FROM <internal-python-base-image>

ENV PYTHONPATH /home/svc_app
ENV PREFECT__USER_CONFIG_PATH /home/svc_app/.prefect/config.toml

# copy over files needed for init
COPY --chown=svc_app \
    entrypoint.sh \
    /home/svc_app/

# pre-install base dependencies
USER root
RUN curl -OL "https://github.com/krallin/tini/releases/download/v0.19.0/tini_0.19.0-amd64.rpm"
RUN yum localinstall tini_0.19.0-amd64.rpm -y
RUN rm tini_0.19.0-amd64.rpm

# Environment Setup
USER svc_app
RUN pip install --upgrade pip
WORKDIR /home/svc_app
RUN mkdir .prefect
RUN pip uninstall prefect -y
RUN pip install \
    'prefect[orchestration_extras]==0.14.11'

ENTRYPOINT ["tini", "-g", "--", "/home/svc_app/entrypoint.sh"]
```
Kevin's example is a good one to reproduce with: https://gitlab.com/kvnkho/prefect-gitlab/-/tree/main/
It just needs the `run_config` I posted above added to it.
And here is the output of `prefect diagnostics`:
```console
$ prefect diagnostics
{
 "config_overrides": {
  "run_config": {
   "cpu_limit": true,
   "cpu_request": true,
   "env": {
    ...
   },
   "image": true,
   "labels": true,
   "memory_limit": true,
   "memory_request": true
  },
  "schedule": {
   "cron": true,
   "tz": true
  },
  "storage": {
   "access_token_secret": true,
   "host": true,
   "path": true,
   "ref": true,
   "repo": true
  }
 },
 "env_vars": [
  "PREFECT__STORAGE__REF",
  "PREFECT__USER_CONFIG_PATH"
 ],
 "system_information": {
  "platform": "Linux-3.10.0-1062.el7.x86_64-x86_64-with-redhat-7.7-Maipo",
  "prefect_backend": "cloud",
  "prefect_version": "0.14.11",
  "python_version": "3.7.3"
 }
}
```

Kevin Kho

03/21/2022, 5:08 PM
I updated my script to run 0.14.11 on Kubernetes, and it's really not working. Were you able to run my Flow using the same storage as me (public GitLab)?

Milton

03/21/2022, 5:13 PM
Is there a way to turn on debugging for `prefect execute flow-run`? I'm trying to pinpoint which line of code does the downloading.
@Kevin Kho what didn't work? I can't run it directly from public GitLab like you did, because we don't have access to it from our corporate environment. I just copied and pasted your code and ran it.

Kevin Kho

03/21/2022, 5:16 PM
I have my logs in the image up there. The dependency is not downloaded alongside the Flow file so I get an import error.
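Here is a minimal, self-contained simulation of the failure Kevin describes. It is plain Python, not Prefect's code: only the flow file is fetched into an otherwise empty directory, so its `import dependency` has nothing to find. The file contents and the module name `dependency` are illustrative. They were chosen to mirror the `dependency.py` in Kevin's test repo.

```python
import runpy
import tempfile
from pathlib import Path

# Illustrative flow file: it imports a sibling module from the same repo.
FLOW_SOURCE = "import dependency\n\nprint(dependency.VALUE)\n"


def run_flow_file_alone(source: str = FLOW_SOURCE) -> str:
    """Fetch ONLY the flow file into an empty directory and try to run it."""
    with tempfile.TemporaryDirectory() as workdir:
        flow_file = Path(workdir) / "flow.py"
        flow_file.write_text(source)
        try:
            # dependency.py was never fetched, so `import dependency` has nothing to find.
            runpy.run_path(str(flow_file))
        except ModuleNotFoundError as exc:
            return f"import failed: {exc.name}"
    return "import succeeded"


if __name__ == "__main__":
    print(run_flow_file_alone())
```

Running it prints `import failed: dependency`.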
m

Milton

03/21/2022, 5:18 PM
Hmm, I don't see that in the screenshot.
Is there a way to turn on debugging for `prefect execute flow-run`? I'm trying to pinpoint which line of code does the downloading.

Kevin Kho

03/21/2022, 5:18 PM
Ah, sorry, my bad, I pasted the wrong one. This is the right one.
`prefect execute flow-run` will already have the Flow downloaded, so it's not there. Anything related to cloning will be found here. I don't see enough debug-level logs there, though.
m

Milton

03/21/2022, 5:20 PM
Yeah, I don't get this error. I can even exec into the container and see the dependencies downloaded and saved.
I am positive they get downloaded by `prefect execute flow-run` somehow.
When I start the container by itself, the dependency files are not there, and they show up the moment I run `prefect execute flow-run`.
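One way to pin down exactly which files appear when the command runs is to snapshot the working directory before and after it. This is a minimal stdlib sketch. The helper names are hypothetical, and the demo uses a stand-in command so it runs anywhere. Inside the flow-run container you would pass something like `["prefect", "execute", "flow-run"]` and `/home/svc_app`, assuming the environment variables the agent normally sets are present.

```python
import os
import subprocess
import sys
import tempfile
from typing import Sequence, Set


def snapshot(root: str) -> Set[str]:
    """Return every file under `root` as a path relative to `root`."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def files_created_by(command: Sequence[str], root: str) -> Set[str]:
    """Run `command` from `root` and return the files that did not exist before."""
    before = snapshot(root)
    subprocess.run(list(command), cwd=root, check=True)
    return snapshot(root) - before


if __name__ == "__main__":
    # Self-contained demo: a stand-in command that creates one file.
    with tempfile.TemporaryDirectory() as root:
        fake_command = [sys.executable, "-c", "open('dependency.py', 'w').close()"]
        print(files_created_by(fake_command, root))  # prints {'dependency.py'}
```

The diff only shows *what* appeared. It does not show which code wrote it, but it narrows down whether the command or something like the entrypoint is responsible.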

Kevin Kho

03/21/2022, 5:28 PM
The only way I can envision that happening is if you are doing something in the entrypoint or the shell file. But if you are, I would still expect it to keep working after a Prefect upgrade, because that would be independent of Prefect. It might be GitLab too, but there is really nothing on the Prefect side that would suggest this. The CLI command is basically just a GraphQL call. Note the `storage.get_flow` call here, which then goes to GitLab storage's `get_flow`. I know these links point to Prefect 1.1, but this file hasn't changed significantly. Can you try running with the base Prefect image?
I think running with the base `prefecthq/prefect` 0.14.11 image is a good test to see whether stuff gets downloaded when you run `prefect execute flow-run`.

Milton

03/21/2022, 5:35 PM
It doesn't seem that the stuff gets downloaded when running `prefecthq/prefect` 0.14.11.
We thought this was a feature offered by Prefect, and it's super useful to us.
If this is not supported, does this mean we have no option but to rebuild a Docker image for each repo, as well as for each change we make to a repo for testing?
k

Kevin Kho

03/21/2022, 5:39 PM
We definitely understand that. Easy packaging like this is something we intend to work on in Prefect 2.0 in the near future.
Not exactly, on that last question! The thing is, I feel like you actually had a solution somewhere in your custom image that handled this well. What I've seen other users do is use the `ENTRYPOINT` of the image to download the Git repo and run `pip install -e .` in the image. That will get you up-to-date dependencies during execution. I am positive you have something like that going on, and it should be possible in 1.0.
But yes, in the near future we will be looking into something like this for Prefect 2.0.
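Below is a minimal sketch of that entrypoint pattern in Python. It rests on several assumptions:
- `git` is installed in the image.
- The repo is pip-installable, meaning it has a `setup.py` or `pyproject.toml`.
- The flow-run command reaches the entrypoint as arguments, for example via `ENTRYPOINT ["tini", "-g", "--", "python", "/home/svc_app/entrypoint.py"]`.

The environment variable names `FLOW_REPO_URL`, `FLOW_REPO_REF` and `FLOW_REPO_DIR` are hypothetical. They would need to be set, for example through the run config's `env`.

```python
import os
import subprocess
import sys
from typing import List, Sequence


def bootstrap_commands(repo_url: str, ref: str, target_dir: str) -> List[List[str]]:
    """Build the clone and editable-install commands for the flow's repo."""
    return [
        # Shallow clone of one branch or tag (assumes `git` is installed in the image).
        ["git", "clone", "--depth", "1", "--branch", ref, repo_url, target_dir],
        # Editable install (assumes the repo has a setup.py or pyproject.toml).
        [sys.executable, "-m", "pip", "install", "-e", target_dir],
    ]


def main(argv: Sequence[str]) -> None:
    if not argv:
        print("usage: entrypoint.py COMMAND [ARGS...]", file=sys.stderr)
        return
    # Hypothetical environment variable names, e.g. set via the run config's `env`.
    repo_url = os.environ["FLOW_REPO_URL"]
    ref = os.environ.get("FLOW_REPO_REF", "master")
    target_dir = os.environ.get("FLOW_REPO_DIR", "/home/svc_app/repo")
    for cmd in bootstrap_commands(repo_url, ref, target_dir):
        subprocess.run(cmd, check=True)
    # Replace this process with the original command (e.g. the flow-run command).
    os.execvp(argv[0], list(argv))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Building the command list separately from running it keeps the logic testable without network access. `os.execvp` then replaces the Python process with the real command, so signals forwarded by `tini` reach it directly.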

Milton

03/22/2022, 9:14 PM
Hi Kevin, we looked further, realized that Git storage downloads the whole repo, and decided to give it a try. However, after registering the flow with
```python
from prefect.storage import Git

storage = Git(
    repo="org/repo",                              # name of repo
    flow_path="flows/my_flow.py",                 # location of flow file in repo
    repo_host="gitlab.com",                       # repo host name, which may be custom
    git_token_secret_name="MY_GIT_ACCESS_TOKEN",  # name of Secret containing Deploy Token
    git_token_username="myuser",                  # username associated with the Deploy Token
)
```
the triggered flow run returns:
```text
└── 20:45:45 | INFO    | Entered state <Scheduled>: Flow run scheduled.
└── 20:45:53 | INFO    | Entered state <Submitted>: Submitted for execution
└── 20:45:53 | INFO    | Submitted for execution: Job prefect-job-36cd933b
└── 20:45:57 | INFO    | Entered state <Failed>: Failed to load and execute flow run: HangupException('ssh: Could not resolve hostname https: Name or service not known\r')
└── 20:45:57 | ERROR   | Failed to load and execute flow run: HangupException('ssh: Could not resolve hostname https: Name or service not known\r')
Flow run failed!
```
I'm not sure why `ssh` would be used at all, since `use_ssh` is set to `False` by default in Git storage.

Kevin Kho

03/22/2022, 9:17 PM
Git storage does download the repo, but it isn't intended for importing other Python scripts from the same repo, and it won't work for that. It can load things like `.sql` or `.yaml` files, but Python imports won't work because you'd need some heavy Python path manipulation. It might not even be doable. I've seen people try, and I don't mean to discourage you from trying. I just really don't know whether it can be done.
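This minimal stdlib sketch illustrates the kind of path manipulation Kevin is referring to. It puts a cloned checkout directory on `sys.path` just long enough to import a sibling module. It only demonstrates the mechanism inside a single process, and it is not something Git storage does for you. Whether it can be wired into how Git storage loads a flow is exactly what Kevin is unsure about. The helper name is hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path
from types import ModuleType


def import_from_checkout(checkout_dir: str, module_name: str) -> ModuleType:
    """Import `module_name` from a cloned repo by temporarily adding it to sys.path."""
    importlib.invalidate_caches()  # the files may have been written after startup
    sys.path.insert(0, checkout_dir)
    try:
        return importlib.import_module(module_name)
    finally:
        # Restore sys.path; the module itself stays cached in sys.modules.
        sys.path.remove(checkout_dir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as checkout:
        # Stand-in for a sibling module that was cloned alongside the flow file.
        Path(checkout, "dependency.py").write_text("VALUE = 42\n")
        module = import_from_checkout(checkout, "dependency")
        print(module.VALUE)  # prints 42
        sys.modules.pop("dependency", None)
```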

Milton

03/22/2022, 9:19 PM
Yeah, but do you know why `ssh` would be used in the first place?
I couldn't get Git storage working.

Kevin Kho

03/22/2022, 9:29 PM
I can't immediately tell why the error occurs. I thought `use_ssh` was a flag and that it was set to `False`.
Let me take a look.

Milton

03/22/2022, 9:34 PM

Kevin Kho

03/22/2022, 9:36 PM
Yeah, this is definitely not a Prefect error. I am honestly not even sure I trust that error. But since you're in the code, a good test is to try cloning the repo yourself with this utility class.
m

Milton

03/22/2022, 9:36 PM
And `dulwich.porcelain.clone` should parse it correctly as per https://github.com/jelmer/dulwich/blob/83521a1efe323650b971ab47e47913f0993f0c27/dulwich/client.py#L2271, so `get_transport_and_path_from_url` should return an HTTP client.
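This is a quick stdlib check of how a clone URL splits into scheme, hostname and port. It is independent of dulwich, and the URLs below are made up. One possibility worth ruling out is a URL-reserved character such as `/` in the credentials, which makes the parser end the host part early. That is an assumption, not something established in this thread.

```python
from typing import Dict
from urllib.parse import urlsplit


def describe_clone_url(url: str) -> Dict[str, object]:
    """Show how the stdlib URL parser splits a clone URL."""
    parts = urlsplit(url)
    try:
        port = parts.port  # raises ValueError when the text after ':' is not numeric
    except ValueError as exc:
        port = f"invalid ({exc})"
    return {"scheme": parts.scheme, "hostname": parts.hostname, "port": port}


if __name__ == "__main__":
    # Well-formed credentials: the hostname is what you expect.
    print(describe_clone_url("https://user:token@self-hosted-gitlab.com/group/repo.git"))
    # A '/' inside the token ends the host part early, so "user" is parsed as the host.
    print(describe_clone_url("https://user:to/ken@self-hosted-gitlab.com/group/repo.git"))
```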
k

Kevin Kho

03/22/2022, 9:41 PM
Ohh, you know what? I think this might be related to the dulwich version. Can you check which one you are on?
There is this issue.
m

Milton

03/22/2022, 9:46 PM
We are on 0.20.35, and the issue you posted has been resolved as of 0.20.31.
k

Kevin Kho

03/22/2022, 9:47 PM
Ah, OK, that should be fine then. But maybe try downgrading to 0.20.31, since that version should be confirmed working.
m

Milton

03/22/2022, 10:08 PM
I downgraded it and ran this test script:
```python
from prefect.utilities.git import TemporaryGitRepo

with TemporaryGitRepo(
    git_clone_url="https://<user>:<pass>@self-hosted-gitlab.com/<repo>.git",
    branch_name="master",
    tag=None,
    commit=None,
    clone_depth=1,
) as temp_repo:
    print(temp_repo.temp_dir.name)
```
and here is the error:
```text
Traceback (most recent call last):
  File "/home/<user>/miniconda3/lib/python3.7/site-packages/dulwich/client.py", line 1091, in fetch_pack
    refs, server_capabilities = read_pkt_refs(proto)
  File "/home/<user>/miniconda3/lib/python3.7/site-packages/dulwich/client.py", line 234, in read_pkt_refs
    for pkt in proto.read_pkt_seq():
  File "/home/<user>/miniconda3/lib/python3.7/site-packages/dulwich/protocol.py", line 287, in read_pkt_seq
    pkt = self.read_pkt_line()
  File "/home/<user>/miniconda3/lib/python3.7/site-packages/dulwich/protocol.py", line 232, in read_pkt_line
    raise HangupException()
dulwich.errors.HangupException: The remote server unexpectedly closed the connection.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test.py", line 8, in <module>
    clone_depth=1,
  File "/home/<user>/miniconda3/lib/python3.7/site-packages/prefect/utilities/git.py", line 48, in __enter__
    source=self.git_clone_url, target=self.temp_dir.name, depth=self.clone_depth
  File "/home/<user>/miniconda3/lib/python3.7/site-packages/dulwich/porcelain.py", line 451, in clone
    depth=depth,
  File "/home/<user>/miniconda3/lib/python3.7/site-packages/dulwich/client.py", line 535, in clone
    result = self.fetch(path, target, depth=depth)
  File "/home/<user>/miniconda3/lib/python3.7/site-packages/dulwich/client.py", line 607, in fetch
    depth=depth,
  File "/home/<user>/miniconda3/lib/python3.7/site-packages/dulwich/client.py", line 1093, in fetch_pack
    raise _remote_error_from_stderr(stderr)
dulwich.errors.HangupException: ssh: Could not resolve hostname https: Name or service not known
```
k

Kevin Kho

03/22/2022, 10:09 PM
Is this on a self-hosted GitLab or the public one?
Oh man, it looks like Git storage formats the string incorrectly for GitLab, based on this.
You might need to use `git_clone_url_secret_name` instead to provide the full string.
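With `git_clone_url_secret_name`, the Secret holds the complete clone URL. Here is a minimal sketch for assembling that string; the helper is hypothetical. It assumes HTTPS credentials in the URL, as in the `git clone` command that worked. The username and token are percent-encoded so that reserved characters cannot shift the host boundary. Whether your dulwich version decodes percent-encoded credentials before sending them is worth verifying.

```python
from urllib.parse import quote, urlsplit


def build_clone_url(host: str, repo: str, username: str, token: str) -> str:
    """Assemble an HTTPS clone URL with percent-encoded credentials."""
    # safe="" makes quote() encode '/', ':' and '@' too, which would otherwise
    # change where the URL parser thinks the host starts or ends.
    user = quote(username, safe="")
    secret = quote(token, safe="")
    return f"https://{user}:{secret}@{host}/{repo}.git"


if __name__ == "__main__":
    url = build_clone_url("self-hosted-gitlab.com", "group/repo", "myuser", "to/ken@1")
    print(url)  # https://myuser:to%2Fken%401@self-hosted-gitlab.com/group/repo.git
    print(urlsplit(url).hostname)  # self-hosted-gitlab.com
```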
m

Milton

03/22/2022, 10:25 PM
It's on a self-hosted GitLab, and I just confirmed that `git clone https://<user>:<pass>@self-hosted-gitlab.com/<repo>.git` works.
a

Anna Geller

03/22/2022, 10:35 PM
@Milton does this mean your issue is resolved now, or only that this manual git clone step worked?
m

Milton

03/22/2022, 10:37 PM
Only that this manual git clone step works.
The Python test snippet is still failing with the error above.
a

Anna Geller

03/22/2022, 10:39 PM
This thread has 70+ messages 😅 Could you summarize the problem you're facing as of now?
Generally, your Dockerfile could be vastly simplified if you used the Prefect base image.
m

Milton

03/22/2022, 10:41 PM
The issue now is that Git storage doesn't work. Running the same `TemporaryGitRepo` test script I posted above still fails with the same traceback, ending in:
`dulwich.errors.HangupException: ssh: Could not resolve hostname https: Name or service not known`
a

Anna Geller

03/22/2022, 10:44 PM
It looks like you're using a custom self-hosted GitLab server. Is it reachable from your Kubernetes flow run pod? (I saw you use `KubernetesRun`.)
Maybe skip the branch name here?
```python
from prefect.utilities.git import TemporaryGitRepo

with TemporaryGitRepo(
    git_clone_url="https://<user>:<pass>@self-hosted-gitlab.com/<repo>.git",
    # branch_name="master",  # try without this, since the git clone command that worked didn't specify one
) as temp_repo:
    print(temp_repo.temp_dir.name)
```
k

Kevin Kho

03/23/2022, 4:06 AM
So I created a private GitLab repo, generated a personal access token, and used dulwich 0.20.31, and I was able to clone the repo using the format you provided. I tried giving a wrong branch and wrong credentials, but the errors were not the same as the ones you got.
I'm not sure how else to diagnose your issue without going deeper into your setup. If possible, you could try to replicate it on public GitLab and show me how you did it, because I really can't reproduce it right now.
m

Milton

03/23/2022, 3:40 PM
Removing the branch name still gives the error. Not sure what's wrong. For now, we've decided to give up on Git storage and instead patch GitLab storage internally so that it downloads the whole repo.
Thank you so much for your help, @Kevin Kho and @Anna Geller!

Kevin Kho

03/23/2022, 3:45 PM
If you edit that code, just use the same patched version on the agent side too, and it should work.