# prefect-community
j
Hello. I’m trying to test out Gitlab storage by following option #2 laid out here. I can see the environment variables
PREFECT__CLOUD__USE_LOCAL_SECRETS
and
PREFECT__CONTEXT__SECRETS__GITLAB_ACCESS_TOKEN
set on my agent, but not on the job. Trying to run the flow just errors out at
Failed to load and execute flow run: KeyError('The secret GITLAB_ACCESS_TOKEN was not found.  Please ensure that it was set correctly in your tenant: https://docs.prefect.io/orchestration/concepts/secrets.html')
k
Does it error out with
flow.run()
or an agent run?
j
Looks like flow run. I see
ERROR: execute flow-run
in the web UI.
k
How did you set these env vars on the agent?
j
In the yaml for the agent deployment. relevant snippet here
containers:
      - args:
        - prefect agent kubernetes start
        command:
        - /bin/bash
        - -c
        env:
        - name: PREFECT__CLOUD__AGENT__AUTH_TOKEN
          valueFrom:
            secretKeyRef:
              name: prefect-secrets
              key: prefect-key
        ...
        - name: PREFECT__CLOUD__AGENT__AGENT_ADDRESS
          value: http://:8080
        - name: PREFECT__CLOUD__API_KEY
          valueFrom:
            secretKeyRef:
              name: prefect-secrets
              key: prefect-key
        - name: PREFECT__CLOUD__TENANT_ID
          value: ''
        - name: PREFECT__CLOUD__USE_LOCAL_SECRETS
          value: 'true'
        - name: PREFECT__CONTEXT__SECRETS__GITLAB_ACCESS_TOKEN
          valueFrom:
            secretKeyRef:
              name: prefect-secrets
              key: prefect-gitlab-token
k
Ah ok I see. Try doing:
prefect agent kubernetes install --env TEST=false
and you will see the format for including env variables that are passed through to the flow.
j
Ah so I want to include those under
PREFECT__CLOUD__AGENT__ENV_VARS
?
k
Yes exactly
Those are passed through
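To confirm that the variables actually reach the flow-run job rather than only the agent, a small check like the one below could be run inside the job container. This is only a sketch: the helper name `missing_env_vars` is hypothetical, and the variable names are the two mentioned earlier in this thread.

```python
import os
from typing import Iterable, List, Mapping

# Variable names taken from this thread; adjust to what your flow needs.
REQUIRED_VARS = (
    "PREFECT__CLOUD__USE_LOCAL_SECRETS",
    "PREFECT__CONTEXT__SECRETS__GITLAB_ACCESS_TOKEN",
)


def missing_env_vars(
    required: Iterable[str], environ: Mapping[str, str] = os.environ
) -> List[str]:
    """Return the required names that are absent or empty in `environ`.

    Hypothetical helper for debugging; it is not part of Prefect.
    """
    return [name for name in required if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars(REQUIRED_VARS)
    print("missing:", missing if missing else "none")
```

If the list is non-empty in the job but empty on the agent, the variables are set on the agent only and are not being passed through.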
j
What does setting the token on the agent do, like the example shows?
k
Which token? The Gitlab or the Auth?
j
Gitlab. The post I linked seems to imply that you can set the token on the agent like I’m trying to do.
k
At the moment, I don’t think that’s right, because it’s the Flow pod that pulls from storage, so I think the token needs to be set there, as you’re trying to do.
j
Got it. I think for what I want option 5 of setting a custom job template will do the trick. Thanks!
Follow-up question. I’m able to register and start the flow when using a custom job template (pulling from GitLab storage as expected). But the flow ultimately errors out, unable to find my job-template file when running (
[Errno 2] No such file or directory: 'job-template.yml'
). The flow code for reference:
import prefect
from prefect import Flow, task
from prefect.run_configs import KubernetesRun
from prefect.storage import GitLab


@task
def hello_task():
    logger = prefect.context.get('logger')
    logger.info('Hello world!')


with Flow('hello-flow') as flow:
    hello_task()

flow.storage = GitLab(
    host='<HOST>',
    repo='<REPO>',
    path='flows/hello_flow.py',
    ref='<BRANCH>'
)

flow.run_config = KubernetesRun(job_template_path='job-template.yml')
flow.register(project_name='tutorial')
k
This is because it’s evaluating that filepath relative to the agent pod. This is easier if you do something like
s3://
. Point being your agent needs access to load the file during run time
j
I guess I’m trying to avoid having the job template live in s3 while the storage class is GitLab; seems like added complexity. Does the GitLab storage not clone the entire repository?
k
But it’s the Flow pod that clones the repo. This is the agent pod that needs that info to deploy the Flow pod. Does that make sense?
It just needs to be in the Agent container then
j
I’m not sure I follow. The error I’m seeing is on the flow pod.
k
Can I see your traceback? You can remove sensitive info
j
You mean the complete output from the flow pod?
k
Sure yeah
Reading the doc string:
job_template_path (str, optional): Path to a job template to use. If a local path (no file scheme, or a file/local scheme), the job template will be loaded on initialization and stored on the KubernetesRun object as the job_template field. Otherwise the job template will be loaded at runtime on the agent. Supported runtime file schemes include (s3, gcs, and agent (for paths local to the runtime agent)).
so I guess it should have been read already when you registered?
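The rule in that docstring can be sketched as a small classifier. This is an illustration only, not Prefect's implementation: it uses `urllib.parse.urlparse` from the standard library rather than Prefect's own `parse_path`, and the function name `when_is_template_loaded` is made up.

```python
from urllib.parse import urlparse

# Scheme groups as described in the docstring quoted above.
LOCAL_SCHEMES = {"", "file", "local"}      # no scheme counts as local
RUNTIME_SCHEMES = {"s3", "gcs", "agent"}   # resolved later, by the agent


def when_is_template_loaded(job_template_path: str) -> str:
    """Classify a job template path by when it would be read.

    Hypothetical helper that mirrors the docstring's wording only.
    """
    scheme = urlparse(job_template_path).scheme
    if scheme in LOCAL_SCHEMES:
        return "at initialization"
    if scheme in RUNTIME_SCHEMES:
        return "at runtime on the agent"
    return "unsupported scheme"


for path in ("job-template.yml", "s3://bucket/job-template.yml", "agent://job-template.yml"):
    print(path, "->", when_is_template_loaded(path))
```

A bare path such as `job-template.yml` falls in the first group, which is why the file has to exist wherever the `KubernetesRun` object is constructed.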
j
Yes. And I’ve seen it error out when the file path to the template is wrong or the template is malformed, so I know that’s working.
full logs from the job
[Errno 2] No such file or directory: 'job-template.yml'
Traceback (most recent call last):
  File "/usr/local/bin/prefect", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/prefect/cli/execute.py", line 96, in flow_run
    raise exc
  File "/usr/local/lib/python3.8/site-packages/prefect/cli/execute.py", line 73, in flow_run
    flow = storage.get_flow(flow_data.name)
  File "/usr/local/lib/python3.8/site-packages/prefect/storage/gitlab.py", line 105, in get_flow
    return extract_flow_from_file(
  File "/usr/local/lib/python3.8/site-packages/prefect/utilities/storage.py", line 88, in extract_flow_from_file
    exec(contents, exec_vals)
  File "<string>", line 24, in <module>
  File "/usr/local/lib/python3.8/site-packages/prefect/run_configs/kubernetes.py", line 106, in __init__
    with open(parsed.path) as f:
FileNotFoundError: [Errno 2] No such file or directory: 'job-template.yml'
k
Thanks. It does seem as you mentioned. I can’t tell if that is a bug quite yet because it seems like it should be read already but is not.
Looking at this
From your registration machine, can you try:
from prefect.utilities.filesystems import parse_path
parsed = parse_path("job-template.yaml")
print(parsed.scheme)
j
I get
file
k
That looks good. Let me try running this code
from prefect.run_configs import KubernetesRun

a = KubernetesRun(job_template_path="test-yan.yaml")
print(a.job_template)
this is working for me. I am guessing the Gitlab script based storage just re-evaluates the file to obtain the Flow, and as a side effect of that, it tries to instantiate the KubernetesRun even though it’s already been read, you know what I mean?
Have a call brb
j
Yes I think I follow. Sounds like a bug then?
k
Yes, I personally think so, but I’m unsure how it would be patched, because the Flow file is run after being retrieved from Storage (for the Git-based ones) to evaluate the Flow and then retrieve the
flow
variable. Yes the Git repo gets cloned, but it’s in a temp directory and doesn’t find these files. Maybe we can use
Git
storage instead of
Gitlab
because
Git
storage has a way to load in these files like shown here, but then the path needs to be changed. Let me think about this.
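The underlying behaviour is plain Python: a relative path is resolved against the process's current working directory, not against the directory the script came from. The sketch below reproduces that with two temporary directories. The directory roles ("repo" and "work") and the file contents are made up for illustration and do not reflect how Prefect lays out files.

```python
import os
import tempfile
from pathlib import Path


def can_open(path: str) -> bool:
    """Return True if `path` can be opened for reading."""
    try:
        with open(path):
            return True
    except FileNotFoundError:
        return False


def demo() -> dict:
    original_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as repo_dir, tempfile.TemporaryDirectory() as work_dir:
        # The template lives next to the flow file, in a stand-in "repo" directory.
        Path(repo_dir, "job-template.yml").write_text("apiVersion: batch/v1\n")
        # The process runs from somewhere else entirely.
        os.chdir(work_dir)
        try:
            return {
                "relative": can_open("job-template.yml"),
                "anchored": can_open(os.path.join(repo_dir, "job-template.yml")),
            }
        finally:
            os.chdir(original_cwd)


print(demo())
```

The relative lookup fails while the anchored one succeeds, which matches the `FileNotFoundError` in the traceback above.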
j
Maybe you don’t know, but would I have the same problem if I switched to using script based storage in s3, and not explicitly supplying an s3 path for the job template?
k
I think I have an idea. Can you try:
with Flow(...) as flow:
    ...
    ...

if __name__ == "__main__":
    flow.run_config = KubernetesRun(...)
    flow.register(...)
This way, it will only run during registration.
S3 storage as script will have the same issue I think. S3 storage as pickle will not.
j
That did the trick!
🎉
k
Nice, just don’t put the executor inside the main guard. That one needs to be above it because it’s not stored along with the Flow; it’s read from the storage file. Thanks for the patience!
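Why the main guard helps can be shown without Prefect. The traceback above shows the flow file being evaluated with `exec(contents, exec_vals)`; when code is executed that way, `__name__` is not `"__main__"`, so the guarded block is skipped. The sketch below is a simplified stand-in for that loading step, assuming a fresh globals dict; the `events` list and both function names are made up for the demonstration.

```python
# A stand-in for the flow file; `events` records which parts were executed.
FLOW_FILE = '''
events = []
events.append("flow defined")        # stands in for `with Flow(...) as flow:`
events.append("executor attached")   # module level, so it runs on every load

if __name__ == "__main__":
    events.append("run_config set and flow registered")
'''


def load_like_script_storage(contents: str) -> list:
    """Evaluate the file the way the traceback shows: exec into a fresh dict."""
    exec_vals = {}
    exec(contents, exec_vals)
    return exec_vals["events"]


def run_as_script(contents: str) -> list:
    """Approximate `python flow_file.py`, where __name__ is "__main__"."""
    exec_vals = {"__name__": "__main__"}
    exec(contents, exec_vals)
    return exec_vals["events"]


print(load_like_script_storage(FLOW_FILE))
print(run_as_script(FLOW_FILE))
```

Only the script-style run reaches the guarded line. Anything that must be present when the flow pod loads the file, such as the executor, therefore has to sit outside the guard.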