
Josh Paulin

06/17/2022, 2:42 PM
Hello. I’m trying to test out Gitlab storage by following option #2 laid out here. I can see the environment variables
PREFECT__CLOUD__USE_LOCAL_SECRETS
and
PREFECT__CONTEXT__SECRETS__GITLAB_ACCESS_TOKEN
set on my agent, but not on the job. Trying to run the flow just errors out at
Failed to load and execute flow run: KeyError('The secret GITLAB_ACCESS_TOKEN was not found.  Please ensure that it was set correctly in your tenant: https://docs.prefect.io/orchestration/concepts/secrets.html')

Kevin Kho

06/17/2022, 2:43 PM
Does it error out with
flow.run()
or an agent run?

Josh Paulin

06/17/2022, 2:45 PM
Looks like flow run. I see
ERROR: execute flow-run
in the web UI.

Kevin Kho

06/17/2022, 2:46 PM
How did you set these env vars on the agent?

Josh Paulin

06/17/2022, 2:48 PM
In the YAML for the agent deployment. Relevant snippet here:
containers:
      - args:
        - prefect agent kubernetes start
        command:
        - /bin/bash
        - -c
        env:
        - name: PREFECT__CLOUD__AGENT__AUTH_TOKEN
          valueFrom:
            secretKeyRef:
              name: prefect-secrets
              key: prefect-key
        ...
        - name: PREFECT__CLOUD__AGENT__AGENT_ADDRESS
          value: http://:8080
        - name: PREFECT__CLOUD__API_KEY
          valueFrom:
            secretKeyRef:
              name: prefect-secrets
              key: prefect-key
        - name: PREFECT__CLOUD__TENANT_ID
          value: ''
        - name: PREFECT__CLOUD__USE_LOCAL_SECRETS
          value: 'true'
        - name: PREFECT__CONTEXT__SECRETS__GITLAB_ACCESS_TOKEN
          valueFrom:
            secretKeyRef:
              name: prefect-secrets
              key: prefect-gitlab-token

Kevin Kho

06/17/2022, 2:52 PM
Ah ok I see. Try doing:
prefect agent kubernetes install --env TEST=false
and you will see the format for including env variables that are passed through to the flow

Josh Paulin

06/17/2022, 2:55 PM
Ah so I want to include those under
PREFECT__CLOUD__AGENT__ENV_VARS
?

Kevin Kho

06/17/2022, 2:56 PM
Yes exactly
Those are passed through
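A minimal sketch of building a value for PREFECT__CLOUD__AGENT__ENV_VARS, assuming the agent expects a JSON object mapping variable names to values. That format is an assumption here; the output of the install command above is the place to confirm it. The helper name agent_env_vars_value and the <TOKEN> placeholder are hypothetical.

```python
import json


def agent_env_vars_value(env: dict) -> str:
    """Serialize env vars for the flow run into one JSON string.

    Assumption: the agent reads PREFECT__CLOUD__AGENT__ENV_VARS as a JSON
    object. Compare against the generated install YAML before relying on it.
    """
    return json.dumps(env, sort_keys=True)


# Hypothetical values; <TOKEN> stands in for the real GitLab token.
value = agent_env_vars_value(
    {
        "PREFECT__CLOUD__USE_LOCAL_SECRETS": "true",
        "PREFECT__CONTEXT__SECRETS__GITLAB_ACCESS_TOKEN": "<TOKEN>",
    }
)
print(value)
```

Note that a token embedded this way sits in plain text in the agent manifest, which may be one reason to prefer a custom job template that references a Kubernetes secret.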

Josh Paulin

06/17/2022, 2:57 PM
What does setting the token on the agent do, like the example shows?

Kevin Kho

06/17/2022, 3:01 PM
Which token? The Gitlab or the Auth?

Josh Paulin

06/17/2022, 3:02 PM
Gitlab. The post I linked seems to imply that you can set the token on the agent like I’m trying to do.

Kevin Kho

06/17/2022, 3:06 PM
At the moment, I don’t think that’s right, because it’s the Flow pod that pulls from storage. So I think the token needs to be set there, which is what you’re trying to do.

Josh Paulin

06/17/2022, 3:08 PM
Got it. I think for what I want option 5 of setting a custom job template will do the trick. Thanks!
Follow-up question. I’m able to register and start the flow when using a custom job template (pulling from GitLab storage as expected). But the flow ultimately errors out, not being able to find my job-template file when running (
[Errno 2] No such file or directory: 'job-template.yml'
). The flow code for reference:
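import prefect
from prefect import Flow, task
from prefect.run_configs import KubernetesRun
from prefect.storage import GitLab


@task
def hello_task():
    logger = prefect.context.get('logger')
    logger.info('Hello world!')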


with Flow('hello-flow') as flow:
    hello_task()

flow.storage = GitLab(
    host='<HOST>',
    repo='<REPO>',
    path='flows/hello_flow.py',
    ref='<BRANCH>'
)

flow.run_config = KubernetesRun(job_template_path='job-template.yml')
flow.register(project_name='tutorial')

Kevin Kho

06/17/2022, 6:33 PM
This is because it’s evaluating that filepath relative to the agent pod. This is easier if you do something like
s3://
. The point is that your agent needs access to load the file at runtime.

Josh Paulin

06/17/2022, 6:44 PM
I guess I’m trying to avoid having the job template live in s3 while the storage class is GitLab; seems like added complexity. Does the GitLab storage not clone the entire repository?

Kevin Kho

06/17/2022, 6:45 PM
But it’s the Flow pod that clones the repo. This is the agent pod that needs that info to deploy the Flow pod. Does that make sense?
It just needs to be in the Agent container then

Josh Paulin

06/17/2022, 6:46 PM
I’m not sure I follow. The error I’m seeing is on the flow pod.

Kevin Kho

06/17/2022, 6:48 PM
Can I see your traceback? You can remove sensitive info

Josh Paulin

06/17/2022, 6:49 PM
You mean the complete output from the flow pod?

Kevin Kho

06/17/2022, 6:50 PM
Sure yeah
Reading the doc string:
job_template_path (str, optional): Path to a job template to use. If a local path (no file scheme, or a file/local scheme), the job template will be loaded on initialization and stored on the KubernetesRun object as the job_template field. Otherwise the job template will be loaded at runtime on the agent. Supported runtime file schemes include (s3, gcs, and agent (for paths local to the runtime agent)).
so I guess it should have been read already when you registered?
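The rule in that docstring can be sketched roughly as follows. This is an illustration built on urllib.parse, not Prefect's own implementation; the function name template_load_time and the example bucket and paths are hypothetical.

```python
from urllib.parse import urlparse

# Schemes taken from the docstring quoted above.
LOCAL_SCHEMES = {"", "file", "local"}      # read when KubernetesRun is created
RUNTIME_SCHEMES = {"s3", "gcs", "agent"}   # read later, by the agent


def template_load_time(path: str) -> str:
    """Say when a job template at `path` would be read, per the docstring."""
    scheme = urlparse(path).scheme
    if scheme in LOCAL_SCHEMES:
        return "initialization"
    if scheme in RUNTIME_SCHEMES:
        return "agent runtime"
    raise ValueError(f"Unsupported scheme: {scheme!r}")


# Hypothetical paths, for illustration only.
print(template_load_time("job-template.yml"))
print(template_load_time("s3://my-bucket/job-template.yml"))
print(template_load_time("agent:///etc/prefect/job-template.yml"))
```

A bare relative path falls in the first group, so the file is opened wherever the KubernetesRun(...) call itself executes.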
j

Josh Paulin

06/17/2022, 6:56 PM
Yes. And I’ve seen it error out when the file path to the template is wrong or the template is malformed, so I know that’s working.
Full logs from the job:
[Errno 2] No such file or directory: 'job-template.yml'
Traceback (most recent call last):
  File "/usr/local/bin/prefect", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/prefect/cli/execute.py", line 96, in flow_run
    raise exc
  File "/usr/local/lib/python3.8/site-packages/prefect/cli/execute.py", line 73, in flow_run
    flow = storage.get_flow(flow_data.name)
  File "/usr/local/lib/python3.8/site-packages/prefect/storage/gitlab.py", line 105, in get_flow
    return extract_flow_from_file(
  File "/usr/local/lib/python3.8/site-packages/prefect/utilities/storage.py", line 88, in extract_flow_from_file
    exec(contents, exec_vals)
  File "<string>", line 24, in <module>
  File "/usr/local/lib/python3.8/site-packages/prefect/run_configs/kubernetes.py", line 106, in __init__
    with open(parsed.path) as f:
FileNotFoundError: [Errno 2] No such file or directory: 'job-template.yml'

Kevin Kho

06/17/2022, 7:13 PM
Thanks. It does seem to be as you described. I can’t tell yet whether that is a bug, because it seems like the template should have been read already, but it is being read again.
Looking at this
From your registration machine, can you try:
from prefect.utilities.filesystems import parse_path
parsed = parse_path("job-template.yml")
print(parsed.scheme)

Josh Paulin

06/17/2022, 7:20 PM
I get
file

Kevin Kho

06/17/2022, 7:21 PM
That looks good. Let me try running this code
from prefect.run_configs import KubernetesRun

a = KubernetesRun(job_template_path="test-yan.yaml")
print(a.job_template)
This is working for me. I am guessing the GitLab script-based storage just re-evaluates the file to obtain the Flow and, as a side effect of that, tries to instantiate the KubernetesRun again even though the template has already been read. You know what I mean?
Have a call brb

Josh Paulin

06/17/2022, 7:31 PM
Yes I think I follow. Sounds like a bug then?

Kevin Kho

06/17/2022, 8:18 PM
Yes, I personally think so, but I’m also unsure how it would be patched, because the Flow file is run after being retrieved from Storage (for the Git-based ones) to evaluate the Flow and then retrieve the
flow
variable. Yes, the Git repo gets cloned, but into a temp directory, so these files aren’t found. Maybe we can use
Git
storage instead of
GitLab
because
Git
storage has a way to load in these files like shown here, but then the path needs to be changed. Let me think about this.
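A small stand-alone sketch of the behavior described here, assuming script-based storage loads the flow by running the file's text through exec (as the exec(contents, exec_vals) frame in the traceback suggests). Nothing below is Prefect code; FLOW_SCRIPT, load_flow_from_script, and demo are hypothetical names.

```python
import os
import tempfile

# Stand-in for a flow file whose module-level code opens a relative path.
FLOW_SCRIPT = '''
with open("job-template.yml") as f:
    template = f.read()
flow = "hello-flow"
'''


def load_flow_from_script(contents: str):
    """Re-execute the script text and pull out its `flow` variable."""
    exec_vals = {}
    exec(contents, exec_vals)  # every module-level statement runs again
    return exec_vals["flow"]


def demo() -> str:
    """Run the script from a directory that lacks the template file."""
    original = os.getcwd()
    with tempfile.TemporaryDirectory() as workdir:
        os.chdir(workdir)  # stands in for the flow pod's working directory
        try:
            load_flow_from_script(FLOW_SCRIPT)
        except FileNotFoundError as exc:
            return str(exc)
        finally:
            os.chdir(original)
    return "no error"


print(demo())
```

Because the relative path is resolved against the working directory of whichever process re-executes the script, a file that existed at registration time can be missing at run time.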

Josh Paulin

06/17/2022, 8:21 PM
Maybe you don’t know, but would I have the same problem if I switched to using script based storage in s3, and not explicitly supplying an s3 path for the job template?

Kevin Kho

06/17/2022, 8:22 PM
I think I have an idea. Can you try:
with Flow(...) as flow:
    ...
    ...

if __name__ == "__main__":
    flow.run_config = KubernetesRun(...)
    flow.register(...)
This way, it will run only during registration.
S3 storage as script will have the same issue I think. S3 storage as pickle will not.
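Why the main guard helps can be sketched with plain Python, assuming the storage loader executes the file text with exec and does not set __name__ to "__main__". The script text and function names below are hypothetical stand-ins, not Prefect code.

```python
# Stand-in for a flow file with a main guard around registration code.
FLOW_SCRIPT = '''
events.append("flow definition ran")

if __name__ == "__main__":
    events.append("registration code ran")
'''


def run_like_script_storage(contents: str) -> list:
    """Execute the text the way a loader might: no "__main__" name set."""
    events = []
    # "__name__" is not supplied here, so it does not equal "__main__"
    # and the guarded block is skipped.
    exec(contents, {"events": events})
    return events


def run_like_registration(contents: str) -> list:
    """Execute the text as if it were run directly with `python file.py`."""
    events = []
    exec(contents, {"events": events, "__name__": "__main__"})
    return events


print(run_like_script_storage(FLOW_SCRIPT))
print(run_like_registration(FLOW_SCRIPT))
```

Anything placed under the guard, such as the KubernetesRun(...) call that reads the job template, would then run on the registration machine only.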

Josh Paulin

06/17/2022, 8:24 PM
That did the trick!
🎉

Kevin Kho

06/17/2022, 8:26 PM
Nice! Just don’t put the executor inside the main guard. That one needs to be above it, because the executor is not stored along with the Flow; it’s read from the storage file. Thanks for the patience!