# ask-community
z
Hi Team, I'm working to define a custom job spec to be referenced via a KubernetesRun config in my flow, but I'm running into a few issues when the custom job spec gets kicked off via a Kubernetes Agent:
1. The job is not being tracked via Prefect Cloud. It gets scheduled, and I can see the job execute to completion in Kubernetes, but Prefect Cloud never reports beyond the Scheduled status.
2. The Flow itself never gets executed, and the only logs are the CLI printout of the main `prefect --help` results.
Job Definition:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: prefect-job-spec
  labels: {}
spec:
  template:
    metadata:
      labels: {}
    spec:
      restartPolicy: Never
      containers:
        - name: flow
          image: prefecthq/prefect:latest
          imagePullPolicy: IfNotPresent
          command: ["/bin/sh", "-c"]
          args: ["prefect execute flow-run"]
          env:
            - name: PREFECT__LOGGING__LEVEL
              value: "INFO"
            - name: PREFECT__CLOUD__USE_LOCAL_SECRETS
              value: "false"
            - name: PREFECT__ENGINE__FLOW_RUNNER__DEFAULT_CLASS
              value: "prefect.engine.cloud.CloudFlowRunner"
            - name: PREFECT__ENGINE__TASK_RUNNER__DEFAULT_CLASS
              value: "prefect.engine.cloud.CloudTaskRunner"
          volumeMounts:
            - name: secret-volume
              mountPath: /etc/secret-volume
              readOnly: true
      volumes:
        - name: secret-volume
          secret:
            secretName: prefect-secret
```
The following labels are automatically attached to the Pod created for the Job from my custom job spec:
```
prefect.io/flow_id=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
prefect.io/flow_run_id=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
prefect.io/identifier=xxxxxxxx
```
Results of Job Execution:
```
Usage: prefect [OPTIONS] COMMAND [ARGS]...
  The Prefect CLI for creating, managing, and inspecting your flows.
  Note: a Prefect Cloud API token is required for all Cloud related commands. If a token
  is not set then run `prefect auth login` to set it.
  Query Commands:
      get         List high-level object information
      describe    Retrieve detailed object descriptions
  Action Commands:
      agent       Manage agents
      create      Create objects
      delete      Delete objects
      execute     Execute a flow's environment
      run         Run a flow
      register    Register flows with an API
      heartbeat   Send heartbeats for a run
  Setup Commands:
      auth        Handle Prefect Cloud authorization
      backend     Switch between `server` and `cloud` backends
      server      Interact with the Prefect Server
  Miscellaneous Commands:
      version     Print the current Prefect version
      config      Output Prefect config
      diagnostics Output Prefect diagnostic information
Options:
  -h, --help  Show this message and exit.
Commands:
  agent     Manage Prefect agents.
  build     Build one or more flows.
  register  Register one or more flows into a project.
```
k
Hi @Zach Hodowanec! This looks a bit similar to this, though not entirely. Do you see similarities?
There is also helpful stuff in that thread (`--show-flow-logs`, or setting the log level to debug to get more info).
z
@Kevin Kho I added an additional arg to my job spec to include `--show-flow-logs`, as well as setting `PREFECT__LOGGING__LEVEL=DEBUG`. Unfortunately, I get the same `prefect --help` results I posted previously. I see no evidence that my flow ever actually starts its execution.
k
What are your Prefect versions for the Flow and the image?
z
0.14.17
t
Hi Zach, can you verify that the job spec you've defined above matches the created job's definition? I'm a bit confused, as it seems that your container is just calling `prefect` and not `prefect execute flow-run`.
z
Hi @Tyler Wanner, I agree that it appears the container is only executing `prefect`, but I can confirm the created job definition does in fact include the `execute flow-run` args as well. Here's a clip from the pod for my job:
```
Name:           prefect-job-e3b61173-fgzfh
 Namespace:      prefect-demo
 Priority:       0
 Node:           docker-desktop/xxx.xxx.xx.x
 Start Time:     Wed, 12 May 2021 11:34:13 -0600
 Labels:         app.kubernetes.io/instance=prefect-agent
                 app.kubernetes.io/managed-by=Helm
                 app.kubernetes.io/name=prefect-agent
                 app.kubernetes.io/version=0.14.17-python3.8
                 controller-uid=a8ded4b8-e0b5-4aa4-8b10-523b3fea5981
                 helm.sh/chart=prefect-agent-0.1.0-dev
                 job-name=prefect-job-e3b61173
                 prefect.io/flow_id=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
                 prefect.io/flow_run_id=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
                 prefect.io/identifier=xxxxxxxx
 Annotations:    <none>
 Status:         Pending
 IP:
 IPs:            <none>
 Controlled By:  Job/prefect-job-e3b61173
 Containers:
   flow:
     Container ID:
     Image:         prefecthq/prefect:0.14.17-python3.8
     Image ID:
     Port:          <none>
     Host Port:     <none>
     Command:
       /bin/sh
       -c
     Args:
       prefect
       execute
       flow-run
```
Not sure if it helps, but this is the example from the Prefect GitHub repo that I based my job spec on; the only real addition in mine is the volume mounts. https://github.com/PrefectHQ/prefect/blob/master/src/prefect/agent/kubernetes/job_spec.yaml
@Tyler Wanner @Kevin Kho Here's a full run synopsis of what I see given all previous information. `kubectl describe job`:
```
Name:           prefect-job-xxxxxxxx
Namespace:      prefect-deploy-keys
Selector:       controller-uid=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
Labels:         app.kubernetes.io/instance=prefect-agent
                app.kubernetes.io/managed-by=Helm
                app.kubernetes.io/name=prefect-agent
                app.kubernetes.io/version=0.14.17-python3.8
                helm.sh/chart=prefect-agent-0.1.0-dev
                prefect.io/flow_id=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
                prefect.io/flow_run_id=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
                prefect.io/identifier=xxxxxxxx
Annotations:    <none>
Parallelism:    1
Completions:    1
Start Time:     Wed, 12 May 2021 16:03:17 -0600
Pods Statuses:  1 Running / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  app.kubernetes.io/instance=prefect-agent
           app.kubernetes.io/managed-by=Helm
           app.kubernetes.io/name=prefect-agent
           app.kubernetes.io/version=0.14.17-python3.8
           controller-uid=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
           helm.sh/chart=prefect-agent-0.1.0-dev
           job-name=prefect-job-xxxxxxxx
           prefect.io/flow_id=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
           prefect.io/flow_run_id=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
           prefect.io/identifier=xxxxxxxx
  Containers:
   flow:
    Image:      prefecthq/prefect:latest
    Port:       <none>
    Host Port:  <none>
    Command:
      /bin/sh
      -c
    Args:
      prefect
      execute
      flow-run
    Environment:
      PREFECT__LOGGING__LEVEL:                      INFO
      PREFECT__BACKEND:                             cloud
      PREFECT__CLOUD__AGENT__LABELS:                ['my-label']
      PREFECT__CLOUD__API:                          https://api.prefect.io
      PREFECT__CLOUD__AUTH_TOKEN:                   ****
      PREFECT__CLOUD__USE_LOCAL_SECRETS:            false
      PREFECT__CONTEXT__FLOW_RUN_ID:                xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
      PREFECT__CONTEXT__FLOW_ID:                    xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
      PREFECT__CONTEXT__IMAGE:                      prefecthq/prefect:latest
      PREFECT__LOGGING__LOG_TO_CLOUD:               true
      PREFECT__ENGINE__FLOW_RUNNER__DEFAULT_CLASS:  prefect.engine.cloud.CloudFlowRunner
      PREFECT__ENGINE__TASK_RUNNER__DEFAULT_CLASS:  prefect.engine.cloud.CloudTaskRunner
    Mounts:
      /etc/secret-volume from secret-volume (ro)
  Volumes:
   secret-volume:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  prefect-agent
    Optional:    false
Events:
  Type    Reason            Age   From            Message
  ----    ------            ----  ----            -------
  Normal  SuccessfulCreate  3s    job-controller  Created pod: prefect-job-xxxxxxxx-j5tgc
```
`kubectl describe pod prefect-job-xxxxxxxx-j5tgc`:
```
Name:         prefect-job-xxxxxxxx-j5tgc
Namespace:    prefect-deploy-keys
Priority:     0
Node:         docker-desktop/000.000.00.0
Start Time:   Wed, 12 May 2021 16:03:17 -0600
Labels:       app.kubernetes.io/instance=prefect-agent
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=prefect-agent
              app.kubernetes.io/version=0.14.17-python3.8
              controller-uid=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
              helm.sh/chart=prefect-agent-0.1.0-dev
              job-name=prefect-job-xxxxxxxx
              prefect.io/flow_id=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
              prefect.io/flow_run_id=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
              prefect.io/identifier=xxxxxxxx
Annotations:  <none>
Status:       Succeeded
IP:           00.0.0.00
IPs:
  IP:           00.0.0.00
Controlled By:  Job/prefect-job-xxxxxxxx
Containers:
  flow:
    Container ID:  docker://26adbbea01570fcac0015dc04c11de355f0cc3bd77c01bae969047127b631c62
    Image:         prefecthq/prefect:latest
    Image ID:      docker-pullable://prefecthq/prefect@sha256:79a59032175275a19ede749ce1512b2fafc59a6e6b105d38ef074a0ce6c4332f
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
      -c
    Args:
      prefect
      execute
      flow-run
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 12 May 2021 16:03:20 -0600
      Finished:     Wed, 12 May 2021 16:03:20 -0600
    Ready:          False
    Restart Count:  0
    Environment:
      PREFECT__LOGGING__LEVEL:                      INFO
      PREFECT__BACKEND:                             cloud
      PREFECT__CLOUD__AGENT__LABELS:                ['my-label']
      PREFECT__CLOUD__API:                          https://api.prefect.io
      PREFECT__CLOUD__AUTH_TOKEN:                   ****
      PREFECT__CLOUD__USE_LOCAL_SECRETS:            false
      PREFECT__CONTEXT__FLOW_RUN_ID:                xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
      PREFECT__CONTEXT__FLOW_ID:                    xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
      PREFECT__CONTEXT__IMAGE:                      prefecthq/prefect:latest
      PREFECT__LOGGING__LOG_TO_CLOUD:               true
      PREFECT__ENGINE__FLOW_RUNNER__DEFAULT_CLASS:  prefect.engine.cloud.CloudFlowRunner
      PREFECT__ENGINE__TASK_RUNNER__DEFAULT_CLASS:  prefect.engine.cloud.CloudTaskRunner
    Mounts:
      /etc/secret-volume from secret-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-xxxxx (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  secret-volume:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  prefect-agent
    Optional:    false
  default-token-nj6rp:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-xxxxx
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  5s    default-scheduler  Successfully assigned prefect-deploy-keys/prefect-job-xxxxxxxx-j5tgc to docker-desktop
  Normal  Pulled     2s    kubelet            Container image "prefecthq/prefect:latest" already present on machine
  Normal  Created    2s    kubelet            Created container flow
  Normal  Started    2s    kubelet            Started container flow
```
`kubectl logs -p prefect-job-xxxxxxxx-j5tgc`:
```
Getting you a shell in flow...

Usage: prefect [OPTIONS] COMMAND [ARGS]...

  The Prefect CLI for creating, managing, and inspecting your flows.

  Note: a Prefect Cloud API token is required for all Cloud related commands. If a token
  is not set then run `prefect auth login` to set it.

  Query Commands:
      get         List high-level object information
      describe    Retrieve detailed object descriptions

  Action Commands:
      agent       Manage agents
      create      Create objects
      delete      Delete objects
      execute     Execute a flow's environment
      run         Run a flow
      register    Register flows with an API
      heartbeat   Send heartbeats for a run

  Setup Commands:
      auth        Handle Prefect Cloud authorization
      backend     Switch between `server` and `cloud` backends
      server      Interact with the Prefect Server

  Miscellaneous Commands:
      version     Print the current Prefect version
      config      Output Prefect config
      diagnostics Output Prefect diagnostic information

Options:
  -h, --help  Show this message and exit.

Commands:
  agent     Manage Prefect agents.
  build     Build one or more flows.
  register  Register one or more flows into a project.
```
k
This is so weird. I'll try to dig more into it tonight and get back to you tomorrow.
z
Thanks @Kevin Kho. If someone could even confirm the example y'all give on GitHub functions properly (without my volume mounts) that'd be a step in the right direction.
FWIW - if this existed it's quite possible I wouldn't have these questions 😉 https://github.com/PrefectHQ/prefect/issues/4406
t
Sorry for the delay, I was shaving an extremely hairy yak. I was able to reproduce your example, and resolved it by removing the `command: ["/bin/sh", "-c"]` line from my job spec.
👍 1
I've opened this issue to try and get something documented or enhanced around this. I hope that just removing that line from your custom job template will work, though: https://github.com/PrefectHQ/prefect/issues/4525
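The `kubectl describe` output above hints at why removing it works: the agent injects `prefect execute flow-run` as the container's `Args`, and with the `command: ["/bin/sh", "-c"]` override those args get handed to `sh -c`, which treats only its first operand as the script to run. A minimal sketch of the effect:
```sh
# What the pod effectively ran with the command override in place:
/bin/sh -c prefect execute flow-run
# `sh -c` takes only the first operand ("prefect") as the command string;
# "execute" and "flow-run" merely become $0 and $1 of that script.
# So the container ran bare `prefect`, which prints the CLI help and exits 0.

# With the override removed, the image's entrypoint receives the args intact:
prefect execute flow-run
```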
z
@Tyler Wanner removing `command: ["/bin/sh", "-c"]` from my job spec does appear to resolve the issue. Thanks!
Now that I've got the job executing, I have one more issue I was hoping you might have some insight on. I am working to migrate a flow from using GH Personal Access Token authentication to using GH Deploy Keys/SSH instead, but I'm receiving this error:
```
Failed to load and execute Flow's environment: FileNotFoundError(2, "No such file or directory: 'ssh'")
```
I have tried mounting my private keys into the following locations, but I don't seem to be having any luck:
• `/etc/ssh/secret-volume`
• `/ssh/secret-volume`
• `/ssh`
Do you know what the correct location to mount the key into is?
k
Are you on KubernetesRun and Git Storage?
z
Yep, and I added the `use_ssh=True` flag in the Git storage definition.
k
@Zach Angell
z
@Zach Hodowanec I'm doing some digging to see how this would work for k8s jobs. We outsource all the git logic to `dulwich` (https://www.dulwich.io/), and I'm having trouble following how they configure it. FWIW, on my local machine `dulwich` correctly checks `/Users/zangell/.ssh/id_rsa`. Any chance permissions on the `/ssh` directory are restricted in your case?
z
I did check permissions on the `/ssh` directory in the pod, but they match my local machine config, so I'm not sure that's the problem.
z
Are you able to load the flow from storage on your local machine?
z
I'm not sure if I understand what you're asking me to test.
z
Mimicking your local machine's configuration (which it seems like you're already doing), can you try running the following in a Python script?
```python
# ... flow set up, etc
storage = Git(..., use_ssh=True)
storage.add_flow(flow)  # the Flow object
storage.get_flow(flow.name)  # try to load the flow from git storage
```
z
Ah, gotcha. Yep, running something similar to that returns the following results on my local machine.
```
Enumerating objects: 34, done.
Counting objects: 100% (34/34), done.
Compressing objects: 100% (24/24), done.
Total 34 (delta 9), reused 28 (delta 6), pack-reused 0
Checking out xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```
This is the full stack trace from the failed job. I did notice at the bottom there was a `/tmp` directory referenced, but if I try to mount the SSH keys in that directory, the final line of the stack trace references a `/var/tmp` directory instead.
```
[Errno 2] No such file or directory: 'ssh': 'ssh'
Traceback (most recent call last):
  File "/usr/local/bin/prefect", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/prefect/cli/execute.py", line 90, in flow_run
    raise exc
  File "/usr/local/lib/python3.7/site-packages/prefect/cli/execute.py", line 67, in flow_run
    flow = storage.get_flow(flow_data.name)
  File "/usr/local/lib/python3.7/site-packages/prefect/storage/git.py", line 122, in get_flow
    clone_depth=self.clone_depth,
  File "/usr/local/lib/python3.7/site-packages/prefect/utilities/git.py", line 48, in __enter__
    source=self.git_clone_url, target=self.temp_dir.name, depth=self.clone_depth
  File "/usr/local/lib/python3.7/site-packages/dulwich/porcelain.py", line 476, in clone
    **kwargs
  File "/usr/local/lib/python3.7/site-packages/dulwich/porcelain.py", line 1559, in fetch
    fetch_result = client.fetch(path, r, progress=errstream.write, depth=depth)
  File "/usr/local/lib/python3.7/site-packages/dulwich/client.py", line 528, in fetch
    depth=depth,
  File "/usr/local/lib/python3.7/site-packages/dulwich/client.py", line 1009, in fetch_pack
    proto, can_read, stderr = self._connect(b"upload-pack", path)
  File "/usr/local/lib/python3.7/site-packages/dulwich/client.py", line 1659, in _connect
    self.host, argv, port=self.port, username=self.username, **kwargs
  File "/usr/local/lib/python3.7/site-packages/dulwich/client.py", line 1522, in run_command
    stderr=subprocess.PIPE,
  File "/usr/local/lib/python3.7/subprocess.py", line 800, in __init__
    restore_signals, start_new_session)
  File "/usr/local/lib/python3.7/subprocess.py", line 1551, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'ssh': 'ssh'
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/weakref.py", line 648, in _exitfunc
    f()
  File "/usr/local/lib/python3.7/weakref.py", line 572, in __call__
    return info.func(*info.args, **(info.kwargs or {}))
  File "/usr/local/lib/python3.7/tempfile.py", line 797, in _cleanup
    _shutil.rmtree(name)
  File "/usr/local/lib/python3.7/shutil.py", line 485, in rmtree
    onerror(os.lstat, path, sys.exc_info())
  File "/usr/local/lib/python3.7/shutil.py", line 483, in rmtree
    orig_st = os.lstat(path)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpc427y2_n'
```
z
Hmmm okay, the full trace is helpful, thanks! I think that last error is a red herring - it's trying to clean up the repo files, but since the clone never succeeded there's no directory to clean up
I'll try to reproduce on my end this afternoon
👍 1
z
I was able to reproduce this issue using the same Flow + SSH key combo used earlier to test loading the Flow from storage on my local machine, and the `prefecthq/prefect:latest` Docker container.
1. Create Flow
```python
import os

from prefect import Flow, Parameter, task
from prefect.run_configs import KubernetesRun
from prefect.storage import Git


@task(log_stdout=True)
def say_hello(name):
    print("Hello, {}!".format(name))


with Flow("Hello World") as flow:
    thename = Parameter("name")
    say_hello(thename)


storage = Git(
    repo="my/repo",
    flow_path="src/flows/hello_world.py",
    branch_name="my/branch-name",
    use_ssh=True,
)

storage.add_flow(flow) # the Flow object
storage.get_flow(flow.name) # try to load the flow from git storage
```
2. Publish Flow to GitHub
3. Run Docker Image
```sh
$ docker run --rm -it --entrypoint sh -v path/to/flow:/src -v ~/.ssh:/root/.ssh prefecthq/prefect:latest
```
4. Debug Flow
```
# python /src/my_test_file.py
Traceback (most recent call last):
  File "/src/TestCloningRepoWithSSH.py", line 26, in <module>
    storage.get_flow(flow.name) # try to load the flow from git storage
  File "/usr/local/lib/python3.7/site-packages/prefect/storage/git.py", line 122, in get_flow
    clone_depth=self.clone_depth,
  File "/usr/local/lib/python3.7/site-packages/prefect/utilities/git.py", line 48, in __enter__
    source=self.git_clone_url, target=self.temp_dir.name, depth=self.clone_depth
  File "/usr/local/lib/python3.7/site-packages/dulwich/porcelain.py", line 476, in clone
    **kwargs
  File "/usr/local/lib/python3.7/site-packages/dulwich/porcelain.py", line 1559, in fetch
    fetch_result = client.fetch(path, r, progress=errstream.write, depth=depth)
  File "/usr/local/lib/python3.7/site-packages/dulwich/client.py", line 528, in fetch
    depth=depth,
  File "/usr/local/lib/python3.7/site-packages/dulwich/client.py", line 1009, in fetch_pack
    proto, can_read, stderr = self._connect(b"upload-pack", path)
  File "/usr/local/lib/python3.7/site-packages/dulwich/client.py", line 1659, in _connect
    self.host, argv, port=self.port, username=self.username, **kwargs
  File "/usr/local/lib/python3.7/site-packages/dulwich/client.py", line 1522, in run_command
    stderr=subprocess.PIPE,
  File "/usr/local/lib/python3.7/subprocess.py", line 800, in __init__
    restore_signals, start_new_session)
  File "/usr/local/lib/python3.7/subprocess.py", line 1551, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'ssh': 'ssh'
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/weakref.py", line 648, in _exitfunc
    f()
  File "/usr/local/lib/python3.7/weakref.py", line 572, in __call__
    return info.func(*info.args, **(info.kwargs or {}))
  File "/usr/local/lib/python3.7/tempfile.py", line 797, in _cleanup
    _shutil.rmtree(name)
  File "/usr/local/lib/python3.7/shutil.py", line 485, in rmtree
    onerror(os.lstat, path, sys.exc_info())
  File "/usr/local/lib/python3.7/shutil.py", line 483, in rmtree
    orig_st = os.lstat(path)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp9goeajxv'
```
I have tried mounting the `id_rsa`, `id_rsa.pub`, and `known_hosts` files from my local machine into the following locations in the Docker container:
• `/.ssh`
• `/ssh`
• `/etc/.ssh`
• `/etc/ssh`
• `/root/.ssh`
• `/root/ssh`
• `/home/user/.ssh`
• `/home/user/ssh`
...none seem to work so far.
z
Does one of the suggestions from this thread work for your use case? https://stackoverflow.com/questions/18136389/using-ssh-keys-inside-docker-container
z
Admittedly I didn't read the whole thread, but is something 7 years old really relevant? Most of the thread relates to `docker build`, while my issue is related to `docker run`.
Was the implementation of Deploy Keys ever tested with the Prefect container?
z
My mistake on docker build vs run, just trying to provide info on where Docker expects the keys to live. From what I can tell, `/root/.ssh` is correct.
Admittedly no, I wrote the `Git` storage class and I'm not familiar with configuring ssh keys for Docker. I assumed it was straightforward. I'll do some testing today to see if I can get it working. If I can't, I'll file a bug.
Looking at the image `prefecthq/prefect:latest`, actually I don't think it will work. We don't ever `apt-get install ssh-client`, which would explain the `ssh` not found error.
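A quick, hedged way to confirm that from outside the cluster (the expected output assumes the stock image really does lack an ssh client, per the above):
```sh
# Look for an ssh client inside the stock image
$ docker run --rm --entrypoint sh prefecthq/prefect:latest \
    -c 'command -v ssh || echo "ssh not found"'
ssh not found
```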
I'll file a bug for that today. Should be a simple fix, but I'm not sure if there are downstream implications.
👍 1
z
@Zach Angell did an issue get created for this?
k
Yep, it was here.
👍 1
z
@Zach Angell what's the status of this?
z
@Zach Hodowanec Our security team is concerned about the implications of including ssh in our base image. Instead, we're going to add documentation for adding ssh to a new image using `prefecthq/prefect` as a base image. None of the documentation or process is Prefect-specific, just generic instructions on setting up ssh + Docker.
z
Ok, what's the timeline there, and can you confirm you will have Kubernetes tests performed on those instructions? I've created my own workaround while waiting on y'all, and find that the solution works great in the container itself, but not when kicked off via the Kubernetes agent, as there are extra Prefect commands being issued over my commands. I even added a simple entrypoint script to just sleep, but somehow the flow was attempting to pull from GH.
z
Timeline is hopefully within the next week or two. I'll be sure to test any instructions with kubernetes as well
🤞 1
Hey @Zach Hodowanec - we're still ironing out official recommendations for the docs, but I have a minimum working example for you to start with. First, we'll need a Docker image with ssh. Here's my `Dockerfile`:
```dockerfile
FROM prefecthq/prefect:latest
RUN apt update && apt install -y openssh-client
```
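To make the image available to the cluster, build and push it under whatever name your run config will reference (the registry and tag below are hypothetical):
```sh
docker build -t my-registry/prefect-ssh:latest .
docker push my-registry/prefect-ssh:latest
```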
Next, we'll need to use that image in our run config. Here's my whole flow, including the run config:
```python
from prefect import Flow, Parameter, task
from prefect.run_configs import KubernetesRun
from prefect.storage import Git

@task(log_stdout=True)
def say_hello(name):
    print("Hello, {}!".format(name))

with Flow("Hello World") as flow:
    thename = Parameter("name")
    say_hello(thename)

storage = Git(
    repo="zangell44/single-prefect-flow",
    flow_path="flow2.py",
    use_ssh=True,
)
flow.run_config = KubernetesRun(image=<my-image>)
flow.storage = storage
flow.register("test")
```
Finally, we'll need to do some configuration in Kubernetes to make the ssh key and known_hosts files available.
Step 1: Create a Kubernetes secret with the ssh key (`id_ed25519` for me) and known_hosts file:
`kubectl create secret generic my-ssh-key --from-file=id_ed25519=/path/to/id_ed25519 --from-file=known_hosts=/path/to/known_hosts`
Step 2: Create a custom job template to mount the secret at `/root/.ssh`. Here's my custom job template YAML:
```yaml
apiVersion: batch/v1
kind: Job
spec:
  template:
    spec:
      containers:
        - name: flow
          volumeMounts:
            - name: ssh-key
              readOnly: true
              mountPath: "/root/.ssh"
      volumes:
        - name: ssh-key
          secret:
            secretName: my-ssh-key
            optional: false
            defaultMode: 0600
```
Step 3: Finally, configure the agent to use the custom job template on startup (this can also be done via the run config):
`prefect agent kubernetes start --job-template /path/to/job_template.yaml`
We would also recommend configuring a service account to permission the secret properly. You can provide the service account name either on agent start or on the run config: https://docs.prefect.io/orchestration/agents/kubernetes.html#service-account There are definitely a few improvements we'd like to make to simplify this process going forward, and we're happy to hear feedback!
z
@Zach Angell This is very similar to the workaround I had previously tried implementing while waiting on y'all to fix the Docker image. The only difference was that I had installed `git` instead of `openssh-client`. Unfortunately, neither solution actually works, as I continue to receive the following error:
```
Failed to load and execute Flow's environment: HangupException('Host key verification failed.\r')
```
Do you have any additional ideas for making Deploy Keys usable in the Prefect Docker image?
z
Hmm, that error usually indicates the `known_hosts` file is not configured correctly. I ran into that error when the `known_hosts` file was missing entirely.
I would double check that `/root/.ssh/known_hosts` exists and includes the host you're trying to clone from (e.g. github.com).
z
Double checking `/root/.ssh/known_hosts` isn't really possible given how quickly the job fails and Kubernetes spins up a new Pod. Any suggestions for achieving this? I did mount the same keys into my Agent to verify the Helm deployment, and everything checks out there. I would imagine it's also mounted properly in the Job spec, as it uses the same mechanism for mounting the keys. I also verified that mounting the exact same keys to the exact same location of the exact same image results in a successful pull of my flow repository when running the Docker image outside of Kubernetes. Have you successfully used Deploy Keys + the Kubernetes Agent?
z
Yeah, following my write-up above I was able to use Deploy Keys + the Kubernetes agent correctly. I'll do some digging; I'm not sure what's causing that error, though.
z
Ok
z
https://github.com/PrefectHQ/prefect/pull/4351 allows you to keep the K8s resources around a bit longer if that would help you debug
upvote 1
z
So that's a great addition, but it doesn't help debug this issue at all, as you are unable to exec into a container in a completed pod - which is the case here, since the clone step is what fails.
z
Could you not use the image you're using for this with a non-Git-storage flow that just sleeps, so you can inspect the state of your container?
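As a hedged sketch of that idea, you could even skip Prefect entirely and start a throwaway pod from the same image with the same secret mounted, then exec in (`<my-image>` and the pod name are placeholders; `384` is decimal for file mode 0600):
```sh
kubectl run ssh-debug --image=<my-image> --restart=Never --overrides='
{"apiVersion": "v1",
 "spec": {"containers": [{"name": "ssh-debug", "image": "<my-image>",
   "command": ["sleep", "3600"],
   "volumeMounts": [{"name": "ssh-key", "mountPath": "/root/.ssh", "readOnly": true}]}],
  "volumes": [{"name": "ssh-key", "secret": {"secretName": "my-ssh-key", "defaultMode": 384}}]}}'

kubectl exec -it ssh-debug -- ls -la /root/.ssh      # inspect keys and permissions
kubectl exec -it ssh-debug -- ssh -T git@github.com  # test the deploy key directly
kubectl delete pod ssh-debug
```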
z
Yes I could, and I just did that. I get the same `Permission denied (publickey)` error I get when running a flow attempting to utilize Git storage. While I was in the Pod, I decided to generate a new Deploy Key with the values stored at `/root/.ssh/ssh-publickey`, but I continue to get the same error.