james.lamb
06/18/2020, 7:56 PM
I'm working with the KubernetesAgent (https://github.com/PrefectHQ/prefect/pull/2796). I'm struggling with something and hoping someone can help.
I have the following setup:
1. Flow code uses KubernetesJobEnvironment + S3 storage, using flow.register() to register flows with Prefect Cloud
2. Running a KubernetesAgent
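(For illustration, a minimal version of that setup might look something like the sketch below; the bucket and project names are placeholders, not taken from this thread.)

import prefect
from prefect import Flow, task
from prefect.environments import KubernetesJobEnvironment
from prefect.environments.storage import S3

@task
def say_hello():
    # "hello world" task that just prints to the prefect logger
    prefect.context.get("logger").info("hello from the flow!")

with Flow("hello-flow-w-s3-k8s-env") as flow:
    say_hello()

flow.storage = S3(bucket="my-prefect-flows")  # hypothetical bucket name
flow.environment = KubernetesJobEnvironment(job_spec_file="prefect-flow-run.yaml")
flow.register(project_name="my-project")      # hypothetical project name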
I can see that the agent is successfully communicating with Prefect Cloud. When I run a flow from the Prefect Cloud UI, I can see it in the agent's logs and I see a Kubernetes job created. That job is now failing with this error:
[2020-06-18 19:44:54] INFO - prefect.S3 | Downloading hello-flow-w-s3-k8s-env/2020-06-18t19-18-00-707752-00-00 from prefect-d94f436a-25b1-1699546c3
... big stacktrace ...
botocore.exceptions.NoCredentialsError: Unable to locate credentials
My flow code is just the hello world example that prints to the prefect logger, so that error must come from the prefect code that is trying to pull the flow from storage in S3. In the manifest for the KubernetesAgent, I've set up the environment based on the directions in https://docs.prefect.io/core/concepts/secrets.html#default-secrets
env:
- name: PREFECT__CONTEXT__SECRETS__AWS_CREDENTIALS
value: '{"ACCESS_KEY": "REDACTED", "SECRET_ACCESS_KEY": "REDACTED"}'
I expected that setting this on the agent would mean that every job it creates has access to those credentials to download the flow. What am I doing wrong? Happy to provide more context in thread.
I'm on prefect 0.12.0. The KubernetesJobEnvironment looks like this:
from prefect.environments import KubernetesJobEnvironment

env = KubernetesJobEnvironment(
    # job spec used as the template for the k8s job that actually runs the flow
    job_spec_file="prefect-flow-run.yaml",
    metadata={
        "image": "prefecthq/prefect:all_extras-0.12.0"
    }
)
and prefect-flow-run.yaml is exactly the content from https://docs.prefect.io/orchestration/execution/k8s_job_environment.html#job-spec-configuration

# peg to a specific state of the Prefect repo
COMMIT_TO_USE="4ac1c8c14e0e85c470e552f5ebab3e07a5891ff4"
REGISTRY_URL="localhost:32000"
IMAGE_TAG="${REGISTRY_URL}/prefect/prefect-test:${COMMIT_TO_USE:0:6}"
PREFECT_INSTALL_DIR="$(pwd)/prefect-temp"

mkdir -p "${PREFECT_INSTALL_DIR}"
pushd "${PREFECT_INSTALL_DIR}"

git clone git@github.com:PrefectHQ/prefect.git
pushd prefect
git checkout ${COMMIT_TO_USE}

docker build \
    --build-arg PYTHON_VERSION=3.7 \
    --build-arg GIT_SHA=${COMMIT_TO_USE} \
    --build-arg BUILD_DATE="today" \
    --build-arg PREFECT_VERSION=${COMMIT_TO_USE} \
    --tag ${IMAGE_TAG} \
    -f Dockerfile \
    .

docker push ${IMAGE_TAG}

popd
popd
Laura Lorenz (she/her)
06/18/2020, 8:12 PM
I think you may also need to set that environment variable in your prefect-flow-run.yaml. Do you mind trying that?
That being said, if I understood correctly I think you are a Prefect Cloud user, so you can also use the Cloud secret store to hold on to your credentials if you want; then you don't need to set the environment variable everywhere, as it will be grabbed from the Cloud secret store at runtime as long as you specify the secret name on your storage. That type of setup is described more here: https://docs.prefect.io/orchestration/recipes/third_party_auth.html#declaring-secrets-on-storage
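(For reference, the "specify the secret name on your storage" part of that recipe looks roughly like the sketch below; the bucket name is made up, and the secrets kwarg is the piece the linked page describes.)

from prefect.environments.storage import S3

# rough sketch of the linked recipe, assuming `flow` is the Flow being registered:
# tell the storage object which Prefect Cloud secret holds the AWS credentials
# used to fetch the flow at runtime
flow.storage = S3(
    bucket="my-prefect-flows",      # hypothetical bucket
    secrets=["AWS_CREDENTIALS"],    # name of the secret in the Cloud secret store
)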
james.lamb
06/18/2020, 8:26 PM
I'd rather not set it in the job spec for the KubernetesJobEnvironment, because I don't want my credentials bundled into storage and put into S3. I'm trying to keep AWS creds completely inside my infrastructure.
I'm comfortable putting those credentials into an environment variable in the manifest for the agent, since that agent is something I'm deploying and its details never leave the cluster. It sounds like I'm going to have to find a different pattern for the jobs created for flow runs.
josh
06/18/2020, 8:43 PM
You could also set PREFECT__CONTEXT__SECRETS__AWS_CREDENTIALS on your job spec
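(If that env var does go on the job spec, one way to avoid hard-coding the value into the yaml that gets bundled with the flow is to reference an in-cluster Kubernetes Secret; the secret and key names below are hypothetical.)

# fragment of the container spec in prefect-flow-run.yaml
env:
  - name: PREFECT__CONTEXT__SECRETS__AWS_CREDENTIALS
    valueFrom:
      secretKeyRef:
        name: aws-credentials    # hypothetical k8s Secret created separately in the cluster
        key: credentials-json    # key whose value is the {"ACCESS_KEY": ..., "SECRET_ACCESS_KEY": ...} JSON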
james.lamb
06/18/2020, 9:03 PM
Since then I've been using Docker storage, but I'm back to trying S3 storage today. I understand a lot more about Prefect now than I did when I opened this thread originally, and I think I can ask a more precise question.
I'm using this combination:
• using flow.register() with Prefect Cloud
• storage: S3
• agent: KubernetesAgent
• environment: KubernetesJobEnvironment
• executor: LocalExecutor
I understand now that triggering a flow run will kick off this sequence of events:
1. Agent finds a new flow run, gets details from Prefect Cloud
2. Agent creates a k8s job ("Job 1") which deserializes the storage and runs storage.get_flow() to get the flow.
3. Job 1 inspects the flow it just got and decides which environment to run it in based on flow.environment. In my case, this will create another k8s job ("Job 2") using the job spec I defined in KubernetesJobEnvironment, with some other stuff overridden / added (https://github.com/PrefectHQ/prefect/blob/e9da231c1fdc94988b66dd9b336fc98316ecc097/src/prefect/environments/execution/k8s/job.py#L155)
4. Job 2 then runs the tasks in the flow using the executor I provided. In this case I used LocalExecutor.
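(Very loosely, steps 2-3 amount to something like the sketch below; this is a simplification for illustration, not the actual Prefect source, and the bucket name is a placeholder.)

from prefect.environments.storage import S3

# "Job 1": rebuild the storage object that was registered with the flow,
# then download the flow from S3 -- this is the step that needs AWS credentials,
# and it runs before the job spec from KubernetesJobEnvironment is used at all
storage = S3(bucket="my-prefect-flows")
flow = storage.get_flow("hello-flow-w-s3-k8s-env/2020-06-18t19-18-00-707752-00-00")

# "Job 1" then hands off to flow.environment (here a KubernetesJobEnvironment),
# which creates "Job 2" from the job spec and runs the flow's tasks there
print(type(flow.environment).__name__)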
Here's the issue I'm facing: I cannot figure out how to get AWS creds into "Job 1", and it is failing with the same NoCredentialsError I mentioned above. I see this override option for IMAGE_PULL_SECRETS (https://github.com/PrefectHQ/prefect/blob/e9da231c1fdc94988b66dd9b336fc98316ecc097/src/prefect/agent/kubernetes/agent.py#L166) but I can't figure out how to mount in any other secrets. The other suggestions in this thread seem to be about "Job 2".
Am I right that this is a gap that's a result of the fact that, until very recently, the KubernetesAgent always just used Docker storage?
Jim Crist-Harif
07/07/2020, 10:27 PM
james.lamb
07/07/2020, 10:35 PM
(the _job_spec seems to already be stored on the flow), but when I tried to test that today I ran into this, so I think KubernetesAgent + S3 storage is still not a possible combination.
Would you consider a PR that adds a recognized environment variable for secrets (maybe a dictionary with secret name + mount point) here? https://github.com/PrefectHQ/prefect/blob/e9da231c1fdc94988b66dd9b336fc98316ecc097/src/prefect/agent/kubernetes/agent.py#L165-L187
Or is that too weird and hacky?
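(In Kubernetes terms, "secret name + mount point" would be a volume plus a volumeMount on the job's pod spec, roughly like the fragment below; all names are hypothetical.)

# fragment of a job's pod template spec
spec:
  volumes:
    - name: aws-creds
      secret:
        secretName: aws-credentials      # existing in-cluster Secret
  containers:
    - name: flow
      image: prefecthq/prefect:all_extras-0.12.0
      volumeMounts:
        - name: aws-creds
          mountPath: /var/secrets/aws    # mount point for the secret's keys
          readOnly: true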
Jim Crist-Harif
07/07/2020, 10:37 PM
james.lamb
07/07/2020, 10:51 PM
"I'd rather just give them the k8s spec directly and let them template it as they see fit"
Totally agree with this!
"for a given agent, would you need different secrets for different flows? Or the same for all flows run by that agent?"
I'm really unsure about this one. For the immediate use case in front of me I think it would be fine for all flows run by an agent to have the same secrets, at least in what I called "Job 1". In that "Job 2", where you're actually doing flow.run(), I think it's desirable for individual users' secrets to be different, based on what they configure in their KubernetesJobEnvironment.
I feel like I need to give a lot more thought to what it looks like for a team of 10 people, for example, to all work on their own flows within a single Prefect Cloud tenant. I feel like it's ok for all of them to share one agent and for their flows to be stored in one place (like one S3 bucket), but it is probably NOT ok for all the "Job 2"s to share the same creds, since you and I might work on the same team but be working on projects we can't talk to each other about (like different consulting clients or something). KubernetesJobEnvironment allows total customization, so people can mount in their own secrets with their own access to databases and things for "Job 2".
Maybe places that want total isolation all the way through could have admins in their Prefect Cloud tenant create one agent per user, generate a UUID as a label, and then give the user that label and say "stick this on your flows and keep it secret". Then only the tenant admin would ever have a view of all agents, and you could guarantee one agent per user.
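(On the flow side, that label idea would look roughly like the sketch below, paired with an agent deployed with the same label; the label value is made up and `flow` is assumed to be the user's Flow object.)

from prefect.environments import KubernetesJobEnvironment

# hypothetical per-user label handed out by a tenant admin;
# only agents carrying this same label would pick up this flow's runs
flow.environment = KubernetesJobEnvironment(
    job_spec_file="prefect-flow-run.yaml",
    labels=["b1f0c7a2-user-james"],
)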
Jim Crist-Harif
07/07/2020, 11:20 PM
We're looking at making the job spec used by the KubernetesAgent configurable.
james.lamb
07/08/2020, 2:21 PM