# prefect-community
t
Hello - I have a question about ad-hoc env vars when running a flow — is it possible to “save” them somehow so they can be re-used again, or do I have to refill them manually each time I want to create an ad-hoc run?
k
Hi @Tom Klein, would the KV Store fit your use case? You can persist them there if you use Cloud
t
hmm - so you’re saying I can persist an entire dict and then inject it as the full env var suite? is it possible to do that from the UI?
i.e. from here?
i guess what i’m asking is - what’s the best approach we should take in order to have a set of env vars. Let’s say we want to use some “persistent” defaults or values for most of them, and then alter some of them ad-hoc when we run this flow manually (since the whole purpose of this flow is to be a manually-run backfill for old data, on demand)
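The behavior being asked for here (persistent defaults plus per-run overrides) boils down to a dict merge. A minimal Python sketch of that idea, with all names and values hypothetical:

```python
# Hypothetical sketch: persistent defaults merged with ad-hoc overrides.
# DEFAULT_ENV would live somewhere durable (code, KV Store, parameter store);
# the overrides come from whatever the manual run supplies.

DEFAULT_ENV = {
    "TARGET_BUCKET": "s3://backfills",
    "BATCH_SIZE": "500",
    "DRY_RUN": "false",
}

def build_env(overrides=None):
    """Return the default env vars with any ad-hoc overrides applied."""
    return {**DEFAULT_ENV, **(overrides or {})}

# An ad-hoc backfill run that only changes one value:
env = build_env({"DRY_RUN": "true"})
```

The open question in the thread is where to keep `DEFAULT_ENV` and how to supply the overrides, which is what the rest of the discussion is about.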
k
No, this section does not persist anywhere and is just for ad-hoc triggers. But you could pass the dict to the RunConfig with:
```python
flow.run_config = RunConfig(..., env={...})
```
but this will be the same values across all flow runs. I think you can register with this and then use the UI to override for the ad-hoc backfills
a
if I recall correctly, you are on AWS, right? why not leverage AWS Secrets Manager or AWS Systems Manager Parameter Store?
t
@Anna Geller that helps with just fetching vars - but i’m talking about altering them from run to run (while still having most of them “persisted”) in a run that’s created ad-hoc manually (e.g. from the UI). Think of a credit-card / billing-address form in the browser: it all gets auto-filled whenever you buy a new thing, but you can still alter the auto-filled values (e.g. if you want to use a different credit card for a specific purchase)
also - now that i think about it - our problem is even a bit more complex, because our flow actually runs (as one of its steps) a Kubernetes job of a specific Docker image, and it’s that internal job that needs the env vars, not the flow… so we need to somehow relay them from the flow into the job
a
i’m talking about altering them from run to run (but still having most of them “persisted”)
It seems like you are looking for some external storage solution like Redis - it would allow you to do that. You shouldn't really use Prefect as a parameter store; Prefect is mostly about orchestration and execution
our flow actually runs (as one of its steps) a Kubernetes job of a specific docker image, and its that internal job that needs to have the env vars, not the flow
maybe Kubernetes secrets is the right approach here?
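If Kubernetes secrets were used as suggested here, the job spec could reference them instead of inlining the values, so they never have to pass through the flow at all. A hedged sketch of the relevant fragment of a container spec, written as a Python dict; all names are made up:

```python
# Hypothetical fragment of a Kubernetes job spec (as a Python dict) showing
# how a container can pull an env var from a Secret via secretKeyRef,
# so the sensitive value never has to be relayed through the flow itself.

def secret_env_entry(var_name, secret_name, secret_key):
    """Build one env entry that reads its value from a Kubernetes Secret."""
    return {
        "name": var_name,
        "valueFrom": {
            "secretKeyRef": {"name": secret_name, "key": secret_key},
        },
    }

container_env = [
    secret_env_entry("DB_PASSWORD", "backfill-secrets", "db-password"),
]
```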
t
hmm, in our “normal” production case (where we just deploy services etc. unrelated to data and/or orchestration) we do use the AWS param/secret store to inject env vars into the containers. But here, because it’s orchestrated by Prefect, i’m not sure what the correct approach would be - it’s the Prefect agent that’s creating the job, rather than our internal k8s deployment utility. i understand that Prefect itself isn’t a param store - it’s just that the nature of this whole flow is to be manually run on-demand (and it orchestrates an entire DAG of operations), so obviously it would be more convenient to put the params directly into Prefect rather than maintain them elsewhere as another “moving part”… especially because they change from run to run (which is, again, executed directly in Prefect)
k
I am not seeing an easy way to get it into the job. Seems like you would have to modify the AWS param/secret store values from the parent flow in this case?
a
because it’s orchestrated by Prefect i’m not sure what the correct approach would be
I don't think Prefect puts any restrictions on that. You can totally still use Parameter Store or Secrets Manager for this, and that would actually be quite useful because it makes your code easy to migrate to Prefect 2.0
and now that you mention that those values change from run to run, managing them centrally in something like a parameter store would be pretty helpful to avoid "moving parts", since they would be centrally managed - and they can even be versioned
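A sketch of what fetching those centrally managed values could look like. The store client is passed in so the helper stays backend-agnostic; with AWS it would be `boto3.client("ssm")`, whose `get_parameter(Name=..., WithDecryption=True)` is a real SSM call. The helper name and parameter paths are hypothetical:

```python
# Hypothetical sketch: fetch run parameters from a central parameter store.
# With AWS, `client` would be boto3.client("ssm"); get_parameter(Name=...,
# WithDecryption=True) is the real SSM API call. The client is injected here
# so the helper itself is not tied to any one backend.

def fetch_params(client, names):
    """Fetch each named parameter and return them as a plain dict."""
    values = {}
    for name in names:
        resp = client.get_parameter(Name=name, WithDecryption=True)
        values[name] = resp["Parameter"]["Value"]
    return values

# Usage (assuming boto3 is available):
#   ssm = boto3.client("ssm")
#   env = fetch_params(ssm, ["/backfill/TARGET_BUCKET", "/backfill/BATCH_SIZE"])
```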
t
right, but in our regular (i.e. non-data / non-Prefect) case we don’t have ad-hoc runs that are controlled with env vars (or any kind of parameter). e.g. in the service case there are some persistent env vars, and when a user makes a request to the service, the service can accept ad-hoc params. But the Prefect use-case is different: there’s a flow (that can be run scheduled or manually) with, let’s say, various ad-hoc parameters / env vars that change how the run is done - and there’s no “request” or HTTP serving etc., except what’s done from the UI or from the SDK
a
Coming back to your original question: the only way to store those environment variables would be to add them to your run configuration and re-register your flow
t
@Anna Geller ya, that doesn’t fit the use-case 😕 we’ll need to find some solution around it. anyway - i just realized that in order to inject these params/env-vars from the flow into the k8s job, we need to manually list every single env var in the JSON that describes the k8s job spec - did I get this right? is there no better way? 😮
a
it's up to you. the Prefect way of injecting env variables into your flow is via the run config, or setting them on the agent
t
well, we don’t want to set them on the agent because they are specific to this job we’re executing, right? and you’re talking about injecting the env vars into the flow - i’m talking about “passing them on” to a job that’s being executed directly as a distinct k8s job via the `RunNamespacedJob` task.
Basically all we’re trying to do is use the flow to orchestrate several k8s jobs - each having its own image… are we going about this the wrong way? e.g. we have a process that:
• pulls data from our DWH
• runs some NodeJS code to do stuff with that data (e.g. scrape websites)
• runs a python-based ML model on the data
• exports the results to S3
and up until now we ran it manually, which is error-prone, so we’re trying to migrate it to Prefect
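On the “manually list every single env var in the JSON” point above: since the job spec handed to `RunNamespacedJob` is just a Python dict, the env list can be generated from a dict rather than written out by hand. A hypothetical helper (the spec shape is the standard Kubernetes one, but everything else here is made up):

```python
import copy

# Hypothetical helpers: turn a plain dict of env vars into the list-of-dicts
# shape Kubernetes expects in a container spec, and splice it into a job body
# before handing it to something like RunNamespacedJob. Only the env handling
# is sketched; the rest of the job spec is elided.

def to_k8s_env(env_vars):
    """Convert {"KEY": "value"} into [{"name": "KEY", "value": "value"}, ...]."""
    return [{"name": k, "value": str(v)} for k, v in env_vars.items()]

def with_env(job_spec, env_vars):
    """Return a copy of a job spec with the env vars set on every container."""
    spec = copy.deepcopy(job_spec)
    for container in spec["spec"]["template"]["spec"]["containers"]:
        container["env"] = to_k8s_env(env_vars)
    return spec
```

This way the flow can carry one dict of values (defaults plus ad-hoc overrides) and relay it into the internal job without enumerating each variable in the spec by hand.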
a
again, it's totally up to you; all of those are valid ways of injecting env variables into your workflow. If you want to do it via Prefect, you can use env variables on the run config or on the agent. If you want to do it yourself, you can:
1. Inject them into your Kubernetes job template and reference this template in your run config
2. Set them within your Dockerfile or during the Docker image build process to inject them directly into the container image
3. Retrieve those custom parameter values and secrets from some third-party parameter store or secrets backend, or Prefect's KV Store
Hope that clarifies this - the rest is up to you to decide. LMK if you have any other questions I can help with.
t
@Anna Geller hmm ok - thanks 🙏 we’ll experiment with it a bit more and see if we can reach some satisfactory path. I understand that there’s a variety of options; the issue is i’m not even sure the way we’re going about this makes sense to begin with (e.g. using the raw `RunNamespacedJob` command as a way to execute a program [encapsulated in a Docker image] that was initially designed to run as a standalone process rather than as a step in an orchestrated DAG)
a
That's totally fine - Prefect can still orchestrate it, as long as you package it into a task, which you did with the Kubernetes task