Felipe Saldana

01/27/2021, 5:31 PM
Hello all, I was wondering if someone could point me in the right direction. (Please let me know if I need to clarify 🙂) I have this working as a POC in Dagster and was wondering how the same example would work in Prefect. It would test a custom task class, multiple instances of a smaller task, and configurable params/secrets.

I have a legacy module that accepts 20+ parameters. Some of these parameters will be required (name), some will need to be secrets (tokens), and some vary (description). Depending on the combination of these parameters, the module will do specific things: query a database and dump to a bucket, grab data from a bucket and load it into the db, and other tasks.

Instead of accepting command line args, I want to "prefect-ify" that module and supply some of the configuration directly to it. Would this be a combination of parameters and secrets? Can these be loaded from a toml file?

Next, I want to wrap that base module in a smaller, more specific task, for example query_db_and_dump_to_bucket(). This smaller task will have required values (db host, db username, db pass, table_name). Just to point out, these values are not required in the base module. In my flow, I would want to call query_db_and_dump_to_bucket() again and again using different table names.
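The wrapping described here can be pictured independently of any orchestrator. Below is a minimal plain-Python sketch, not Prefect code: `legacy_module`, its parameters, and the host/user values are hypothetical stand-ins, and only the name `query_db_and_dump_to_bucket` comes from the message. In Prefect the wrapper would presumably become a task.

```python
from typing import Any, Dict


def legacy_module(name: str, description: str = "", **options: Any) -> Dict[str, Any]:
    """Hypothetical stand-in for the legacy module: only `name` is required."""
    # The real module would query the database, write to a bucket, and so on.
    return {"name": name, "description": description, **options}


def query_db_and_dump_to_bucket(
    db_host: str, db_username: str, db_pass: str, table_name: str
) -> Dict[str, Any]:
    """Narrow wrapper: these four values are required here, not in the base module."""
    return legacy_module(
        name=f"dump_{table_name}",
        db_host=db_host,
        db_username=db_username,
        db_pass=db_pass,
        table_name=table_name,
    )


# Call the same wrapper repeatedly with different table names.
tables = ["users", "orders", "invoices"]
results = [
    query_db_and_dump_to_bucket("db.example.internal", "etl_user", "placeholder", t)
    for t in tables
]
```

Because the wrapper's arguments have no defaults, leaving one out fails immediately with a `TypeError`, while the base module stays permissive.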
emre

01/27/2021, 6:11 PM
If I understand correctly, the behavior of your monolithic module changes according to the presence or value of its input parameters. Let's assume each of these 20+ parameters is the output of some task. I would have a task that takes all these parameters and, by whatever custom logic, decides what should be done and returns the decision as a string. For instance, you would return `'query_and_dump'` if that is what you want to do. Use this decision as the key to a `switch`; this way you can have a separate task for each single responsibility your monolithic module had, and pick whichever one you want to run at runtime. https://docs.prefect.io/api/0.12.6/tasks/control_flow.html#functions
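The decide-then-dispatch idea can be sketched in plain Python. This is an illustration under assumptions, not Prefect code: the two inputs, the decision rules, and the `load_from_bucket` name are hypothetical, and the `CASES` lookup plays the role that `switch` from the linked page would play in a flow, where `decide` and each branch would be tasks.

```python
from typing import Callable, Dict, Optional


def decide(source_table: Optional[str], bucket_path: Optional[str]) -> str:
    """Return the name of the operation implied by which inputs are present."""
    if source_table and not bucket_path:
        return "query_and_dump"
    if bucket_path and not source_table:
        return "load_from_bucket"
    raise ValueError("cannot infer an operation from these inputs")


def query_and_dump() -> str:
    return "queried the database and dumped to the bucket"


def load_from_bucket() -> str:
    return "loaded bucket data into the database"


# One single-responsibility function per decision key.
CASES: Dict[str, Callable[[], str]] = {
    "query_and_dump": query_and_dump,
    "load_from_bucket": load_from_bucket,
}


def run(source_table: Optional[str] = None, bucket_path: Optional[str] = None) -> str:
    """Pick the branch by key and run only that branch."""
    return CASES[decide(source_table, bucket_path)]()
```

For example, `run(source_table="users")` runs only the `query_and_dump` branch.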
About moving away from CLI args: whatever really works for you 😅. You can embed values in the toml config, but parameters can have default values anyway, so I would prefer to expose them as parameters. Changing a value in the toml would require redeploying stuff. For secrets, use the secret tasks. If you are on Prefect Cloud, you can register secrets directly in the Prefect UI. If not, just subclass the `Secret` class to access your secret store. The only difference between a `Secret` and a `Task` is that `Secret` tasks are configured to not cache their results at all.
Felipe Saldana

01/27/2021, 6:29 PM
So it sounds like I should use Parameter for some of the values, as I see a required flag (and default values). I don't see a required flag for Secret though.
I am trying to wrap my head around how the flow/task knows that a Secret is required without a dev looking directly at the code, as opposed to a Parameter, which I can see in the UI.
emre

01/27/2021, 6:47 PM
So secrets, at least in Prefect, aren’t something you are supposed to provide on a per-run basis. They just exist in a secure place, and a flow retrieves all the secrets it needs for a run.
Felipe Saldana

01/27/2021, 6:50 PM
OK, so I would create all my secrets in the flow and pass them around as needed?
emre

01/27/2021, 6:53 PM
Yeah, I would just get all the secrets required for each case. Is there any concern with extracting secrets that you won't need in a certain run?
Also, do you use the secrets to decide what your module should do? Or are they only used during the actual job of the module?
Felipe Saldana

01/27/2021, 6:54 PM
No, no concern, just trying to figure out how this works.
Some secrets will be used only by certain tasks though ... I can't think of anything else besides that.
Say I had 10 secrets needed for my flow and another dev teammate wants to run my flow. Would they have to look at the code to determine all the secrets that are needed?
...meaning there is no functionality that says "you must provide these secrets" to run this flow.
emre

01/27/2021, 6:59 PM
Yeah, pretty much. You can always document the secrets needed, and which operation needs which secrets, in a README or something.
I think you are looking at secrets a lot like parameters. They are different. Parameters are tasks that initiate the flow, so it makes sense that they are required or the flow won't work. Secret tasks, on the other hand, are regular tasks. They can have upstream and downstream tasks.
Felipe Saldana

01/27/2021, 7:03 PM
Yes, you are correct about how I was thinking of secrets vs. parameters. But I do think it would be helpful to know (outside of looking directly at the code) which secrets are required/needed.
emre

01/27/2021, 7:03 PM
Meaning a task upstream of your secret task can skip, and then the flow never attempts to extract that secret. In your case, you can put only the secret tasks you need downstream of your decision and `switch` statement, so only the secrets required by the actual operation you want are extracted.
I understand. Which secrets are required, and when, is the type of question I would just answer in some README 😅.
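The "which secrets are required when" answer could also be kept as data next to the code. A minimal plain-Python sketch, assuming a hypothetical manifest and secret names (none of this is a Prefect API): the manifest doubles as documentation, and only the secrets listed for the chosen operation are ever requested from the store.

```python
from typing import Callable, Dict, List

# Hypothetical manifest: which secret names each operation needs.
REQUIRED_SECRETS: Dict[str, List[str]] = {
    "query_and_dump": ["DB_PASSWORD", "BUCKET_TOKEN"],
    "load_from_bucket": ["BUCKET_TOKEN", "DB_PASSWORD", "LOADER_API_KEY"],
}


def fetch_secrets(operation: str, get_secret: Callable[[str], str]) -> Dict[str, str]:
    """Fetch only the secrets listed for the chosen operation."""
    return {name: get_secret(name) for name in REQUIRED_SECRETS[operation]}


# Stand-in secret store that records which names were requested.
store = {"DB_PASSWORD": "pw", "BUCKET_TOKEN": "tok", "LOADER_API_KEY": "key"}
requested: List[str] = []


def get_secret(name: str) -> str:
    requested.append(name)
    return store[name]


secrets = fetch_secrets("query_and_dump", get_secret)
```

Here `LOADER_API_KEY` is never read, which mirrors placing secret tasks downstream of the selected branch.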
Felipe Saldana

01/27/2021, 7:10 PM
Yes, a readme would be a good step. It would be a great step if the "flow could tell you" 😃
emre

01/27/2021, 7:17 PM
I mean, in the UI the flow “tells” you which tasks it has. Search for “Secret” in the task names and you kinda have what secrets the flow uses. You just need to be consistent with naming tasks: make sure that all secret tasks contain the word “secret” in their names. The UI even shows the dependency graph, so you would actually know which secret is needed under what condition.
I get the feeling these are all “hacky” compared to what you would like though 😅
Felipe Saldana

01/27/2021, 8:11 PM
I think we may have exhausted this topic 😅 but to try to put it simply: if I use a PrefectSecret in my Flow/Task, my flow will not run unless it's provided (in other words, it's required) ... my feeling is the flow should "know" before failing at runtime. Anyway, that's all I got on this for now 😆
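A "know before failing at runtime" check could live outside the flow. A minimal sketch, assuming a hypothetical list of required secret names and a plain mapping standing in for the secret store (neither is a Prefect API): it reports every missing secret in one error before any work starts.

```python
from typing import Iterable, List, Mapping


def missing_secrets(required: Iterable[str], available: Mapping[str, str]) -> List[str]:
    """Return the required secret names that are absent or empty, in order."""
    return [name for name in required if not available.get(name)]


def preflight(required: Iterable[str], available: Mapping[str, str]) -> None:
    """Raise one error naming every missing secret before any work starts."""
    missing = missing_secrets(required, available)
    if missing:
        raise RuntimeError("missing required secrets: " + ", ".join(missing))


# Stand-in store; a real check might pass dict(os.environ) or a vault client wrapper.
available = {"DB_PASSWORD": "pw"}
```

Calling `preflight(["DB_PASSWORD", "BUCKET_TOKEN"], available)` would raise a `RuntimeError` that names `BUCKET_TOKEN`, so a teammate sees the full list of gaps without reading the flow code.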