Jacques Jamieson

05/20/2020, 3:49 AM
Can Prefect be used for building a processing pipeline that pulls data from various REST APIs and then performs analysis on that data? Can it also support this in a multi-tenant architecture?

nicholas

05/20/2020, 12:18 PM
Hi @Jacques Jamieson! Prefect can definitely be used for the pipeline you’ve described; in fact, the task-dependency structure makes it uniquely suited to that sort of work. Can you expand on the multi-tenant architecture question a bit?

Jacques Jamieson

05/20/2020, 10:11 PM
With regard to multi-tenancy, I have somewhere in the space of 100-1,000 clients that need data pulled from the same REST API using each individual client's API credentials, and then the same analysis performed. I've tested Prefect and was able to get a single-client flow working as a prototype and get a feel for how the pipeline operates.

nicholas

05/20/2020, 10:26 PM
Ahh, ok, thank you. So if I understand correctly, you want to be able to create a flow that processes a set of data from some APIs but you want to use different credentials to access the APIs each time the flow is run. If that's the case, you could build references to the credentials as flow parameters. You could then use those references to get Secrets that would hold different client credentials. Something like this:
Copy code
from prefect import task, Flow, Parameter
from prefect.client import Secret


@task
def get_credentials(client):
    return Secret(f"{client}_API_TOKEN").get()


with Flow("Data Processing Flow") as flow:
    client_ref = Parameter("client", required=True)
    credentials = get_credentials(client_ref)

    # do something with the credentials downstream
👍 1
Or you could even use that Secret().get() method in the relevant API tasks. And then when you kick off each run, you would just pass the client reference as a runtime parameter 🙂
👍 1
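To make that last suggestion concrete, here is a minimal sketch, assuming Prefect Core 0.x; requests, the endpoint URL, and the client key are illustrative placeholders. The Secret is read inside the API task itself, and the client reference is passed in as a runtime parameter when the run is kicked off.
Copy code
import requests  # illustrative HTTP client; swap in whatever you use for the REST calls

from prefect import task, Flow, Parameter
from prefect.client import Secret


@task
def call_api(client):
    # read this client's credentials inside the API task itself
    token = Secret(f"{client}_API_TOKEN").get()
    # placeholder endpoint
    response = requests.get(
        "https://api.example.com/data",
        headers={"Authorization": f"Bearer {token}"},
    )
    response.raise_for_status()
    return response.json()


with Flow("Data Processing Flow") as flow:
    client_ref = Parameter("client", required=True)
    data = call_api(client_ref)

# kick off a run for one client, passing the reference as a runtime parameter
flow.run(parameters={"client": "acme_corp"})  # "acme_corp" is a hypothetical client key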

Jacques Jamieson

05/21/2020, 3:36 AM
Ahh right, yeah. So the idea is that every 15 or so minutes, depending on what each tenant has configured, Prefect would fetch all the clients for the current execution period along with their credentials, then for each client kick off the task that pulls the data from the APIs and performs the analysis.
The part that's not clear is, given the list of credentials, how that list should be looped over in Prefect. Is a standard Python loop good to go?
Copy code
with Flow("Data Processing Flow") as flow:
    client_ref = Parameter("client", required=True)
    credentials = get_credentials(client_ref)
    # do something with the credentials downstream
    for credential in credentials:
        call_api(credential)
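On the 15-minute cadence mentioned above, a minimal sketch, assuming Prefect Core 0.x, of attaching an interval schedule to the flow; per-tenant cadences would need either separate schedules/flows or a lookup inside the flow itself.
Copy code
from datetime import timedelta

from prefect import Flow
from prefect.schedules import IntervalSchedule

# run the flow every 15 minutes; adjust the interval per deployment as needed
schedule = IntervalSchedule(interval=timedelta(minutes=15))

with Flow("Data Processing Flow", schedule=schedule) as flow:
    ...  # tasks as in the snippets above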

nicholas

05/21/2020, 3:45 AM
Even better, @Jacques Jamieson, you can use Prefect's native mapping:
Copy code
with Flow("Data Processing Flow") as flow:
    client_ref = Parameter("client", required=True)
    credentials = get_credentials(client_ref)
    # do something with the credentials downstream
    call_api.map(credentials)
For something like what you described, you could use iterated mapping really nicely:
Copy code
with Flow("Data Processing Flow") as flow:
    # get list of clients
    clients = get_clients()

    # get credentials for each client
    credentials = get_credentials.map(clients)

    # call the API for each set of client credentials
    call_api.map(credentials)
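Filling in the pieces the snippet above assumes, here is a self-contained sketch of that iterated-mapping pattern, with a hypothetical get_clients lookup, a placeholder client list, and a stubbed call_api task:
Copy code
from prefect import task, Flow
from prefect.client import Secret


@task
def get_clients():
    # hypothetical lookup of the tenants due in the current execution window
    return ["acme_corp", "globex", "initech"]


@task
def get_credentials(client):
    return Secret(f"{client}_API_TOKEN").get()


@task
def call_api(credentials):
    # pull the client's data with its credentials and run the analysis (stubbed here)
    ...


with Flow("Data Processing Flow") as flow:
    clients = get_clients()                     # single task run
    credentials = get_credentials.map(clients)  # one mapped child per client
    call_api.map(credentials)                   # one mapped child per credential set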

Jacques Jamieson

05/21/2020, 5:34 AM
Aaahh, this is perfect. Thanks so much for getting back to me! I'll give it a go and see how I get along.