
    Jacques Jamieson

    2 years ago
    Can Prefect be used for building a processing pipeline that pulls data from various REST APIs and then performs analysis on that data? Can it also support this in a multi-tenant architecture?
    nicholas

    2 years ago
    Hi @Jacques Jamieson ! Prefect can definitely be used for the pipeline you’ve described; in fact the task dependency structure makes it uniquely suited to that sort of work. Can you expand on the multi tenant architecture question a bit?

    Jacques Jamieson

    2 years ago
    with regard to multi-tenancy, I have somewhere in the space of 100-1000 clients that need to have data pulled from the same REST API using each client's individual API credentials, and then perform the same analysis. I've tested Prefect and was able to get a single-client flow working as a prototype and get a feel for how the pipeline operates.
    nicholas

    2 years ago
    Ahh, ok, thank you. So if I understand correctly, you want to be able to create a flow that processes a set of data from some APIs but you want to use different credentials to access the APIs each time the flow is run. If that's the case, you could build references to the credentials as flow parameters. You could then use those references to get Secrets that would hold different client credentials. Something like this:
    from prefect import task, Flow, Parameter
    from prefect.client import Secret

    @task
    def get_credentials(client):
      # fetch the secret named after the client, e.g. "acme_API_TOKEN"
      return Secret(f"{client}_API_TOKEN").get()

    with Flow("Data Processing Flow") as flow:
      client_ref = Parameter("client", required=True)
      credentials = get_credentials(client_ref)

      # do something with the credentials downstream
    Or you could even use that Secret().get() method directly in the relevant API tasks.
    And then when you kick off each run, you would just pass the client reference as a runtime parameter 🙂
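    To make the parameter-to-secret pattern above concrete, here is a minimal plain-Python sketch of the same idea, not Prefect itself: a dict stands in for Prefect's Secret backend, and the client names and token values are made up for illustration.

    ```python
    # Stand-in for Prefect's Secret store; names and values are hypothetical.
    SECRET_STORE = {
        "acme_API_TOKEN": "token-for-acme",
        "globex_API_TOKEN": "token-for-globex",
    }

    def get_credentials(client):
        # mirrors Secret(f"{client}_API_TOKEN").get()
        return SECRET_STORE[f"{client}_API_TOKEN"]

    def run_flow(client):
        # each "flow run" receives a different client parameter
        # and resolves that client's credentials at runtime
        return get_credentials(client)

    print(run_flow("acme"))    # token-for-acme
    print(run_flow("globex"))  # token-for-globex
    ```

    The point is that the flow definition stays fixed; only the runtime parameter (and therefore which secret is resolved) changes per run.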

    Jacques Jamieson

    2 years ago
    ahh right, yeah. So the idea is that every 15 or so minutes, depending on what each tenant has configured, Prefect would fetch all the clients for the current execution time period along with their credentials, then for each client kick off the task that pulls the data from the APIs and performs the analyses.
    the part that's not clear is: given the list of credentials, how should that list be looped over in Prefect? is a standard Python loop good to go?
    with Flow("Data Processing Flow") as flow:
      client_ref = Parameter("client", required=True)
      credentials = get_credentials(client_ref)
      # do something with the credentials downstream
      for credential in credentials:
          call_api(credential)
    nicholas

    2 years ago
    Even better, @Jacques Jamieson, you can use Prefect's native mapping:
    with Flow("Data Processing Flow") as flow:
      client_ref = Parameter("client", required=True)
      credentials = get_credentials(client_ref)
      # do something with the credentials downstream
      call_api.map(credentials)
    For something like what you described, iterated mapping works really nicely:
    with Flow("Data Processing Flow") as flow:
      # get list of clients
      clients = get_clients()
      
      # get credentials for each client
      credentials = get_credentials.map(clients)
    
      # call api for each set of client credentials
      call_api.map(credentials)
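    As a rough mental model of the iterated mapping above, here is a plain-Python sketch (not Prefect) of what each .map call does: it fans a task out over its input list, producing one independent task run per element. The client names and the fake API call are illustrative only.

    ```python
    def get_clients():
        # stand-in for a task that fetches the tenant list
        return ["acme", "globex"]

    def get_credentials(client):
        # one mapped task run per client
        return f"{client}_API_TOKEN"

    def call_api(credential):
        # one mapped task run per credential
        return f"called API with {credential}"

    # roughly what get_credentials.map(clients) followed by
    # call_api.map(credentials) expands to:
    clients = get_clients()
    credentials = [get_credentials(c) for c in clients]
    results = [call_api(cred) for cred in credentials]
    print(results)
    ```

    The difference from a plain loop is that Prefect tracks each mapped element as its own task run, so failures and retries are per-client rather than per-batch.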

    Jacques Jamieson

    2 years ago
    Aaahh, this is perfect. Thanks so much for getting back to me! I'll give it a go and see how I get along.