    Darren

    1 month ago
    I'm still new to tools such as Prefect. I am trying to automate our onboarding process, which pulls employee information (API/JSON) from one source and creates accounts in 3 other applications, also via API/JSON. My plan is to create a flow that pulls the data and checks whether the accounts exist, creating them if they don't. My question is: would it be better to pass the data between the tasks, or to place the data into a store such as a file (CSV, maybe?) or a database?
    John Kang

    1 month ago
    Darren, I have an overall flow with sub-tasks that all pass data to each other, read and write .csv files to Google Cloud Storage (GCS), and read from and write to a database. It all depends on what I need the data for. If I just need the data in a subsequent task, I pass it in directly. If I need to archive the data, I might put it into GCS or a database. If I need to access the data from another application, I put it into a database (CockroachDB).
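
A minimal sketch of the pattern described above: tasks hand data to each other in memory, and the data is serialized to CSV only when an archive copy is wanted. Everything here is an assumption for illustration. The function names, the sample records, and the `archive` flag are hypothetical, and the API calls are replaced by stand-ins. The functions are plain Python so the sketch runs without Prefect installed; in a Prefect 2 project the steps would typically be decorated with `@task` and the orchestrating function with `@flow`.

```python
import csv
import io


def fetch_employees():
    """Hypothetical stand-in for the API call that pulls employee records."""
    return [
        {"id": "1", "email": "ada@example.com"},
        {"id": "2", "email": "grace@example.com"},
    ]


def find_missing(employees, existing_emails):
    """Return the employees that do not have an account yet."""
    return [e for e in employees if e["email"] not in existing_emails]


def create_accounts(missing, existing_emails):
    """Hypothetical stand-in for the account-creation API calls.

    Records each new email in the set and returns how many were created.
    """
    for employee in missing:
        existing_emails.add(employee["email"])
    return len(missing)


def to_csv(employees):
    """Serialize the records to CSV text, e.g. for upload to an archive."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "email"])
    writer.writeheader()
    writer.writerows(employees)
    return buffer.getvalue()


def onboarding_flow(existing_emails, archive=False):
    """Pass data between steps in memory; build the CSV only on request."""
    employees = fetch_employees()
    missing = find_missing(employees, existing_emails)
    created = create_accounts(missing, existing_emails)
    archived = to_csv(employees) if archive else None
    return created, archived
```

Calling `onboarding_flow({"ada@example.com"}, archive=True)` creates only the missing account and also returns the CSV text. Running the flow again with the same set creates nothing, which is the check-then-create behaviour described in the original question.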
    Darren

    1 month ago
    Does data size matter? At most, my data shouldn't be over 1 MB in total.
    John Kang

    1 month ago
    Good question! With data that size, if you are just passing it to another task, I would keep it as a variable and pass it over. Honestly, with less than 1 MB of data, you could use any of the three options I described.
    Darren

    1 month ago
    Awesome, I appreciate you taking the time to answer my question. Thank you.
    John Kang

    1 month ago
    All good Darren! Enjoy and let me know if you have any other questions.
    One thing to add, which you probably already know: if you are bandwidth constrained, I would pass data between tasks and upload or download data only when necessary.
    Darren

    1 month ago
    I didn't consider that, but it's not a factor in this instance. Thank you for pointing that out.