Darren

08/04/2022, 3:45 PM
I'm still new to tools such as Prefect. I am trying to automate our onboarding process, which pulls employee information (API/JSON) from one source and creates accounts in 3 other applications via API/JSON. My plan is to create a flow that pulls the data, checks whether each account exists, and creates it if it doesn't. My question is: would it be better to pass the data between the tasks, or to place the data in a source such as a file (CSV maybe?) or a database?
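A minimal sketch of the check-then-create flow described above, passing data between steps as plain Python values rather than through a file or database. Everything here is hypothetical: `fetch_employees`, `FakeApp`, and `ensure_account` are in-memory stand-ins for the real source API and the three target applications. In Prefect, the step functions would typically be decorated with `@task` and `onboard` with `@flow`; the decorators are left out so the sketch runs without Prefect installed.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FakeApp:
    """Hypothetical stand-in for one target application's account API."""
    name: str
    accounts: set = field(default_factory=set)

    def account_exists(self, employee_id: str) -> bool:
        return employee_id in self.accounts

    def create_account(self, employee_id: str) -> None:
        self.accounts.add(employee_id)


def fetch_employees() -> list[dict]:
    # Hypothetical stand-in for the request to the HR source; returns parsed JSON records.
    return [
        {"id": "e1", "name": "Ada"},
        {"id": "e2", "name": "Grace"},
    ]


def ensure_account(app: FakeApp, employee: dict) -> bool:
    """Create the account only if it is missing; return True if one was created."""
    if app.account_exists(employee["id"]):
        return False
    app.create_account(employee["id"])
    return True


def onboard(apps: list[FakeApp]) -> dict[str, list[str]]:
    # The employee list is handed to the next step directly, in memory.
    employees = fetch_employees()
    created: dict[str, list[str]] = {app.name: [] for app in apps}
    for app in apps:
        for emp in employees:
            if ensure_account(app, emp):
                created[app.name].append(emp["id"])
    return created


apps = [FakeApp("A", {"e1"}), FakeApp("B"), FakeApp("C")]
print(onboard(apps))  # {'A': ['e2'], 'B': ['e1', 'e2'], 'C': ['e1', 'e2']}
print(onboard(apps))  # {'A': [], 'B': [], 'C': []}
```

Running `onboard` a second time creates nothing, because `ensure_account` checks for an existing account first. That idempotency helps make a check-then-create design safe to rerun.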

John Kang

08/04/2022, 4:35 PM
Darren, I have an overall flow with subtasks that pass data to each other, read and write .csv files in Google Cloud Storage (GCS), and read from and write to a database. It all depends on what I need the data for. If I just need the data to flow into a subsequent task, I pass it in directly. If I need to archive the data, I might put it into GCS or a database. If I need to access the data from another application, I put it into a database (CockroachDB).
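A rough sketch of the "pass it directly, persist only when needed" pattern: `handoff` always returns the records for the next task and writes a CSV copy only when an archive location is given. The local file path is a hypothetical stand-in for a GCS object or database table; this is plain Python, not a Prefect or GCS API.

```python
import csv


def handoff(records, archive_path=None):
    """Return records for the next task; optionally also archive them as CSV.

    archive_path is a hypothetical local path standing in for GCS or a
    database; nothing is written when it is None.
    """
    if archive_path is not None and records:
        with open(archive_path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(records[0]))
            writer.writeheader()
            writer.writerows(records)
    return records  # downstream tasks consume this value directly


def load_archive(archive_path):
    """Read an archived CSV back; every value comes back as a string."""
    with open(archive_path, newline="") as f:
        return list(csv.DictReader(f))
```

One design consequence: a CSV round trip turns every value into a string, so when the data only feeds the next task, passing the in-memory objects avoids re-parsing types.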

Darren

08/04/2022, 4:46 PM
Does data size matter? At most, my data shouldn't be over 1 MB in total.

John Kang

08/04/2022, 4:47 PM
Good question! With data that size, if you are just passing it to another task, I would keep it in a variable and pass it along. Honestly, with under 1 MB of data you could use any of the three options I presented.
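To check whether a payload really is under the roughly 1 MB mentioned above, a simple sketch measures its size when serialized as UTF-8 JSON. This is a plain-Python illustration, not a Prefect feature; the threshold only mirrors the figure in the question, and the helper names are hypothetical.

```python
import json

ONE_MB = 1_000_000  # decimal megabyte; use 1024 * 1024 for a mebibyte


def payload_size_bytes(records) -> int:
    """Size of the records serialized as UTF-8 JSON, the format the APIs exchange."""
    return len(json.dumps(records).encode("utf-8"))


def fits_in_budget(records, limit: int = ONE_MB) -> bool:
    """True if the serialized payload is at or below the limit."""
    return payload_size_bytes(records) <= limit


print(payload_size_bytes([{"id": "e1"}]))  # 14
```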

Darren

08/04/2022, 4:48 PM
Awesome, I appreciate you taking the time to answer my question. Thank you.

John Kang

08/04/2022, 4:51 PM
All good, Darren! Enjoy, and let me know if you have any other questions.
One thing to add: you probably already know this, but if you are bandwidth-constrained, I would pass data between tasks and upload or download data only when necessary.

Darren

08/04/2022, 5:52 PM
I hadn't considered that, but it's not a factor in this case. Thank you for pointing it out.