Hey team, what is the best practice for accessing binary files from a flow? We're processing a fair amount of weather data, stored in a binary format (NetCDF). We're running the Kubernetes agent on Azure, with volumes mounted on the pods running our flows via a custom YAML file. It does work, but it's slightly brittle
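For context, a minimal sketch of the kind of custom job template described above; the names (weather-data, azurefile-pvc, /mnt/weather) are hypothetical, not from this thread:

```yaml
apiVersion: v1
kind: Pod
spec:
  containers:
    - name: flow
      volumeMounts:
        - name: weather-data
          mountPath: /mnt/weather  # forgetting this entry means no mount at run time
  volumes:
    - name: weather-data
      persistentVolumeClaim:
        claimName: azurefile-pvc   # backed by an Azure Files share
```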
Dylan
03/22/2021, 2:48 PM
Hi @Espen Overbye!
Your outlined solution seems perfectly fine to me! What aspect of your current setup do you find brittle?
Espen Overbye
03/22/2021, 2:54 PM
e.g. when testing runs, if you forget to add the path to the yaml - no mount. I think we'd ideally like a solution where we only access services we can attach as a task; we're currently running a mix of Linux/Windows for the devs, and we can't mount the Azure volume locally when developing, which ends up causing local sync issues (not to mention Linux/Windows folder-naming fun)
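One cheap mitigation for the forgotten-mount failure mode is to fail fast at the start of the flow instead of hitting confusing errors downstream. A minimal sketch; the function name and error message are illustrative, not from this thread:

```python
from pathlib import Path

def assert_mounted(data_dir: str) -> Path:
    """Fail fast with a clear error if the expected volume mount is missing."""
    path = Path(data_dir)
    if not path.is_dir():
        raise RuntimeError(
            f"Expected volume mount at {data_dir!r} not found - "
            "did the job template include the volumeMounts entry?"
        )
    return path
```

Calling this as the first step of the flow turns a silent misconfiguration into an immediate, self-explanatory failure.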
Dylan
03/22/2021, 2:55 PM
Ahh that totally makes sense
Espen Overbye
03/22/2021, 2:56 PM
we have a (too) complex README for new devs to follow to get going...
Dylan
03/22/2021, 2:56 PM
Does Blob Storage work well for binary formats in Azure?
Dylan
03/22/2021, 2:56 PM
If you were willing to trade some I/O time for reliability, that could ensure the same access patterns between local/staging/production
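The "same access pattern everywhere" idea could be sketched as a single helper that reads NetCDF bytes from a local path in development and from Blob Storage in staging/production. The az:// URL convention, the AZURE_STORAGE_CONNECTION_STRING environment variable, and the function name are assumptions for illustration, not from this thread:

```python
import os
from pathlib import Path

def read_netcdf_bytes(location: str) -> bytes:
    """Return raw file bytes from either local disk or Azure Blob Storage."""
    if location.startswith("az://"):
        # Requires the azure-storage-blob package and credentials.
        from azure.storage.blob import BlobClient

        # Assumed convention: az://<container>/<blob path>
        _, _, container, blob = location.split("/", 3)
        client = BlobClient.from_connection_string(
            conn_str=os.environ["AZURE_STORAGE_CONNECTION_STRING"],
            container_name=container,
            blob_name=blob,
        )
        return client.download_blob().readall()
    # Local path: same call site, no cloud dependency while developing.
    return Path(location).read_bytes()
```

Every environment then uses the identical call site; only the location string changes, which removes the local-mount and folder-naming differences between dev machines.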
Espen Overbye
03/22/2021, 2:57 PM
yep, could work
Dylan
03/22/2021, 2:59 PM
We're on GCP and I persist everything I use for my Flows (locally or in Production) in GCS and I find it works extremely well
Espen Overbye
03/22/2021, 2:59 PM
We could go down the path of using Hyrax (https://www.opendap.org/), a dedicated distributed solution for this type of data, but it's yet another piece of complexity to add