Hey team, what is the best practice for accessing binary files from a flow? We're processing a fair amount of weather data, stored in a binary format (NetCDF). We're running the Kubernetes agent on Azure, with volumes mounted on the pods running our flows via a custom YAML file. It does work, but it's slightly brittle
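For context, a minimal sketch of the kind of custom job template described above; the names (weather-data, azurefile-pvc, /mnt/weather) are hypothetical, not from this thread:

```yaml
apiVersion: v1
kind: Pod
spec:
  containers:
    - name: flow
      volumeMounts:
        - name: weather-data
          mountPath: /mnt/weather  # forgetting this entry means no mount at run time
  volumes:
    - name: weather-data
      persistentVolumeClaim:
        claimName: azurefile-pvc   # backed by an Azure Files share
```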
Dylan
03/22/2021, 2:48 PM
Hi @Espen Overbye!
Your outlined solution seems perfectly fine to me! What aspect of your current setup do you find brittle?
Espen Overbye
03/22/2021, 2:54 PM
e.g. when testing runs, if you forget to add the path to the yaml - no mount. I think we'd ideally like a solution where we only access services we can attach as a task; we're currently running a mix of Linux/Windows for the devs, and we can't mount the Azure volume locally when developing, which ends up causing local sync issues (not to mention Linux/Windows folder-naming fun)
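One cheap mitigation for the forgotten-mount failure mode is to fail fast at the start of the flow instead of hitting confusing errors downstream. A minimal sketch; the function name and error message are illustrative, not from this thread:

```python
from pathlib import Path

def assert_mounted(data_dir: str) -> Path:
    """Fail fast with a clear error if the expected volume mount is missing."""
    path = Path(data_dir)
    if not path.is_dir():
        raise RuntimeError(
            f"Expected volume mount at {data_dir!r} not found - "
            "did the job template include the volumeMounts entry?"
        )
    return path
```

Calling this as the first step of the flow turns a silent misconfiguration into an immediate, self-explanatory failure.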
Dylan
03/22/2021, 2:55 PM
Ahh that totally makes sense
Espen Overbye
03/22/2021, 2:56 PM
we have a (too) complex README for new devs to follow to get going...
Dylan
03/22/2021, 2:56 PM
Does Blob Storage work well for binary formats in Azure?
Dylan
03/22/2021, 2:56 PM
If you were willing to trade some I/O time for reliability, that could ensure the same access patterns between local/staging/production
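The "same access pattern everywhere" idea could be sketched as a single helper that reads NetCDF bytes from a local path in development and from Blob Storage in staging/production. The az:// URL convention, the AZURE_STORAGE_CONNECTION_STRING environment variable, and the function name are assumptions for illustration, not from this thread:

```python
import os
from pathlib import Path

def read_netcdf_bytes(location: str) -> bytes:
    """Return raw file bytes from either local disk or Azure Blob Storage."""
    if location.startswith("az://"):
        # Requires the azure-storage-blob package and credentials.
        from azure.storage.blob import BlobClient

        # Assumed convention: az://<container>/<blob path>
        _, _, container, blob = location.split("/", 3)
        client = BlobClient.from_connection_string(
            conn_str=os.environ["AZURE_STORAGE_CONNECTION_STRING"],
            container_name=container,
            blob_name=blob,
        )
        return client.download_blob().readall()
    # Local path: same call site, no cloud dependency while developing.
    return Path(location).read_bytes()
```

Every environment then uses the identical call site; only the location string changes, which removes the local-mount and folder-naming differences between dev machines.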
Espen Overbye
03/22/2021, 2:57 PM
yep, could work
Dylan
03/22/2021, 2:59 PM
We're on GCP and I persist everything I use for my Flows (locally or in Production) in GCS and I find it works extremely well
Espen Overbye
03/22/2021, 2:59 PM
We could go down the path of using Hyrax (https://www.opendap.org/), a dedicated distributed solution for this type of data, but it's yet another piece of complexity to add