Tom Augspurger
10/14/2020, 5:56 PM
Has anyone used a `DaskKubernetesEnvironment` environment with a custom `scheduler_spec_file` / `worker_spec_file`, and GitHub storage together?
For pangeo-forge, we don't want our users to worry about things like storage / execution environments if they don't need to, so we provide a default: https://github.com/pangeo-forge/pangeo-forge/pull/14/files#diff-467822c6f6378f68bea635c429827a2caf36c7f16cb25944cc7b5146262cf35aR32-R68. Users just write their pipeline and push it to GitHub (e.g. https://github.com/TomAugspurger/example-pipeline/blob/main/recipe/pipeline.py#L30-L41).
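(For reference, a rough sketch of the kind of default described above, using the Prefect 0.13-era API; the helper name and spec filenames here are illustrative assumptions, not the actual code from the linked PR.)

```python
# Sketch of a helper that attaches pangeo-forge's default storage and
# execution environment to a recipe flow, so recipe authors don't have
# to configure either. Helper name and spec filenames are assumptions.
from prefect.environments import DaskKubernetesEnvironment
from prefect.environments.storage import GitHub

def apply_pangeo_forge_defaults(flow, repo, path):
    """Attach default GitHub storage and a Dask-on-Kubernetes
    execution environment with custom scheduler/worker pod specs."""
    flow.storage = GitHub(repo=repo, path=path)
    flow.environment = DaskKubernetesEnvironment(
        scheduler_spec_file="scheduler_spec.yaml",  # assumed filename
        worker_spec_file="worker_spec.yaml",        # assumed filename
    )
    return flow
```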
When I register and run a flow with this setup, I notice that my custom spec files aren't being used (they're defined at https://github.com/pangeo-forge/pangeo-forge/pull/14/files#diff-267b30d97c826b0afcae2110fe8ca4acfe6f35a6321d80f5fcc74ea9b7547fc0). The custom specs just update the `ServiceAccount` to be `pangeo-forge` rather than `default`. So my questions would be:
1. Is it common to use `DaskKubernetesEnvironment` and GitHub storage, rather than Docker storage?
2. Any suggestions on debugging why my custom spec files aren't being used? When I used Docker storage they were used (but I've changed other things too).

josh
10/14/2020, 6:02 PM
I believe that with file-based storage, attributes like the `DaskKubernetesEnvironment` flows belong to are not attached to the flow object (unless I'm missing something). When using file-based storage the flow is loaded from that file at runtime, and whatever attributes are present on that object are the ones that are used. So if that flow doesn't have the `DaskKubernetesEnvironment` when loaded from the file, it won't use it.