# prefect-community
r
Hi all, I’m a new Prefect user setting up a serverless orchestration stack on AWS. The ETLs my company runs sometimes download and process terabytes of data, so we’re setting up our ephemeral containers to use Amazon’s Elastic File System (EFS) so that storage scales with our needs. The flow storage itself is a GitHub storage block. The problem is that we don’t really understand how Prefect sets up the GitHub storage block, and hence where an ETL flow downloads its data to. We need to know the latter so we can mount EFS directly to that directory. The ETLs are set up to store data in a `datasets` directory of the parent directory enclosing the `flows` directory, e.g. if flows live in `parent_dir/flows`, store data in `parent_dir/datasets`. But it’s not clear whether `parent_dir` itself is copied over by Prefect, or where it would live in absolute terms if so: `/opt/prefect/data`? Any advice or wisdom from the community?
k
I have some flows that download files to the root of my repository and I use git storage as well. your current working directory when the flow runs should be the root of the repo, so it should be safe to point to files assuming that from within your code. sorry if that doesn't answer your question effectively, your use case seems more complex than mine
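for the `parent_dir/flows` and `parent_dir/datasets` layout described above, here is a minimal sketch of resolving the data directory from the flow file's own location rather than from the working directory. the helper name `datasets_dir` is made up for illustration, and nothing here is Prefect-specific:

```python
from pathlib import Path


def datasets_dir(flow_file: str) -> Path:
    """Return <parent_dir>/datasets for a flow file in <parent_dir>/flows.

    Hypothetical helper: it only assumes the directory layout described
    in this thread, not anything about how Prefect clones the repo.
    """
    flows_dir = Path(flow_file).resolve().parent  # <parent_dir>/flows
    return flows_dir.parent / "datasets"          # <parent_dir>/datasets
```

calling `datasets_dir(__file__)` from inside a flow module gives an absolute path that can be logged and compared against the EFS mount point, wherever the repo ends up on the container.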
r
OK, that makes sense. How is the repo itself stored on a container Prefect spins up? `/opt/prefect/data/my_repo`? `/my_repo`? `~/my_repo`?
k
the default behavior of the GitHub storage block is to clone the repo into the present working directory
it's probably `/opt/prefect/` since that's the working directory for the image, but I'm not entirely sure
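one way to check rather than guess is to report the working directory from inside the running container. a rough sketch using only the standard library; `describe_location` is a made-up name, and calling it from a flow or task is left as an assumption about your setup:

```python
import os
from pathlib import Path


def describe_location() -> dict:
    """Report the current working directory and what it contains.

    Hypothetical helper: print or log the result from inside a flow run
    to see where the repo was actually cloned.
    """
    cwd = Path.cwd()
    return {
        "cwd": str(cwd),                     # absolute path the process runs in
        "entries": sorted(os.listdir(cwd)),  # files and folders found there
    }
```

if the repo contents (for example a `flows` folder) show up in `entries`, then `cwd` is the directory to plan the EFS mount around.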
r
Based on your feedback and this issue https://github.com/PrefectHQ/prefect/issues/2861 I’m thinking the same
so then `/opt/prefect/my_repo/`
k
yeah
r
Thank you! this is very helpful
I’ll report back once I get it working
k
awesome! np