John Kang
08/01/2022, 9:40 PM

Anna Geller
08/02/2022, 1:30 AM

John Kang
08/02/2022, 11:27 AM

Anna Geller
08/02/2022, 11:35 AM

John Kang
08/02/2022, 1:32 PM
Local deployment command: prefect deployment build ./main_python_files/w_wrapper_update_data.py:capacity-flow -n capacity-deployment -t test
GCS remote deployment command: prefect deployment build ./main_python_files/w_wrapper_update_data.py:wrapper_data_update_function -n capacity-deployment -t capacity -t sql -t cockroachdb --storage-block gcs/gcs-socal
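(For context, a storage block like gcs/gcs-socal referenced by --storage-block would typically be registered ahead of time. A minimal sketch, assuming Prefect 2.x; the bucket path here is a placeholder, not the actual one used in this project:

from prefect.filesystems import GCS

# Register a GCS storage block named "gcs-socal" so deployments can
# reference it as --storage-block gcs/gcs-socal.
# The bucket path below is a placeholder assumption.
gcs_block = GCS(bucket_path="my-bucket/deployments")
gcs_block.save("gcs-socal", overwrite=True)
)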
I've shared screenshots of the project structure (the project is on GitHub, but I'm not supposed to share it as it contains some corporate data). At a high level, the SQL folder within the project structure holds some intermediate files that are updated during execution of the flow. The problem, as I've outlined, is that when the flow executes from a deployment (local or remote) it uses the temporary directory's files, which are replicas of the original files. This is a problem because some of the intermediate data is historic, so if I run these deployments month on month they will not capture this historic data.
I think what I have to do to get around this issue (or maybe it's a feature of Prefect, since it makes deployments easier to run on other machines) is to separate the flow from the data. I'm going to try saving the locally referenced data to GCS and changing the references from local paths to GCS for loading and saving data.
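(A minimal sketch of that approach, assuming pandas with gcsfs installed so pandas can read and write gs:// paths directly; the bucket and file names are placeholders standing in for the project's actual intermediate files:

import pandas as pd

# Placeholder GCS path standing in for one of the intermediate files
# in the project's SQL folder; bucket and object names are assumptions.
HISTORIC_PATH = "gs://my-bucket/sql/historic_data.csv"

def load_historic_data() -> pd.DataFrame:
    # Read the historic intermediate file from GCS instead of a local
    # path, so every deployment run (local or remote) sees the same data.
    return pd.read_csv(HISTORIC_PATH)

def save_historic_data(df: pd.DataFrame) -> None:
    # Write updates back to GCS so next month's run captures the history.
    df.to_csv(HISTORIC_PATH, index=False)
)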
Khuyen Tran
08/02/2022, 3:36 PM

John Kang
08/02/2022, 4:22 PM

Khuyen Tran
08/02/2022, 5:38 PM