John Kang
08/01/2022, 9:40 PM

Anna Geller
08/02/2022, 1:30 AM

John Kang
08/02/2022, 11:27 AM

Anna Geller
08/02/2022, 11:35 AM

John Kang
08/02/2022, 1:32 PM
Local deployment command: prefect deployment build ./main_python_files/w_wrapper_update_data.py:capacity-flow -n capacity-deployment -t test
GCS remote deployment command: prefect deployment build ./main_python_files/w_wrapper_update_data.py:wrapper_data_update_function -n capacity-deployment -t capacity -t sql -t cockroachdb --storage-block gcs/gcs-socal
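(For context, a storage block like gcs/gcs-socal referenced by --storage-block would typically be registered ahead of time. A minimal sketch, assuming Prefect 2.x; the bucket path here is a placeholder, not the actual one used in this project:

from prefect.filesystems import GCS

# Register a GCS storage block named "gcs-socal" so deployments can
# reference it as --storage-block gcs/gcs-socal.
# The bucket path below is a placeholder assumption.
gcs_block = GCS(bucket_path="my-bucket/deployments")
gcs_block.save("gcs-socal", overwrite=True)
)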
I've shared screenshots of the project structure (the project is on GitHub, but I'm not supposed to share it as it contains some corporate data). At a high level, the SQL folder within the project structure holds some intermediate files that are updated during execution of the flow. The problem, as I've outlined, is that when the flow executes from a deployment (local or remote) it uses the temporary directory's files, which are replicas of the original files. This is a problem because some of the intermediate data is historic, so if I run these deployments month on month they will not capture this historic data.
I think what I have to do to get around this issue (or maybe it's a feature of Prefect, since it makes deployments easier to run on other machines) is to separate the flow from the data. I'm going to try saving the locally referenced data to GCS and changing the references from local paths to GCS for loading and saving data.
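(A minimal sketch of that approach, assuming pandas with gcsfs installed so pandas can read and write gs:// paths directly; the bucket and file names are placeholders standing in for the project's actual intermediate files:

import pandas as pd

# Placeholder GCS path standing in for one of the intermediate files
# in the project's SQL folder; bucket and object names are assumptions.
HISTORIC_PATH = "gs://my-bucket/sql/historic_data.csv"

def load_historic_data() -> pd.DataFrame:
    # Read the historic intermediate file from GCS instead of a local
    # path, so every deployment run (local or remote) sees the same data.
    return pd.read_csv(HISTORIC_PATH)

def save_historic_data(df: pd.DataFrame) -> None:
    # Write updates back to GCS so next month's run captures the history.
    df.to_csv(HISTORIC_PATH, index=False)
)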
Khuyen Tran
08/02/2022, 3:36 PM

John Kang
08/02/2022, 4:22 PM

Khuyen Tran
08/02/2022, 5:38 PM