# prefect-community
t
One of the tasks in a flow I'm working on downloads a CSV file with Selenium and uses the file for further processing. It works well in the local environment, but when we deploy to Prefect Cloud and try to access the file, it does not exist. Selenium doesn't raise an error, so I assumed the file was downloaded. It seems Prefect Cloud uses the /tmp/ directory as its working directory. Does this directory delete files that are written there, or did the file never reach the directory? I would appreciate any help.
i
I am with @Toluwani Oludare on this. @Jeff Hale, when you get this, please assist. Our active deployment depends on this, and it keeps failing. Ideally, we would like to run the flow from the same local storage where the code lives and override the Prefect config that downloads it to /tmp/tmp*prefect before running, if that is possible. Thanks
To add more info:
• The deployment runs on an Ubuntu server
• It is deployed using the Prefect Python deployment method
j
You can use an absolute path when you specify where you want your created files to be stored on your infrastructure.
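As a minimal sketch of the absolute-path suggestion: the helper below builds a path that does not depend on the process's current working directory. The name `output_path`, the `flow_outputs` directory, and `report.csv` are hypothetical and only illustrate the idea.

```python
import os
import tempfile
from pathlib import Path


def output_path(base_dir, filename):
    """Return an absolute path for `filename` under `base_dir`, creating the directory."""
    base = Path(base_dir).expanduser().resolve()
    base.mkdir(parents=True, exist_ok=True)
    return base / filename


if __name__ == "__main__":
    # Hypothetical location; in a deployment this would be a directory you control.
    base_dir = os.path.join(tempfile.gettempdir(), "flow_outputs")
    target = output_path(base_dir, "report.csv")
    # The path stays the same no matter where the flow process is started from.
    print(target)
```

Because the path is resolved up front, a change of working directory by the runner should not affect where the file is written or read.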
i
The problem with that is, I don't have control over that as Selenium downloads to the "working directory" where the script is executed
j
The files are downloaded to the temporary directory and then that directory is removed when the flow completes. You probably could enable result persistence and save your files that way. Check out the docs concept section on results.
i
Let me check that out, but I doubt that is the problem. I debugged the flow further by adding a 10-minute delay and downloading the file twice with Selenium, but it never showed up in the /tmp/tmp*prefect folder. Also, the file does not need to outlive the flow; a command in the same flow deletes it after it is loaded into pandas. I am guessing there is no getting away from /tmp/tmp*prefect as the location where the Prefect flow runs, correct?
Here is what the output log looks like. This is in a flow, and the output log is from a task: the one that downloads and loads the CSV file. The error says it couldn't read the file.
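For this kind of debugging, a small polling helper can make it clearer whether the file ever appears at the expected location, instead of relying on a fixed delay. This is only a sketch using the standard library; `wait_for_download` and its parameters are hypothetical names, and the non-empty size check is an assumption about when a download counts as finished.

```python
import os
import time
from pathlib import Path


def wait_for_download(path, timeout=60.0, poll_interval=0.5):
    """Poll until `path` is a non-empty file, or until `timeout` seconds pass.

    Returns True if the file appeared in time, False otherwise.
    """
    target = Path(path)
    deadline = time.monotonic() + timeout
    while True:
        if target.is_file() and target.stat().st_size > 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


if __name__ == "__main__":
    # Logging the working directory shows where relative downloads would land.
    print("working directory:", os.getcwd())
    print("found:", wait_for_download("data.csv", timeout=5.0))
```

Logging `os.getcwd()` from inside the task alongside the result is a cheap way to confirm which directory the flow run is actually using.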
j
It looks like you can specify the folder you want Selenium to download to: https://www.browserstack.com/guide/download-file-using-selenium-python
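A rough sketch of setting the download folder, assuming the flow drives Chrome (the thread does not say which browser is used). The helper names and the directory passed in are hypothetical; `download.default_directory` is the Chrome preference that controls where downloads are saved.

```python
from pathlib import Path


def chrome_download_prefs(download_dir):
    """Build Chrome preferences that send downloads to an absolute directory."""
    directory = Path(download_dir).expanduser().resolve()
    directory.mkdir(parents=True, exist_ok=True)
    return {
        "download.default_directory": str(directory),
        "download.prompt_for_download": False,
    }


def make_driver(download_dir):
    """Create a Chrome driver that downloads into `download_dir`.

    Requires the selenium package and a local Chrome install (assumption).
    """
    # Imported lazily so the prefs helper above has no third-party dependency.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", chrome_download_prefs(download_dir))
    return webdriver.Chrome(options=options)
```

With the directory fixed this way, the task can read the CSV from a known absolute path regardless of where the flow run is executed.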
I’m not getting the TMPDIR environment variable to work and don’t see where that would override the location in the source code. Will enquire further. 🤔
@Ifeoluwa Daranijo, I found an option to set the working directory. Create a Process block and set its Working Directory to whatever directory you like, then specify that Process block in your deployment.
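The same setup can probably be done from Python rather than the UI. The sketch below assumes Prefect 2.x, where `prefect.infrastructure.Process` exposes a `working_dir` field; the block name `selenium-process` and the helper names are hypothetical.

```python
from pathlib import Path


def resolve_working_dir(path):
    """Return `path` as an absolute directory string, creating it if needed."""
    directory = Path(path).expanduser().resolve()
    directory.mkdir(parents=True, exist_ok=True)
    return str(directory)


def save_process_block(working_dir, block_name="selenium-process"):
    """Create and save a Process block with a fixed working directory.

    Assumes Prefect 2.x is installed and configured for your workspace.
    """
    # Imported lazily so resolve_working_dir can be used without Prefect.
    from prefect.infrastructure import Process

    block = Process(working_dir=resolve_working_dir(working_dir))
    block.save(block_name, overwrite=True)
    return block
```

The saved block would then be passed as the deployment's infrastructure, for example through the `infrastructure` argument of `Deployment.build_from_flow` in Prefect 2.x.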
i
Hey Jeff, I took a break and just got back. Thanks for your suggestions, I will try them out now 😎
Hey @Jeff Hale, the Process block works fine. @Toluwani Oludare was able to set it up using the Prefect Python Process config.
t
Thanks @Jeff Hale for the assistance ❤️
j
Great! Sorry it slipped my mind at first.