Prefect Community #ask-community

One of the tasks in a flow I'm working on is to do...

Toluwani Oludare

04/12/2023, 6:41 PM

One of the tasks in a flow I'm working on is to download a CSV file with Selenium and use the file for further processing. It works well in the local environment, but when we deploy to Prefect Cloud and try to access the file, it does not exist. Selenium doesn't give an error, so I assumed the file was downloaded. It seems Prefect Cloud uses the /tmp/ directory as its working directory. Does this directory delete files that are written to it, or did the file never reach the directory? I would appreciate any help.

Ifeoluwa Daranijo

04/13/2023, 8:56 AM

I am with @Toluwani Oludare on this. @Jeff Hale, please assist when you get this. Our active deployment depends on this, and it keeps failing. Ideally, we would like to run the flow in the same local storage where the code lives and override the Prefect config to download it to /tmp/tmp*prefect before running, if that is possible. Thanks.

Ifeoluwa Daranijo

04/13/2023, 9:19 AM

To add more info:
• The deployment runs on an Ubuntu server.
• It is deployed using the Python Prefect deployment method.

Jeff Hale

04/13/2023, 12:07 PM

You can use an absolute path when you specify where you want your created files to be stored on your infrastructure.

Ifeoluwa Daranijo

04/13/2023, 12:47 PM

The problem with that is that I don't have control over it, as Selenium downloads to the "working directory" where the script is executed.

Jeff Hale

04/13/2023, 1:01 PM

The files are downloaded to the temporary directory and then that directory is removed when the flow completes. You probably could enable result persistence and save your files that way. Check out the docs concept section on results.

Ifeoluwa Daranijo

04/13/2023, 1:11 PM

Let me check that out, but I doubt that is the problem. I further debugged the flow by adding a delay of 10 minutes and downloading the file twice using Selenium, but it didn't show up in the /tmp/tmp*prefect folder. Lastly, the file does not live outside the flow; there is a command in the same flow to delete the file after loading it in pandas. I am guessing there is no getting away from /tmp/tmp*prefect as the location where the Prefect flow is run, correct?
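One way to check where a run really executes, and whether the CSV landed there, is to log the working directory from inside the task. This is a minimal sketch using only the standard library; `describe_working_dir` is a hypothetical helper name, not something from Prefect or Selenium.

```python
import tempfile
from pathlib import Path


def describe_working_dir(pattern="*.csv"):
    """Report the current working directory and any files in it matching `pattern`."""
    cwd = Path.cwd()
    return {
        # Directory the process is executing in (where relative paths resolve).
        "cwd": str(cwd),
        # Base temp directory Python would use, e.g. /tmp or the value of TMPDIR.
        "tmpdir": tempfile.gettempdir(),
        # Names of matching files directly inside the working directory.
        "matches": sorted(p.name for p in cwd.glob(pattern)),
    }


if __name__ == "__main__":
    # Inside a task, this output could be sent to the run logger instead of printed.
    print(describe_working_dir())
```

Calling this right after the Selenium download step shows whether the file is missing from the working directory or was written somewhere else.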

Ifeoluwa Daranijo

04/13/2023, 1:22 PM

Here is what the output log looks like. This is in a flow, and the output log is from a task. The task is the one downloading and loading the CSV file. The error says it couldn't read the file.

Jeff Hale

04/13/2023, 1:59 PM

It looks like you can specify the folder you want selenium to download to. https://www.browserstack.com/guide/download-file-using-selenium-python
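A rough sketch of what that could look like with Chrome, assuming Selenium 4 and a Chrome version that supports `--headless=new`. The helper names (`chrome_download_prefs`, `make_driver`, `wait_for_file`) are hypothetical, and the `.crdownload` check relies on Chrome's usual naming for partial downloads.

```python
import time
from pathlib import Path


def chrome_download_prefs(download_dir):
    """Build Chrome preferences that send downloads to an absolute directory."""
    target = Path(download_dir).expanduser().resolve()
    target.mkdir(parents=True, exist_ok=True)  # Chrome will not create it for us
    return {
        "download.default_directory": str(target),
        "download.prompt_for_download": False,
        "download.directory_upgrade": True,
    }


def make_driver(download_dir):
    """Start headless Chrome configured to download into `download_dir`."""
    # Imported lazily so the helpers here stay usable without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # assumption: no display on the server
    options.add_experimental_option("prefs", chrome_download_prefs(download_dir))
    return webdriver.Chrome(options=options)


def wait_for_file(path, timeout=60.0, poll=0.5):
    """Return True once `path` exists and Chrome's partial file is gone."""
    path = Path(path)
    partial = path.with_name(path.name + ".crdownload")
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if path.exists() and not partial.exists():
            return True
        time.sleep(poll)
    return False
```

Because Selenium does not raise an error when a download silently fails, polling for the finished file before handing the path to pandas makes the failure visible at the download step rather than at the read step.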

Jeff Hale

04/13/2023, 2:15 PM

Found what looks like a way to set the temp directory path: https://prefect-community.slack.com/archives/CL09KU1K7/p1677614773173219?thread_ts=1673458911.027309&cid=CL09KU1K7

Jeff Hale

04/13/2023, 2:38 PM

I’m not getting the TMPDIR environment variable to work and don’t see where that would override the location in the source code. Will enquire further. 🤔

Jeff Hale

04/13/2023, 2:42 PM

@Ifeoluwa Daranijo, got an option to set the working directory. ⭐️ Create a Process Block and set the Working Directory to whatever directory you like. Specify that Process block in your deployment.
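For reference, a sketch of how this might be done from Python, assuming Prefect 2.x (where `prefect.infrastructure.Process` and `Deployment.build_from_flow` exist). The block name, deployment name, and helper names are placeholders, and saving the block needs a configured Prefect API.

```python
from pathlib import Path


def prepare_working_dir(path):
    """Create the directory if needed and return it as an absolute path string."""
    target = Path(path).expanduser().resolve()
    target.mkdir(parents=True, exist_ok=True)
    return str(target)


def build_process_block(working_dir, block_name="selenium-downloads"):
    """Save a Process block whose runs execute in `working_dir` (Prefect 2.x)."""
    # Imported lazily; this module path is an assumption tied to Prefect 2.x.
    from prefect.infrastructure import Process

    process = Process(working_dir=prepare_working_dir(working_dir))
    process.save(block_name, overwrite=True)  # requires a reachable Prefect API
    return process


def deploy_with_process(flow, working_dir, name="csv-download"):
    """Build and apply a deployment that uses the Process block above."""
    from prefect.deployments import Deployment

    deployment = Deployment.build_from_flow(
        flow=flow,
        name=name,  # placeholder deployment name
        infrastructure=build_process_block(working_dir),
    )
    deployment.apply()
    return deployment
```

With a working directory set on the block, the flow run should no longer execute in a throwaway /tmp/tmp*prefect directory, so files written relative to the working directory remain after the run.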

Ifeoluwa Daranijo

04/13/2023, 5:19 PM

Hey Jeff, I had a break and just got back. Thanks for your suggestions, I will try them out now 😎

Ifeoluwa Daranijo

04/13/2023, 6:11 PM

Hey @Jeff Hale, the Process block works fine. @Toluwani Oludare was able to set it up using the Prefect Python Process config.

Toluwani Oludare

04/13/2023, 6:12 PM

Thanks @Jeff Hale for the assistance ❤️

Jeff Hale

04/13/2023, 6:20 PM

Great! Sorry it slipped my mind at first.