Prefect Community #prefect-getting-started

Hi everyone ! I discovered Prefect a few weeks ago...

Milou Van Cau

12/10/2022, 8:16 PM

Hi everyone! I discovered Prefect a few weeks ago and I’ve been struggling for a few days with saving my CSV files. I’m doing everything locally for the moment, and this line of code does not work when I launch my program:

DATA_FULL.to_csv("csv_files/DATA_FULL.csv")

But when I run it without Prefect it works fine. Does anyone have an idea of what I have to change?

✅ 1

Peyton Runyan

12/10/2022, 11:16 PM

Can you see if supplying an absolute path fixes it? And can you supply a full stack trace of the error? They make it way easier for us to troubleshoot
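For reference, a minimal sketch of one way to build the absolute path Peyton is asking about. The helper name `absolute_output_path` and the choice of base folder are assumptions made for illustration; they do not come from the thread.

```python
from pathlib import Path


def absolute_output_path(relative: str, base_dir: Path) -> Path:
    """Return an absolute path for `relative` under `base_dir`.

    Hypothetical helper: it also creates any missing parent folders so that
    a later write does not fail on a missing directory.
    """
    target = (base_dir / relative).resolve()
    target.parent.mkdir(parents=True, exist_ok=True)
    return target


# In a script, the base is often the folder that contains the file itself:
#     base_dir = Path(__file__).resolve().parent
#     DATA_FULL.to_csv(absolute_output_path("csv_files/DATA_FULL.csv", base_dir))
```

Anchoring to the script's folder instead of the current working directory means the output location no longer depends on where the process was started.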

Milou Van Cau

12/11/2022, 12:44 AM

I already tried with the absolute path and it didn’t change anything 😞 As for the error, I just don’t have one: the code simply passes over that line. When I try a wrong path it gives me an error, but with the correct path it does nothing, which is really weird.

Andrew Huang

12/11/2022, 2:20 AM

This works for me

from pathlib import Path
from prefect import flow
import pandas as pd

@flow
def test_flow():
    df = pd.DataFrame({"a": [1, 2, 3]})
    path = Path("csv_files/DATA_FULL.csv")
    path.parent.mkdir(exist_ok=True)
    df.to_csv(path)

test_flow()

this too:

from pathlib import Path
from prefect import flow
import pandas as pd

@flow
def test():
    df = pd.DataFrame({"a": [1, 2, 3]})
    df.to_csv("DATA_FULL.csv")
test()

Jeff Hale

12/11/2022, 3:28 AM

What’s the result of running

prefect version

from the command line, @Milou Van Cau?

Milou Van Cau

12/11/2022, 10:40 AM

unfortunately not, i tried both ways and the code still seems to be skipping the to_csv commands 😕

Milou Van Cau

12/11/2022, 1:00 PM

Version:             2.7.0
API version:         0.8.3
Python version:      3.10.6
Git commit:          c2833339
Built:               Thu, Dec 1, 2022 4:03 PM
OS/Arch:             darwin/x86_64
Profile:             default
Server type:         hosted

Jeff Hale

12/11/2022, 1:52 PM

Thank you, Milou. My best guess is that it’s a Windows path parsing issue. If you do

df.to_csv("DATA_FULL.csv")

- so not in a subfolder - does that work?

Milou Van Cau

12/11/2022, 1:53 PM

Thanks jeff, tried it also and it didn't work. I'm actually using a linux computer. Might that be the issue ?

Jeff Hale

12/11/2022, 2:09 PM

lol. Sorry, too early in the AM. Linux should be fine. This is strange. Can you share your flow code, including the function call?

Milou Van Cau

12/11/2022, 2:10 PM

Yep

Milou Van Cau

12/11/2022, 2:20 PM

from prefect import flow, get_run_logger
from main import update_funda
import numpy as np

@flow
def main_flow():
    results = update_funda()
    logger = get_run_logger()
    logger.info(f'THE RESULTS >> Fit time = {np.sum(results["fit_time"])}    Test score = {np.mean(results["test_score"])}')

if __name__ == "__main__":
    main_flow()

this function calls this one

def update_funda():
    df = start_scraping_funda()
    print(f"scraper {df.shape}")
    df = indexer_funda(df)
    print(f"indexer {df.shape}  ✅ Saved NL_DATA_FULL")
    df = cleaner_funda(df)
    print(f"cleaner {df.shape}  ✅ Saved NL_DATA_FULL_cleaned")
    df = encoder_funda(df)
    print(f"encoder {df.shape}")
    print("Running model")
    results = model_funda(df)
    print("run checker")
    df = checker_funda()
    df.to_csv("csv_files/funda_csv/DATA_FULL.csv")
    return results

the whole code works fine but at the end, the to_csv function is not executed and nothing is saved
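When a write seems to be silently skipped like this, it can help to check where the relative path actually resolves at run time. The sketch below is only a debugging aid; the helper name `describe_write_target` is hypothetical and not part of Prefect or pandas.

```python
import os
from pathlib import Path


def describe_write_target(path_str: str) -> dict:
    """Report where a (possibly relative) path would land right now."""
    resolved = Path(path_str).resolve()
    return {
        "cwd": os.getcwd(),                         # folder relative paths start from
        "resolved": str(resolved),                  # absolute location of the file
        "parent_exists": resolved.parent.is_dir(),  # False means the write would fail
    }


# Possible use inside update_funda(), just before the df.to_csv(...) call:
#     print(describe_write_target("csv_files/funda_csv/DATA_FULL.csv"))
```

If the reported `cwd` differs between a run from the editor and a run from the UI, the file may well be written, just not where it is expected.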

Milou Van Cau

12/11/2022, 2:24 PM

When the main_flow is run from VS Code, everything works fine. The problem only appears when I run it from the Orion interface.

Jeff Hale

12/11/2022, 2:48 PM

Thank you.

“When the main_flow is runned from VScode, everything is working fine.”

Are you running it from VScode like this:

python main_flow

? And by “from the orion interface” - do you mean running a deployment from the UI? I can’t reproduce with

python my_file.py.

My attempt:

from prefect import flow
import pandas as pd


def do_it():
    df = pd.DataFrame({"a": [1, 2, 3]})
    return df


def make_csv():
    df = do_it()
    df.to_csv("sub/DATA_FULL.csv")


@flow
def test():
    make_csv()


if __name__ == "__main__":
    test()

Milou Van Cau

12/11/2022, 3:23 PM

I just ran the code with the run python file button of VScode

Milou Van Cau

12/11/2022, 3:24 PM

I'll try in an hour with the command you gave and let you know

Milou Van Cau

12/11/2022, 4:51 PM

I just executed the code with python -c 'from main import update_funda; update_funda()' and it also works. The issue really only happens when I run it from the UI 😢

Milou Van Cau

12/11/2022, 5:12 PM

i just tried also on another computer, a macbook with the following setup:

from pathlib import Path
from prefect import flow
import pandas as pd

@flow
def test():
    df = pd.DataFrame({"a": [1, 2, 3]})
    df.to_csv("DATA_FULL.csv")


if __name__ == "__main__":
    test()

and the same result happened.

Jeff Hale

12/11/2022, 5:21 PM

Ok. So it works with Prefect from the cli, that's good. Running from the UI is something I always do after making a deployment - just running the deployment. I can investigate more tomorrow.

Milou Van Cau

12/11/2022, 5:23 PM

Perfect, thanks for the help ! let me know if you need anything else.

Jeff Hale

12/11/2022, 5:40 PM

I feel like initiating the flow from the UI outside a deployment and writing to a file system just might not work. I would just put it in a deployment.

Jeff Hale

12/11/2022, 5:40 PM

Or just run from the cli if you don't plan to make it a repeatable, scheduleable type thing.

Milou Van Cau

12/11/2022, 5:42 PM

That's the issue, i used prefect in order to automate the running every 3hours. I am already putting it into production

Jeff Hale

12/11/2022, 5:43 PM

Ok. Can you run the deployment? I think try the absolute path as part of it, too

Milou Van Cau

12/11/2022, 5:48 PM

I tried with the absolute path as well as the short path in the deployment and neither works. The issue really only happens when I use the UI

Jeff Hale

12/11/2022, 6:06 PM

If you run the deployment with

deployment run

from the cli it doesn't work, does it?

Jeff Hale

12/11/2022, 6:07 PM

What does your deployment build command or Python file look like?

Milou Van Cau

12/11/2022, 6:17 PM

###
### A complete description of a Prefect Deployment for flow 'main-flow'
###
name: Funda
description: null
version: a7811d689d2cc02c955ccce13dad9b22
# The work queue that will handle this deployment's runs
work_queue_name: default
tags:
- emile
parameters: {}
schedule: null
infra_overrides: {}
infrastructure:
  type: process
  env: {}
  labels: {}
  name: null
  command: null
  stream_output: true
  working_dir: null
  block_type_slug: process
  _block_type_slug: process

###
### DO NOT EDIT BELOW THIS LINE
###
flow_name: main-flow
manifest_path: null
storage: null
path: /home/jeroen/code/Jstach22/Seren-IT
entrypoint: flow.py:main_flow
parameter_openapi_schema:
  title: Parameters
  type: object
  properties: {}
  required: null
  definitions: null

Milou Van Cau

12/11/2022, 6:18 PM

this is what my build looks like (.yaml to be precise)

Milou Van Cau

12/11/2022, 6:23 PM

i executed the

prefect deployment run

the script runs but the csv is still not saved indeed

Jeff Hale

12/11/2022, 6:31 PM

Did you recreate the deployment and apply it when you changed to the absolute path?

Milou Van Cau

12/11/2022, 6:33 PM

yep, tried the two ways and none worked

Jeff Hale

12/12/2022, 3:35 PM

Hi Milou, The path looks good. Did you also apply the deployment?

Milou Van Cau

12/12/2022, 3:35 PM

yes, both

Ryan Peden

12/12/2022, 3:48 PM

Hi Milou! When you run a flow as part of a deployment, it runs in a temporary directory that gets deleted after the flow run. I know you mentioned you tried an absolute path, but just to confirm, what happens if you change your code to:

from pathlib import Path
from prefect import flow
import pandas as pd

@flow
def test():
    df = pd.DataFrame({"a": [1, 2, 3]})
    df.to_csv("/home/jeroen/code/Jstach22/Seren-IT/DATA_FULL.csv")

if __name__ == "__main__":
    test()
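To illustrate the behaviour Ryan describes, here is a small simulation using only the standard library. It is not how Prefect is implemented; `run_in_temp_cwd` and `demo` are hypothetical names, and the sketch only mimics a run whose working directory is a temporary folder that is deleted afterwards.

```python
import os
import tempfile
from pathlib import Path


def run_in_temp_cwd(write_fn) -> None:
    """Call write_fn() with the working directory set to a throwaway folder."""
    original = os.getcwd()
    with tempfile.TemporaryDirectory() as scratch:
        os.chdir(scratch)
        try:
            write_fn()
        finally:
            os.chdir(original)  # step out before the folder is deleted


def demo(keep_dir: Path, filename: str = "DATA_FULL.csv"):
    """Write one file via a relative path and one via an absolute path."""
    relative = Path(filename)
    absolute = keep_dir.resolve() / filename

    def write_both():
        relative.write_text("a\n1\n")  # lands in the scratch folder
        absolute.write_text("a\n1\n")  # lands in keep_dir

    run_in_temp_cwd(write_both)
    # Report which of the two files is still reachable after the run.
    return relative.exists(), absolute.exists()
```

Under these assumptions the relative write succeeds without any error, but the file disappears together with the scratch folder, while the file written to the absolute location remains. That matches the symptom in this thread: no exception, and no CSV where it was expected. The `working_dir: null` entry in the deployment YAML posted earlier appears consistent with this explanation.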

Milou Van Cau

12/12/2022, 5:00 PM

Hey ! it finally works !! i tried using the complete path put manually and not with pathlib and it worked on the Linux. still doesn’t work this way on the Macbook but i’ll continue trying. Thanks a lot for the help !

🙌 1
