
Milou Van Cau

12/10/2022, 8:16 PM
Hi everyone! I discovered Prefect a few weeks ago and I've been struggling for a few days with saving my CSV files. I'm doing everything locally for the moment, and this line of code does not work when I launch my program:
DATA_FULL.to_csv("csv_files/DATA_FULL.csv")
But when I run it without Prefect it works fine. Does anyone have an idea of what I have to change?

Peyton Runyan

12/10/2022, 11:16 PM
Can you see if supplying an absolute path fixes it? And can you supply a full stack trace of the error? They make it way easier for us to troubleshoot

Milou Van Cau

12/11/2022, 12:44 AM
I already tried the absolute path and it didn't change anything 😞 As for the error, I just don't have one: the code simply passes over that line. When I try a wrong path it gives me an error, but with the correct path it does nothing, which is really weird.
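
When a write seems to be skipped without any error, one way to narrow it down is to record where the path actually resolves for the running process and whether the file exists afterwards. A minimal sketch using only the standard library; `describe_target` is a hypothetical helper name for this example, not part of Prefect or pandas.

```python
import os
from pathlib import Path


def describe_target(path_str: str) -> dict:
    """Report where a (possibly relative) path points for the current process."""
    path = Path(path_str)
    resolved = path.resolve()  # relative paths resolve against os.getcwd()
    return {
        "cwd": os.getcwd(),
        "is_absolute": path.is_absolute(),
        "resolved": str(resolved),
        "parent_exists": resolved.parent.exists(),
        "exists": resolved.exists(),
    }


if __name__ == "__main__":
    # Call this right before and right after DATA_FULL.to_csv(...) in the flow
    # and compare the "cwd" and "exists" values between the two calls.
    print(describe_target("csv_files/DATA_FULL.csv"))
```

If the reported `cwd` differs between a local run and a run started from the UI, the file is probably being written somewhere other than the project folder.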

Andrew Huang

12/11/2022, 2:20 AM
This works for me
from pathlib import Path
from prefect import flow
import pandas as pd

@flow
def test_flow():
    df = pd.DataFrame({"a": [1, 2, 3]})
    path = Path("csv_files/DATA_FULL.csv")
    path.parent.mkdir(exist_ok=True)
    df.to_csv(path)

test_flow()
this too:
from pathlib import Path
from prefect import flow
import pandas as pd

@flow
def test():
    df = pd.DataFrame({"a": [1, 2, 3]})
    df.to_csv("DATA_FULL.csv")
test()

Jeff Hale

12/11/2022, 3:28 AM
What’s the result of running
prefect version
from the command line, @Milou Van Cau?

Milou Van Cau

12/11/2022, 10:40 AM
unfortunately not, i tried both ways and the code still seems to be skipping the to_csv commands 😕
Version:        2.7.0
API version:    0.8.3
Python version: 3.10.6
Git commit:     c2833339
Built:          Thu, Dec 1, 2022 4:03 PM
OS/Arch:        darwin/x86_64
Profile:        default
Server type:    hosted

Jeff Hale

12/11/2022, 1:52 PM
Thank you, Milou. My best guess is that it’s a Windows path parsing issue. If you do
df.to_csv("DATA_FULL.csv")
- so not in a subfolder - does that work?

Milou Van Cau

12/11/2022, 1:53 PM
Thanks Jeff, I tried that too and it didn't work. I'm actually using a Linux computer. Might that be the issue?

Jeff Hale

12/11/2022, 2:09 PM
lol. Sorry, too early in the AM. Linux should be fine. This is strange. Can you share your flow code, including the function call?

Milou Van Cau

12/11/2022, 2:10 PM
Yep
from prefect import flow, get_run_logger
from main import update_funda
import numpy as np

@flow
def main_flow():
    results = update_funda()
    logger = get_run_logger()
    logger.info(f'THE RESULTS >> Fit time = {np.sum(results["fit_time"])}    Test score = {np.mean(results["test_score"])}')

if __name__ == "__main__":
    main_flow()
this function calls this one
def update_funda():
    df = start_scraping_funda()
    print(f"scraper {df.shape}")
    df = indexer_funda(df)
    print(f"indexer {df.shape}  ✅ Saved NL_DATA_FULL")
    df = cleaner_funda(df)
    print(f"cleaner {df.shape}  ✅ Saved NL_DATA_FULL_cleaned")
    df = encoder_funda(df)
    print(f"encoder {df.shape}")
    print("Running model")
    results = model_funda(df)
    print("run checker")
    df = checker_funda()
    df.to_csv("csv_files/funda_csv/DATA_FULL.csv")
    return results
The whole code works fine, but at the end the to_csv function is not executed and nothing is saved.
When main_flow is run from VS Code, everything works fine. The problem only appears when I run it from the Orion interface.

Jeff Hale

12/11/2022, 2:48 PM
Thank you.
When main_flow is run from VS Code, everything works fine.
Are you running it from VScode like this:
python main_flow
? And by “from the orion interface” - do you mean running a deployment from the UI? I can’t reproduce with
python my_file.py.
My attempt:
from prefect import flow
import pandas as pd


def do_it():
    df = pd.DataFrame({"a": [1, 2, 3]})
    return df


def make_csv():
    df = do_it()
    df.to_csv("sub/DATA_FULL.csv")


@flow
def test():
    make_csv()


if __name__ == "__main__":
    test()

Milou Van Cau

12/11/2022, 3:23 PM
I just ran the code with the run python file button of VScode
I'll try in an hour with the command you gave and let you know.
I just executed the code with python -c 'from main import update_funda; update_funda()' and it works too. The issue really only happens when I run it from the UI 😢
I also just tried on another computer, a MacBook, with the following setup:
from pathlib import Path
from prefect import flow
import pandas as pd

@flow
def test():
    df = pd.DataFrame({"a": [1, 2, 3]})
    df.to_csv("DATA_FULL.csv")


if __name__ == "__main__":
    test()
and the same result happened.

Jeff Hale

12/11/2022, 5:21 PM
Ok. So it works with Prefect from the cli, that's good. Running from the UI is something I always do after making a deployment - just running the deployment. I can investigate more tomorrow.

Milou Van Cau

12/11/2022, 5:23 PM
Perfect, thanks for the help! Let me know if you need anything else.

Jeff Hale

12/11/2022, 5:40 PM
I feel like initiating the flow from the UI outside a deployment and writing to a file system just might not work. I would just put it in a deployment.
Or just run from the CLI if you don't plan to make it a repeatable, schedulable type of thing.

Milou Van Cau

12/11/2022, 5:42 PM
That's the issue: I'm using Prefect to automate a run every 3 hours, and I'm already putting it into production.

Jeff Hale

12/11/2022, 5:43 PM
Ok. Can you run the deployment? I think try the absolute path as part of it, too

Milou Van Cau

12/11/2022, 5:48 PM
I tried the absolute path as well as the short path in the deployment and neither works. The issue really only happens when I use the UI.

Jeff Hale

12/11/2022, 6:06 PM
If you run the deployment with
deployment run
from the cli it doesn't work, does it?
What does your deployment build command or Python file look like?

Milou Van Cau

12/11/2022, 6:17 PM
###
### A complete description of a Prefect Deployment for flow 'main-flow'
###
name: Funda
description: null
version: a7811d689d2cc02c955ccce13dad9b22
# The work queue that will handle this deployment's runs
work_queue_name: default
tags:
- emile
parameters: {}
schedule: null
infra_overrides: {}
infrastructure:
  type: process
  env: {}
  labels: {}
  name: null
  command: null
  stream_output: true
  working_dir: null
  block_type_slug: process
  _block_type_slug: process

###
### DO NOT EDIT BELOW THIS LINE
###
flow_name: main-flow
manifest_path: null
storage: null
path: /home/jeroen/code/Jstach22/Seren-IT
entrypoint: flow.py:main_flow
parameter_openapi_schema:
  title: Parameters
  type: object
  properties: {}
  required: null
  definitions: null
This is what my build looks like (the .yaml file, to be precise).
I executed
prefect deployment run
and the script runs, but the CSV is indeed still not saved.

Jeff Hale

12/11/2022, 6:31 PM
Did you recreate the deployment and apply it when you changed to the absolute path?

Milou Van Cau

12/11/2022, 6:33 PM
Yep, I tried both ways and neither worked.

Jeff Hale

12/12/2022, 3:35 PM
Hi Milou, The path looks good. Did you also apply the deployment?

Milou Van Cau

12/12/2022, 3:35 PM
yes, both

Ryan Peden

12/12/2022, 3:48 PM
Hi Milou! When you run a flow as part of a deployment, it runs in a temporary directory that gets deleted after the flow run. I know you mentioned you tried an absolute path, but just to confirm, what happens if you change your code to:
from pathlib import Path
from prefect import flow
import pandas as pd

@flow
def test():
    df = pd.DataFrame({"a": [1, 2, 3]})
    df.to_csv("/home/jeroen/code/Jstach22/Seren-IT/DATA_FULL.csv")

if __name__ == "__main__":
    test()
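
The effect of a temporary working directory can be reproduced without Prefect. The sketch below is only an illustration: `tempfile.TemporaryDirectory` and `os.chdir` stand in for the temporary directory of a deployment run (an assumption about the mechanism, not Prefect's actual code), and `write_report` is a hypothetical stand-in for `df.to_csv`.

```python
import os
import tempfile
from pathlib import Path


def write_report(target) -> Path:
    """Stand-in for df.to_csv(target): write a tiny CSV, return its resolved path."""
    path = Path(target)
    path.write_text("a\n1\n2\n3\n")
    return path.resolve()


def run_in_temp_dir(target) -> Path:
    """Call write_report from inside a throwaway working directory."""
    original_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as tmp:
        os.chdir(tmp)  # simulate the run's temporary working directory
        try:
            return write_report(target)
        finally:
            os.chdir(original_cwd)  # step out before the directory is removed


if __name__ == "__main__":
    # Relative target: written inside the temp dir, gone once the run ends.
    relative = run_in_temp_dir("DATA_FULL.csv")
    print(relative, relative.exists())

    # Absolute target outside the temp dir: still there after the run.
    with tempfile.TemporaryDirectory() as keep:
        absolute = run_in_temp_dir(Path(keep) / "DATA_FULL.csv")
        print(absolute, absolute.exists())
```

With the relative target, the write succeeds and raises no error, yet nothing is left afterwards, which matches the "code passes over the line" symptom described earlier in the thread.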

Milou Van Cau

12/12/2022, 5:00 PM
Hey! It finally works!! I wrote out the complete path by hand instead of building it with pathlib, and it worked on the Linux machine. It still doesn't work this way on the MacBook, but I'll keep trying. Thanks a lot for the help!
🙌 1
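
To avoid repeating a hand-written absolute path in every `to_csv` call, the base directory could be set once and each output path built from it. A minimal sketch under stated assumptions: `output_path` is a hypothetical helper, and `FUNDA_OUTPUT_DIR` is an environment variable name made up for this example.

```python
import os
import tempfile
from pathlib import Path


def output_path(relative, base_dir=None) -> Path:
    """Build an absolute output path under a fixed base directory.

    base_dir falls back to the FUNDA_OUTPUT_DIR environment variable
    (a hypothetical name chosen for this example).
    """
    base = base_dir or os.environ.get("FUNDA_OUTPUT_DIR")
    if not base:
        raise ValueError("set base_dir or FUNDA_OUTPUT_DIR to an absolute directory")
    base = Path(base).expanduser()
    if not base.is_absolute():
        raise ValueError(f"base directory must be absolute, got {base}")
    target = base / relative
    target.parent.mkdir(parents=True, exist_ok=True)  # create missing subfolders
    return target


if __name__ == "__main__":
    # Demo with a throwaway base directory; in the flow, base_dir would be
    # the project folder, e.g. "/home/jeroen/code/Jstach22/Seren-IT".
    with tempfile.TemporaryDirectory() as base:
        target = output_path("csv_files/funda_csv/DATA_FULL.csv", base)
        print(target)  # pass this to df.to_csv(target)
```

Rejecting a relative base directory is a deliberate choice here: it turns the silent "saved somewhere else" case into an immediate error.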