j
Hi, I have an issue when writing to Google Cloud Storage. When I use `upload_from_path` in my code, you will see that I am passing the same variable, `path`, as both the `from_path` and the `to_path`, but for some reason Prefect changes the structure of the `to_path` variable. Here is the code that builds the path:
```python
from pathlib import Path

import pandas as pd
from prefect import task
from prefect_gcp.cloud_storage import GcsBucket


@task()
def write_local(df: pd.DataFrame, color: str, dataset_file: str) -> Path:
    """Write DataFrame out locally as parquet file"""
    Path(f"data/{color}").mkdir(parents=True, exist_ok=True)
    path = Path(f"data/{color}/{dataset_file}.parquet")
    df.to_parquet(path, compression="gzip")
    return path


@task
def write_gcs(path: Path) -> None:
    """Upload local parquet file to GCS"""
    gcs_block = GcsBucket.load("zoom-gcs")
    # The same Path object is used as the local source and the GCS destination
    gcs_block.upload_from_path(from_path=path, to_path=path)
    return
```
You can see that in the second task, `write_gcs`, both paths are the same variable, `path`, which originally has this value: `'data/yellow/yellow_tripdata_2021-01.parquet'`. The Prefect flow runs, but afterwards the flow run details (the second picture I'm attaching) show that the path for GCS was changed to `'data\\yellow\\yellow_tripdata_2021-01.parquet'`. I have no idea why this is happening, and because of it (see picture 1) the file is saved under that weird name instead of being placed in folders in GCS. Any help on why this might be happening?
m
I think this is what is happening:

- In your second screenshot you can see that `from_path` is a `WindowsPath`. You're probably running this flow on Windows? On Windows, instantiating a `Path` automatically gives you a `WindowsPath`.
- A `WindowsPath` can be constructed with forward slashes, for example `Path(f"data/{color}/{dataset_file}.parquet")`.
- When a `WindowsPath` is converted to a string, it uses backslashes.
- GCS uses forward slashes to separate the path into folders, so backslashes are treated as part of the object name.

Here's how you can test whether that's what's happening:
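```python
from pathlib import Path, PurePosixPath

from prefect import task
from prefect_gcp.cloud_storage import GcsBucket


@task
def write_gcs(path: Path) -> None:
    """Upload local parquet file to GCS"""
    gcs_block = GcsBucket.load("zoom-gcs")
    # from_path stays a native Path (the local file); to_path is rendered with
    # forward slashes so GCS treats each segment as a folder
    gcs_block.upload_from_path(from_path=path, to_path=PurePosixPath(path))
    return
```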
It converts the `to_path` to forward slashes, which is what GCS uses.
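To see the separator difference without a Windows machine, here is a minimal sketch using `pathlib`'s pure path classes, which behave the same on any OS. `PureWindowsPath` stands in for the `WindowsPath` that `Path(...)` produces on Windows, and `to_gcs_key` is a hypothetical helper name, not part of prefect-gcp. It also shows `as_posix()`, which should be an equivalent alternative to wrapping the path in `PurePosixPath`.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath


def to_gcs_key(path: PurePath) -> str:
    """Hypothetical helper: render a local path as a GCS object name.

    GCS treats "/" as the folder delimiter, so always emit forward slashes,
    regardless of the OS the flow runs on.
    """
    return path.as_posix()


# PureWindowsPath mimics what Path(...) returns on Windows: it accepts
# forward slashes on input but renders backslashes when converted to str.
win_path = PureWindowsPath("data/yellow/yellow_tripdata_2021-01.parquet")

print(str(win_path))                 # data\yellow\yellow_tripdata_2021-01.parquet
print(str(PurePosixPath(win_path)))  # data/yellow/yellow_tripdata_2021-01.parquet
print(to_gcs_key(win_path))          # data/yellow/yellow_tripdata_2021-01.parquet
```

In the `write_gcs` task that would mean passing `to_path=path.as_posix()` instead of `PurePosixPath(path)`; `from_path` can stay a native `Path` because it refers to the local file.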
Could you check that you're on the latest version of prefect-gcp? In 0.2.4 some Windows GCS path incompatibilities were fixed. If that doesn't resolve your problem, you might want to file an issue in prefect-gcp.
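To check which prefect-gcp version the flow's environment actually has, here is a small sketch using the standard library's `importlib.metadata` (Python 3.8+). `installed_version` is a hypothetical helper name, not part of Prefect.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "prefect-gcp" is the distribution name used by pip
    print("prefect-gcp:", installed_version("prefect-gcp"))
```

If it reports something older than 0.2.4, `pip install -U prefect-gcp` upgrades it; `pip show prefect-gcp` gives the same version information from the shell.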
j
@Mathijs Miermans THANKS! The piece of code you posted fixed the issue. Again, many thanks!