juandavidlozano
05/10/2023, 11:12 PM
Regarding upload_from_path: in my code you will see that I am passing the same variable, path, as both the from_path and the to_path, but for some reason Prefect changes the structure of the to_path value. Here is the code I have that builds the path:
from pathlib import Path

import pandas as pd
from prefect import task
from prefect_gcp.cloud_storage import GcsBucket


@task()
def write_local(df: pd.DataFrame, color: str, dataset_file: str) -> Path:
    """Write DataFrame out locally as parquet file"""
    Path(f"data/{color}").mkdir(parents=True, exist_ok=True)
    path = Path(f"data/{color}/{dataset_file}.parquet")
    df.to_parquet(path, compression="gzip")
    return path


@task
def write_gcs(path: Path) -> None:
    """Upload local parquet file to GCS"""
    gcs_block = GcsBucket.load("zoom-gcs")
    # The same Path object is used for the local source and the GCS destination.
    gcs_block.upload_from_path(from_path=path, to_path=path)
    return
As you can see in the second task, write_gcs, both paths are the same variable, path, which is a path object that originally has this value: 'data/yellow/yellow_tripdata_2021-01.parquet'.
The Prefect flow runs, but afterwards the flow run details (first attached picture) show that the path used for GCS was changed to 'data\\yellow\\yellow_tripdata_2021-01.parquet'. I have no idea why this is happening. Because of it, as the second picture shows, the file is saved under that odd name instead of the folders being created in GCS. Any help on why this might be happening?
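A minimal sketch of what may be going on, assuming the flow runs on Windows (the post does not say which OS is used). On Windows, converting a pathlib path to a string uses backslashes, and GCS treats only forward slashes as folder separators, so the whole string would end up as a single object name. PureWindowsPath is used here only to reproduce the Windows behavior on any machine; the variable names are illustrative.

```python
from pathlib import PureWindowsPath

# PureWindowsPath mimics how Path behaves on Windows, regardless of the host OS.
local_path = PureWindowsPath("data/yellow/yellow_tripdata_2021-01.parquet")

# str() uses the Windows separator, a backslash. The doubled backslashes seen
# in the flow details are most likely just the escaped display of this string.
as_native = str(local_path)

# as_posix() always uses forward slashes, which GCS shows as folders.
as_object_name = local_path.as_posix()

print(as_native)
print(as_object_name)
```

If this is the cause, passing to_path=path.as_posix() in write_gcs is one thing to try. Depending on the prefect-gcp version, the block may also rebuild the destination path internally, so this change alone might not be enough.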