Jack
01/22/2023, 8:55 PM
Anthony Harris
01/31/2023, 7:10 PM
targetRef is a:
Reference to the controller that manages the set of Pods for the autoscaler to control, for example, a Deployment or a StatefulSet. You can point a VerticalPodAutoscaler at any controller that has a Scale subresource. Typically, the VerticalPodAutoscaler retrieves the Pod set from the controller's ScaleStatus.
Any guidance is much appreciated!
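For reference, a minimal VerticalPodAutoscaler manifest with such a targetRef pointing at a Deployment (all names are placeholders):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment
  updatePolicy:
    updateMode: "Auto"
```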
Hristo Stefanov
03/09/2023, 9:23 PM
Miguel Moncada
03/29/2023, 4:16 PM
2.8.7
but I'm getting this error on the last step, which is to register the blocks:
executor failed running [/bin/sh -c prefect block register -m prefect_gcp]: exit code: 1
ERROR: Service 'agent' failed to build : Build failed
Have you seen this error before?
Miguel Moncada
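One likely cause, hedged since the log doesn't show the failing command's output: prefect block register -m prefect_gcp needs both the prefect-gcp package installed and a reachable Prefect API at build time (registration writes block schemas to the API). A Dockerfile sketch with placeholder build args:

```dockerfile
FROM prefecthq/prefect:2-python3.10
RUN pip install prefect-gcp

# Block registration talks to the Prefect API, so the build stage
# needs credentials, e.g. passed as --build-arg (placeholders here)
ARG PREFECT_API_URL
ARG PREFECT_API_KEY
ENV PREFECT_API_URL=$PREFECT_API_URL \
    PREFECT_API_KEY=$PREFECT_API_KEY

RUN prefect block register -m prefect_gcp
```

Running the registration once at deploy time, rather than during every image build, also sidesteps the problem.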
03/31/2023, 1:29 PM
GcsBucket
to delete a blob, or do I need to directly make use of Google's client? 🤔
I couldn't find any after checking here.
Thanks a lot!
Miguel Moncada
04/05/2023, 5:48 PM
Submission failed. AttributeError: 'Resource' object has no attribute 'jobs'
I'm using prefect_gcp
version 0.3.0
in my agent, and targeting Prefect Cloud. Has anyone seen similar behavior?
Eric Ma
04/07/2023, 5:57 AM
serviceAccountName
in the YAML creation, it is defaulting to a generic read-only Compute Engine service account.
Do you have any solution on how I can set the default serviceAccountName to use the same service account that is provided in GCP Credential Block?
Thank you in advance for any help here
https://cloud.google.com/run/docs/securing/service-identity#gcloud
By default, Cloud Run revisions and jobs execute as the Compute Engine default service account. The Compute Engine default service account has the Project Editor IAM role which grants read and write permissions on all resources in your Google Cloud project.
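Cloud Run's service identity can be set explicitly on the job so it stops falling back to the Compute Engine default. As a sketch at the gcloud level, following the linked doc (job and service-account names are placeholders):

```shell
gcloud run jobs update my-job \
  --service-account my-sa@my-project.iam.gserviceaccount.com
```

Whether the prefect-gcp Cloud Run block picks this up depends on how it creates jobs, so treat this as the GCP-side half of the answer rather than a confirmed Prefect setting.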
Lucas Zago
05/02/2023, 6:49 PM
Matt Delacour
05/05/2023, 1:54 PM
Matt Delacour
05/05/2023, 2:00 PM
Matt Delacour
05/10/2023, 1:21 PM
juandavidlozano
05/10/2023, 11:12 PM
upload_from_path
In my code you will see that I am passing the same variable, path, as both the from_path and the to_path, but for some reason Prefect changes the structure of the to_path variable. Here is the code I have that builds the path:
@task()
def write_local(df: pd.DataFrame, color: str, dataset_file: str) -> Path:
    """Write DataFrame out locally as parquet file"""
    Path(f"data/{color}").mkdir(parents=True, exist_ok=True)
    path = Path(f"data/{color}/{dataset_file}.parquet")
    df.to_parquet(path, compression="gzip")
    return path

@task
def write_gcs(path: Path) -> None:
    """Upload local parquet file to GCS"""
    gcs_block = GcsBucket.load("zoom-gcs")
    gcs_block.upload_from_path(from_path=path, to_path=path)
    return
You can see that in the second task, write_gcs, both of the paths are the same variable called path, which originally has the value 'data/yellow/yellow_tripdata_2021-01.parquet'.
The Prefect flow runs, but afterwards, in the flow details (first picture attached), you can see it changed the path for GCS to 'data\\yellow\\yellow_tripdata_2021-01.parquet'. I have no idea why this is happening, and because of it (picture 2) it saves the file under that literal name instead of creating the folders in GCS. Any help on why this might be happening?
Nelson Griffiths
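A likely explanation, not specific to Prefect: pathlib.Path renders with the host OS separator, so on Windows str(path) contains backslashes, and GCS stores those verbatim in the object name instead of treating them as folder separators. A minimal sketch (using PureWindowsPath so the behavior reproduces on any OS); passing path.as_posix() as to_path would keep forward slashes:

```python
from pathlib import PureWindowsPath

# Same value the flow builds, but as a Windows-flavored path
path = PureWindowsPath("data/yellow/yellow_tripdata_2021-01.parquet")

# str() uses the OS separator -- exactly the object name that shows up in GCS
print(str(path))        # data\yellow\yellow_tripdata_2021-01.parquet

# as_posix() always uses "/", which GCS displays as folders
print(path.as_posix())  # data/yellow/yellow_tripdata_2021-01.parquet
```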
05/11/2023, 3:00 PM
prefect-gcp
credentials blocks.
When I try to pass my credentials in via a block I get AttributeError: 'str' object has no attribute 'keys'
I'll share the full traceback in the thread, but it seems like it is having a hard time converting my credentials block into actual credentials.
Francisco
05/22/2023, 2:41 PM
prefect-gcp
cloud run job, and stochastically receive the following error in some flow runs:
KeyError: "No class found for dispatch key 'cloud-run-job' in registry for type 'Block'."
With a retry, the error does not appear; after a random number of executions it appears again, and a retry recovers that execution.
Has anyone experienced something similar or know what it could be?
Aaron Gonzalez
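The dispatch-key message comes from Prefect's block registry: the process deserializing the block couldn't find a class registered under cloud-run-job, which usually means prefect_gcp wasn't imported in that process. The intermittent pattern suggests an environment or startup race rather than a plain missing install, so treat this as an inference, not a confirmed diagnosis. The usual checks in the agent environment:

```shell
pip install prefect-gcp
prefect block register -m prefect_gcp
```

and making sure that environment imports prefect_gcp (for example via its requirements) before flow runs are picked up.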
06/05/2023, 4:38 PM
PREFECT_API_URL
env var to the external IP address of my vm instance (not the nicest solution but I can clean this up later) and then I can immediately push my blocks and deployments to the server! 👍
The problem comes when I launch a run of one of my test deployments that uses a Cloud Run Job: it never gets out of the "Pending" status in the Prefect UI. I'm pretty sure it is something with my networking and probably not an actual issue with Prefect, but I can't seem to make any progress. I've tried setting up Serverless VPC Connections to allow the Cloud Run Job to connect to the VM that has the server running, but maybe I've set it up incorrectly 🤷.
Has anyone else managed to get a Cloud Run Job deployment working against a Prefect server hosted on a VM instance?
Heliya Hasani
06/08/2023, 1:23 PM
Jack
07/28/2023, 4:52 AM
Eric Ma
09/15/2023, 2:48 PM
Sam McAlilly
10/03/2023, 3:53 PM
Sam McAlilly
10/03/2023, 4:03 PM
Eric Ma
10/09/2023, 10:35 PM
prefect deployment build test.py:process_batch -n worker -a -q docker -i cloud-run-job/default
Sam McAlilly
10/18/2023, 4:05 PM
Valentin Cathelain
10/31/2023, 12:27 AM
bigquery_load_cloud_storage
: Provided Schema does not match.
I tried to load a CSV with the first row as a header. I thought there was an autodetect feature enabled by default, but it doesn't seem to be working. So I tried to specify the schema, but it appears I'm not using the correct syntax.
in api_to_bq SchemaField("stationcode", field_type="STRING"),
NameError: name 'SchemaField' is not defined
Could anyone who has already used that function share their experience?
#### My flow looks like this:
@flow(retries=3, retry_delay_seconds=5, log_prints=True)
def api_to_bq():
    extracted_data = extract_data()
    file_created_path = store_data(extracted_data)
    print(file_created_path)
    gcp_cred = GcpCredentials.load('bq-credentials')
    schema = [
        SchemaField("stationcode", field_type="STRING"),
        SchemaField("name", field_type="STRING"),
        SchemaField("is_installed", field_type="BOOLEAN"),
        SchemaField("capacity", field_type="INTEGER"),
        SchemaField("numdocksavailable", field_type="INTEGER"),
        SchemaField("numbikesavailable", field_type="INTEGER"),
        SchemaField("mechanical", field_type="INTEGER"),
        SchemaField("ebike", field_type="INTEGER"),
        SchemaField("is_renting", field_type="BOOLEAN"),
        SchemaField("is_returning", field_type="BOOLEAN"),
        SchemaField("duedate", field_type="TIMESTAMP"),
        SchemaField("lon", field_type="FLOAT64"),
        SchemaField("lat", field_type="FLOAT64"),
        SchemaField("nom_arrondissement_communes", field_type="STRING"),
        SchemaField("code_insee_commune", field_type="STRING"),
        SchemaField("created_at", field_type="TIMESTAMP")
    ]
    result = bigquery_load_cloud_storage(
        dataset=destination_dataset,
        table=destination_table,
        uri=f'gs://staging/{file_created_path}',
        schema=schema,
        gcp_credentials=gcp_cred
    )
    return result

if __name__ == "__main__":
    api_to_bq.serve(name='velib_api_to_bigquery')
Seth Taylor
11/21/2023, 3:03 PM
FileNotFoundError: [Errno 2] No such file or directory: '\\data\\data\\matches\\atp_matches_1969.parquet'
Here's the path to that file on the GCS bucket:
tennis_data_lake_tennis-analysis-405301/data/matches/atp_matches_1969.parquet
Here's my extract code:
def extract_from_gcs(tour: str, subgroup: str, year: int) -> Path:
    """Download trip data from GCS"""
    gcs_path = f"/data/matches/{tour}_matches{subgroup}_{year}.parquet"
    gcs_block = GcsBucket.load("tennis-bucket")
    gcs_block.get_directory(from_path=gcs_path, local_path=f"../data/")
    return Path(f"../data/{gcs_path}")
Where tour is "atp", subgroup is '' (a blank string), and year is 1969, so the returned path should be "data/data/matches/atp_matches_1969.parquet", right?
This is my transform code.
@task()
def transform(path: Path) -> pd.DataFrame:
    """Transform parquet to df"""
    df = pd.read_parquet(path)
    return df
So I guess it is not finding the local path properly?
The directory structure I have is:
-data
  -matches
-src
  -BQ_ETL.py (the file with the flow in question)
So shouldn't "../data/" as the local path and f'../data/{gcs_path}' as the returned path be the same thing?
I have data for the years 1968 to 2023, and weirdly the current code works for 1968 and 2023, which are the first and last files in my bucket, but does not work for any of the other files in the bucket.
Cameron Raynor
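Two things here are separable from Prefect: the backslashes in the error mean the code ran on Windows (Path renders with the OS separator while the bucket names use /), and prefixing ../data/ onto a gcs_path that itself starts with data/ is what produces the doubled data segment. A sketch of just the path arithmetic, reusing the names from the snippet above (PurePosixPath keeps / on any OS):

```python
from pathlib import PurePosixPath

tour, subgroup, year = "atp", "", 1969

# A blank subgroup already collapses to the expected file name
name = f"{tour}_matches{subgroup}_{year}.parquet"
print(name)  # atp_matches_1969.parquet

# Dropping the leading "/" keeps the join predictable
gcs_path = PurePosixPath("data/matches") / name
local = PurePosixPath("../data") / gcs_path
print(local)  # ../data/data/matches/atp_matches_1969.parquet
```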
12/03/2023, 4:45 AM
Jack P
12/04/2023, 8:09 PM
prefect.yaml
to private Google Cloud Artifact Registry? I have it working through GitHub actions, but was going to move it to the prefect.yaml
since I am moving my deployments to it.
Thanks 🙏
Kartik Ullal
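prefect.yaml can build and push the image itself via steps from the prefect-docker collection; a sketch targeting an Artifact Registry repo (registry path, project, and names are placeholders), assuming Docker is already authenticated to the registry (e.g. gcloud auth configure-docker us-central1-docker.pkg.dev):

```yaml
build:
- prefect_docker.deployments.steps.build_docker_image:
    id: build_image
    requires: prefect-docker
    image_name: us-central1-docker.pkg.dev/my-project/my-repo/my-flow
    tag: latest
    dockerfile: auto

push:
- prefect_docker.deployments.steps.push_docker_image:
    requires: prefect-docker
    image_name: '{{ build_image.image_name }}'
    tag: '{{ build_image.tag }}'
```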
12/20/2023, 2:39 AM
Sean Davis
01/30/2024, 3:37 PM
metadata:
  name: volumetest
  annotations:
    run.googleapis.com/launch-stage: BETA
spec:
  template:
    metadata:
      annotations:
        run.googleapis.com/execution-environment: gen2
    spec:
      taskCount: 1
      template:
        spec:
          containers:
          - image: busybox
            command:
            - touch
            - /data/abc
            volumeMounts:
            - mountPath: /data
              name: data
          volumes:
          - name: data
            csi:
              driver: gcsfuse.run.googleapis.com
              volumeAttributes:
                bucketName: MY_BUCKETNAME
I keep getting an error about the csi key not being valid when I use a custom base template in Prefect. If someone has accomplished adding a GCS bucket to the worker config, I'd appreciate any help. If no one has tried, I can share more details.
jason baker
02/01/2024, 4:30 PM
prefect.yaml
configuration to call the GCP secret manager and add env variables when starting jobs would be ideal.
Mark Roberts
03/02/2024, 1:56 PM