karteek
02/16/2023, 4:05 PM

nicholasnet
02/16/2023, 5:54 PM

Aaron Gonzalez
02/16/2023, 10:53 PM
gsutil rsync s3://some-key/dt=yyyy-mm-dd/ gs://some-key/dt=yyyy-mm-dd/
😢
I am going to give prefect-shell a try for the first time and want to know whether people have had much experience with it.
For my use case I have about 12K different rsyncs to run, and I don't know which of these patterns is preferable:
for src in s3_sources_12k:
    dest = f'gs://some-dest/{src}'
    ShellOperation(
        commands=[f"gsutil rsync -r {src} {dest}"],
        env=env_var_map,
    ).run()
or
with ShellOperation(
    commands=[
        "gsutil rsync -r src1 dest1",
        "gsutil rsync -r src2 dest2",
        "gsutil rsync -r src3 dest3",
        ...
        "gsutil rsync -r src12k dest12k",
    ],
    env=env_var_map,
) as shell_operation:
    shell_process = shell_operation.trigger()
    shell_process.wait_for_completion()
    shell_output = shell_process.fetch_result()
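A middle ground between the two patterns is to split the 12K commands into fixed-size batches and hand each batch to one operation. A minimal sketch of the batching step only; the helper name, batch size, and destination scheme are made up for illustration, not prefect-shell API:

```python
def batch_rsync_commands(sources, dest_prefix, batch_size=100):
    """Group rsync commands into fixed-size batches so that each
    ShellOperation receives a manageable command list (hypothetical helper)."""
    commands = [f"gsutil rsync -r {src} {dest_prefix}/{src}" for src in sources]
    # Successive slices of at most `batch_size` commands each.
    return [commands[i:i + batch_size] for i in range(0, len(commands), batch_size)]

# 10 sources with batch_size=4 -> batches of 4, 4, and 2 commands.
batches = batch_rsync_commands(
    [f"s3://some-key/dt=2023-02-{d:02d}/" for d in range(1, 11)],
    "gs://some-dest",
    batch_size=4,
)
```

Each batch could then be passed as the commands list of one ShellOperation, which keeps per-operation overhead down without building a single 12K-line command list.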
Thet Naing
02/17/2023, 7:45 PM
null for all parameters, even when there are defaults set and when we choose Customize Run to input parameters. Is this a known issue?
This seems to have begun when the latest release of Prefect was pushed, about 3 hours ago.

Chris Whatley
02/17/2023, 9:27 PM

Thet Naing
02/20/2023, 3:38 PM
logging.yml set with PREFECT_LOGGING_SETTINGS_PATH.
Does anyone have examples of how this is done for cloud deployments?

Adam Gold
02/20/2023, 3:58 PM
prefect agent start --pool "$PREFECT_ENV" --work-queue default
1. It takes a really long time for the flow to be submitted. Notice here that more than 30 seconds pass before the task is even created, let alone run:
15:25:43.676 | INFO | prefect.agent - Submitting flow run '0dda37d3-87e4-46e2-9266-920e7dae9113'
15:25:44.499 | INFO | prefect.infrastructure.process - Opening process 'congenial-falcon'...
15:25:44.998 | INFO | prefect.agent - Completed submission of flow run '0dda37d3-87e4-46e2-9266-920e7dae9113'
<frozen runpy>:128: RuntimeWarning: 'prefect.engine' found in sys.modules after import of package 'prefect', but prior to execution of 'prefect.engine'; this may result in unpredictable behaviour
15:26:01.483 | INFO | Flow run 'congenial-falcon' - Downloading flow code from storage at '/app'
15:26:16.402 | INFO | Flow run 'congenial-falcon' - Created task run 'return_value-0' for task 'return_value'
2. It downloads the code for every flow, making the pod go out of memory very quickly: Pod ephemeral local storage usage exceeds the total limit
I am probably missing something here, but would love some help 🙏

Aaron Gonzalez
02/20/2023, 4:07 PM
You can structure a job as a single task or as multiple, independent tasks (up to 10,000 tasks) that can be executed in parallel. Each task runs one container instance and can be configured to retry in case of failure. Each task is aware of its index, which is stored in the CLOUD_RUN_TASK_INDEX environment variable. The overall count of tasks is stored in the CLOUD_RUN_TASK_COUNT environment variable. If you are processing data in parallel, your code is responsible for determining which task handles which subset of the data.
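The index/count pattern from the quoted Cloud Run docs can be applied with a simple stride split. A minimal sketch, assuming the work items are an in-memory list; the helper name is made up:

```python
import os

def shard_for_task(items):
    """Pick the subset of `items` this Cloud Run task should process,
    using the index/count environment variables described above."""
    index = int(os.environ.get("CLOUD_RUN_TASK_INDEX", "0"))
    count = int(os.environ.get("CLOUD_RUN_TASK_COUNT", "1"))
    # Stride split: task i handles items i, i + count, i + 2*count, ...
    # Shards are disjoint and together cover every item.
    return items[index::count]
```

With CLOUD_RUN_TASK_INDEX=1 and CLOUD_RUN_TASK_COUNT=3, for example, this task would process items 1, 4, 7, and so on.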
Emil Ostergaard
02/22/2023, 1:13 PM
Completed at 2023-02-22T11:07:44.757581+00:00.
Flow ID: x
Flow run ID: y
Flow run URL: z
State message: All states completed.
Using prefect 2.8.1

Carlos Cueto
02/22/2023, 2:33 PM

Tushar Gupta
02/22/2023, 4:05 PM

Jason Vertrees
02/22/2023, 4:34 PM
[{'path': ['set_task_run_states'], 'message': "'NoneType' object has no attribute 'flow_id'", 'extensions': {'code': 'INTERNAL_SERVER_ERROR'}}]
Please help.

Stéphan Taljaard
02/22/2023, 6:08 PM

Andrew Richards
02/23/2023, 4:28 PM
--- Orion logging error ---
Traceback (most recent call last):
File "/root/micromamba/envs/prefect/lib/python3.7/site-packages/prefect/logging/handlers.py", line 151, in send_logs
await client.create_logs(self._pending_logs)
File "/root/micromamba/envs/prefect/lib/python3.7/site-packages/prefect/client/orion.py", line 1843, in create_logs
await self._client.post(f"/logs/", json=serialized_logs)
File "/root/micromamba/envs/prefect/lib/python3.7/site-packages/httpx/_client.py", line 1855, in post
extensions=extensions,
File "/root/micromamba/envs/prefect/lib/python3.7/site-packages/httpx/_client.py", line 1527, in request
return await self.send(request, auth=auth, follow_redirects=follow_redirects)
File "/root/micromamba/envs/prefect/lib/python3.7/site-packages/prefect/client/base.py", line 253, in send
response.raise_for_status()
File "/root/micromamba/envs/prefect/lib/python3.7/site-packages/httpx/_models.py", line 736, in raise_for_status
raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Client error '429 Too Many Requests' for url 'https://api.prefect.cloud/api/accounts/<redacted>/workspaces/<redacted>/logs/'
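A common client-side mitigation for 429 responses like the one above is retry with exponential backoff. A generic sketch of the backoff schedule only; the helper and its parameters are illustrative, not Prefect internals:

```python
import random

def backoff_delays(retries=5, base=1.0, cap=30.0, jitter=False):
    """Exponential backoff schedule (1s, 2s, 4s, ...) capped at `cap`
    seconds, for retrying requests rejected with 429 Too Many Requests."""
    delays = []
    for attempt in range(retries):
        delay = min(cap, base * (2 ** attempt))
        if jitter:
            # Full jitter spreads retries out to avoid synchronized bursts.
            delay = random.uniform(0, delay)
        delays.append(delay)
    return delays
```

Sleeping for each delay in turn before re-sending, and honoring any Retry-After header the server returns, usually clears transient rate limiting.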
Data Ops
02/23/2023, 9:06 PM

Data Ops
02/23/2023, 9:07 PM

justabill

Aaron Gonzalez
02/24/2023, 4:57 PM
This is the deployments.py I wrote:
# Assumed imports (not in the original snippet; paths per Prefect 2.8-era):
from prefect.deployments import Deployment
from prefect.orion.schemas.schedules import CronSchedule
from prefect_gcp.cloud_run import CloudRunJob

# ENV, VERSION, sched, and the flow are defined elsewhere in the module.
def deploy_factory(name: str, flow, param: dict = dict(), cron: str = '') -> Deployment:
    kwargs = {
        'flow': flow,
        'name': name,
        'infrastructure': CloudRunJob.load(f"aether-flows-cloud-run-job-{ENV}"),
        'work_queue_name': f"{ENV}_aether",
        'version': VERSION,
        'output': f'{name}.yaml',
        'skip_upload': True,
    }
    if param:
        # No trailing comma here: `= param,` would wrap the dict in a tuple.
        kwargs['parameters'] = param
    if cron:
        kwargs['schedule'] = CronSchedule(cron=cron)
    return Deployment.build_from_flow(**kwargs)

aether_space_metrics_1h_parquet_rsync = deploy_factory(
    name=f'aether_space_metrics_1h_parquet_rsync_{ENV}',
    flow=space_metrics_parquet_rsync,
    param={'date': 'current', 'agg_window': '1h'},
    cron=sched('0 23 * * *'),
)
aether_space_metrics_1h_parquet_rsync.apply(work_queue_concurrency=20)
Vera Zabeida
02/24/2023, 6:36 PM
check_key_is_valid_for_login call, or all the other ones that are in the docs but not actually available in the API

Ryan Peden
02/24/2023, 8:27 PM
--help on the CLI itself; so, for example, prefect cloud --help will show you all the cloud-related CLI commands.

Justin
02/25/2023, 9:13 PM
02/25/2023, 9:13 PMfrom prefect import flow
@flow(log_prints=True)
def hello():
print("whats up prefectttt!!")
if __name__ == "__main__":
hello()
What I've already done:
• Create and work in a virtual env where prefect is up to date.
• I've set the PREFECT_API_URL so it is connected to my prefect-cloud workspace (Link)
• Connected to prefect-cloud via CLI with prefect cloud login
I've looked around and have yet to find anyone with this specific situation...

Maryam Veisi
02/27/2023, 4:51 PM

Austin Weisgrau
02/27/2023, 7:20 PM

jpuris
02/28/2023, 8:25 AM
Feb 27 13:00:05 dataplatform-jp-sandbox python[105783]: httpx.HTTPStatusError: Server error '502 Bad Gateway' for url 'https://api.prefect.cloud/api/accounts/c1397d5f-b9f3-49e8-abb6-bce7d7b1412e/workspaces/32dfe242-315b-4405-b06d-8b6308d6b631/flow_runs/d3f64d7f-acef-46f4-8114-8b1b1824e2c5'
Full trace in attachment.

jpuris
02/28/2023, 4:12 PM
curl --request POST \
--url "https://api.prefect.cloud/api/accounts/$PREFECT_ACCOUNT_ID/workspaces/$PREFECT_WORKSPACE_ID/flow_runs" \
--header "Authorization: Bearer $PREFECT_API_KEY" \
--header "Content-Type: application/json" \
--data '{
"flow_id": "0c6a7af0-e534-44dd-a88a-151db5c289fc",
"deployment_id": "03e816d7-38f9-4c1f-b036-efc37a1084c1"
}'
The flow run just sits there, queue-less, in PENDING state. How can I tell it to place the flow run in a queue for one of my agents to pick up and run?
Alternatively, since the deployment is associated with a queue, why is Prefect not putting the flow run there implicitly?
However, the method described by Anna G. in the blog post "How to trigger a flow run from a deployment via API call using Python requests library or from a terminal using curl?" works!
❯ curl --request POST \
--url "https://api.prefect.cloud/api/accounts/$PREFECT_ACCOUNT_ID/workspaces/$PREFECT_WORKSPACE_ID/deployments/$PREFECT_DEPLOYMENT_ID/create_flow_run" \
--header "Authorization: Bearer $PREFECT_API_KEY" \
--header "Content-Type: application/json" \
--data '{"name": "your_flow_run", "state": {"type": "SCHEDULED"}}'
Are the API docs wrong? To create a flow run, must one POST to /create_flow_run and not to /flow_runs? 😕
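For reference, the working curl call above translates to Python roughly as follows. A sketch only: it builds the request against the /deployments/{id}/create_flow_run endpoint that worked, with placeholder IDs, and does not actually send it:

```python
import json
import urllib.request

API_BASE = "https://api.prefect.cloud/api"

def create_flow_run_request(account_id, workspace_id, deployment_id, api_key):
    """Build a POST request for the endpoint the working curl call used:
    /deployments/{id}/create_flow_run (not /flow_runs)."""
    url = (
        f"{API_BASE}/accounts/{account_id}/workspaces/{workspace_id}"
        f"/deployments/{deployment_id}/create_flow_run"
    )
    body = json.dumps({"name": "your_flow_run", "state": {"type": "SCHEDULED"}}).encode()
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it would be `urllib.request.urlopen(req)`; going through the deployment endpoint is what attaches the run to the deployment's work queue.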
P.S. Originally posted in #prefect-community, but now moved to #prefect-cloud

Chris Arderne
03/03/2023, 11:27 AM

Adam Ivansky
03/06/2023, 3:13 PM

Aditya B
03/07/2023, 10:47 PM
registry_url. GCR supports virtual folders, so I can have gcr.io/{project}/flows be the registry_url, and each flow name will be appended after that to give gcr.io/{project}/flows/{flow1_name}/, gcr.io/{project}/flows/{flow2_name}/, etc. It doesn't seem like ECR supports this suffix-style naming; how do people resolve this? Do you create a separate repository per flow, or is there a way to get around that?

Adhavan Mathiyalagan
03/08/2023, 1:44 PM
03/08/2023, 1:46 PM