Dilip Thiagarajan
12/29/2020, 10:02 PM
CloudFlowRunner and CloudTaskRunner are used when running a flow using a LocalAgent? I was expecting FlowRunner and TaskRunner to be used (the backend is also “server”).
Jim Crist-Harif
12/29/2020, 10:04 PM
The Cloud* prefix is a bit of a misnomer: those classes work with Server as well, and are also used during Server-orchestrated flow runs.
Dilip Thiagarajan
12/29/2020, 10:13 PM
Jim Crist-Harif
12/29/2020, 10:16 PM
flow.diagnostics() below?
Dilip Thiagarajan
12/29/2020, 10:18 PM
{
  "config_overrides": {},
  "env_vars": [
    "PREFECT__FLOWS__CHECKPOINTING",
    "PREFECT__CONTEXT__SECRETS__AWS_CREDENTIALS",
    "PREFECT__SERVER__HOST",
    "PREFECT__BACKEND"
  ],
  "flow_information": {
    "environment": false,
    "result": {
      "type": "LocalResult"
    },
    "run_config": {
      "labels": true,
      "type": "UniversalRun"
    },
    "schedule": false,
    "storage": {
      "_flows": {
        "Mock Train with Persistence": true
      },
      "_labels": false,
      "add_default_labels": true,
      "directory": true,
      "flows": {
        "Mock Train with Persistence": true
      },
      "path": false,
      "result": true,
      "secrets": false,
      "stored_as_script": false,
      "type": "Local"
    },
    "task_count": 6
  },
  "system_information": {
    "platform": "Linux-5.4.0-58-generic-x86_64-with-debian-bullseye-sid",
    "prefect_backend": "server",
    "prefect_version": "0.14.0",
    "python_version": "3.6.9"
  }
}
Jim Crist-Harif
12/29/2020, 10:23 PM
Dilip Thiagarajan
12/29/2020, 10:42 PM
[2020-12-29 22:25:20,687] INFO - agent | Found 1 flow run(s) to submit for execution.
[2020-12-29 22:25:20,824] INFO - agent | Deploying flow run fa3ff6e6-5d91-460b-bd9f-cda45869e98b
[2020-12-29 17:25:22-0500] INFO - prefect.CloudFlowRunner | Beginning Flow run for 'Mock Train with Persistence'
[2020-12-29 17:25:22-0500] INFO - prefect | Launching data loading for task "Setup ai-core" in the background...
[2020-12-29 17:25:22-0500] INFO - prefect.CloudTaskRunner | Task 'Setup ai-core': Starting task run...
[2020-12-29 17:25:22-0500] INFO - prefect.CloudFlowRunner | Flow run RUNNING: terminal tasks are incomplete.
[2020-12-29 17:25:23-0500] INFO - prefect.Setup ai-core | Beginning dependency setup: "Setup ai-core"...
[2020-12-29 17:25:23-0500] INFO - prefect.Setup ai-core | Commit hash for ai-core setup: f7b8552705e9eba33c62b4e11d42e7806631771d
[2020-12-29 17:26:07-0500] INFO - prefect.CloudTaskRunner | Task 'Setup ai-core': Finished task run for task with final state: 'Success'
[2020-12-29 22:40:07,269] INFO - agent | Found 1 flow run(s) to submit for execution.
[2020-12-29 11:20:27-0500] INFO - prefect.CloudTaskRunner | Task 'Fetch Slides': Starting task run...
[2020-12-29 11:20:27-0500] INFO - prefect.CloudTaskRunner | Task 'Fetch Slides': Finished task run for task with final state: 'Failed'
the trace is:
<Failed: "Failed to retrieve task results: [Errno 2] No such file or directory: '/home/dilip.thiagarajan/.prefect/results/prefect-result-2020-12-29t16-05-36-905299-00-00'">
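The log above mixes two timestamp formats. Agent lines have no offset (e.g. 22:25:20,687), while flow-run lines carry -0500 (e.g. 17:25:22-0500). If the agent logs in UTC, its "Deploying" line and the flow's "Beginning Flow run" line are about a second apart, so the sketch below assumes UTC for offset-less lines. Putting both formats on one clock makes delays visible. One example is the roughly 14 minutes between 'Setup ai-core' finishing (17:26:07-0500) and the agent finding a flow run again (22:40:07 UTC). parse_stamp and find_gaps are hypothetical helpers, and the 300-second threshold is an arbitrary choice.

```python
import re
from datetime import datetime, timezone

# Agent lines: "[2020-12-29 22:25:20,687] ..."   (no offset; assumed UTC)
# Flow lines:  "[2020-12-29 17:25:22-0500] ..."  (explicit UTC offset)
_STAMP = re.compile(r"^\[([^\]]+)\]")
_FORMATS = ("%Y-%m-%d %H:%M:%S%z", "%Y-%m-%d %H:%M:%S,%f")


def parse_stamp(line):
    """Return the line's timestamp as an aware UTC datetime, or None."""
    match = _STAMP.match(line)
    if not match:
        return None
    for fmt in _FORMATS:
        try:
            ts = datetime.strptime(match.group(1), fmt)
        except ValueError:
            continue
        if ts.tzinfo is None:
            ts = ts.replace(tzinfo=timezone.utc)  # assumption: agent logs in UTC
        return ts.astimezone(timezone.utc)
    return None


def find_gaps(lines, threshold_s=300):
    """Yield (seconds, earlier_line, later_line) for consecutive timestamped
    lines at least threshold_s apart; lines without a timestamp are skipped."""
    prev = None
    for line in lines:
        ts = parse_stamp(line)
        if ts is None:
            continue
        if prev is not None:
            delta = (ts - prev[0]).total_seconds()
            if delta >= threshold_s:
                yield delta, prev[1], line
        prev = (ts, line)


log = [
    "[2020-12-29 22:25:20,687] INFO - agent | Found 1 flow run(s) to submit for execution.",
    "[2020-12-29 17:25:22-0500] INFO - prefect.CloudFlowRunner | Beginning Flow run for 'Mock Train with Persistence'",
    "[2020-12-29 17:26:07-0500] INFO - prefect.CloudTaskRunner | Task 'Setup ai-core': Finished task run for task with final state: 'Success'",
    "[2020-12-29 22:40:07,269] INFO - agent | Found 1 flow run(s) to submit for execution.",
]

for delta, before, after in find_gaps(log):
    print(f"{delta / 60:.1f} min gap before: {after}")
```

The same helpers apply to any pasted log, including ones with annotation lines such as "# LARGE DELAY", which parse_stamp simply skips.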
Jim Crist-Harif
12/29/2020, 10:53 PM
Dilip Thiagarajan
12/29/2020, 10:56 PM
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/home/aicompute/.local/lib/python3.6/site-packages/prefect/engine/cloud/task_runner.py", line 292, in load_results
  File "/home/aicompute/.local/lib/python3.6/site-packages/prefect/engine/state.py", line 125, in load_result
  File "/home/aicompute/.local/lib/python3.6/site-packages/prefect/engine/results/local_result.py", line 84, in read
FileNotFoundError: [Errno 2] No such file or directory: '/home/dilip.thiagarajan/.prefect/results/prefect-result-2020-12-29t16-05-36-905299-00-00'
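The missing file sits under /home/dilip.thiagarajan/.prefect/results, while the installed Prefect package in the traceback sits under /home/aicompute/.local. That may mean the result was written by a different user, or on a different machine, than the process reading it back. The thread does not confirm this, though it is the same kind of explanation behind the later question about a DaskExecutor backed by a distributed cluster. The toy below models that failure mode with two temporary directories standing in for two separate filesystems. write_result and read_result are hypothetical stand-ins, not Prefect's LocalResult API.

```python
import pickle
import tempfile
from pathlib import Path


def write_result(root, name, value):
    """Pickle value under root, the way a local-disk result store might."""
    (Path(root) / name).write_bytes(pickle.dumps(value))
    return name  # only the location string is handed on to the reader


def read_result(root, name):
    """Load a result by location, resolved against the reader's own filesystem."""
    return pickle.loads((Path(root) / name).read_bytes())


# host_a and host_b stand in for two machines (or users) without shared storage.
with tempfile.TemporaryDirectory() as host_a, tempfile.TemporaryDirectory() as host_b:
    location = write_result(host_a, "prefect-result-example", {"slides": 3})
    same_host = read_result(host_a, location)  # same filesystem: succeeds
    try:
        read_result(host_b, location)  # other filesystem: file is absent
        other_host_error = None
    except FileNotFoundError as exc:
        other_host_error = exc

print(same_host, type(other_host_error).__name__)
```

A result backend that every reader can reach, such as an object store, avoids this particular failure mode. Whether it explains the run above still depends on where each task actually executed.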
Jim Crist-Harif
12/29/2020, 11:02 PM
Result objects for your tasks explicitly (it looks like you have). If so, can you provide the configuration for those?
Jim Crist-Harif
12/29/2020, 11:02 PM
Dilip Thiagarajan
12/29/2020, 11:13 PM
result=S3Result(
'paige-ai-flow-persistence-s3-dev1-use1',
location="mock-train/fetch_labels.prefect",
boto3_kwargs=boto3_kwargs
)
but I figured this shouldn’t affect anything, given that the trace shows LocalResult.
Jim Crist-Harif
12/30/2020, 4:27 PM
DaskExecutor backed by a distributed cluster?
Jim Crist-Harif
12/30/2020, 4:29 PM
DaskExecutor, then I'm afraid I'm out of ideas; I'd need a reproducible example to continue debugging further.
Dilip Thiagarajan
12/30/2020, 4:31 PM
[2020-12-30 11:00:25-0500] INFO - prefect.CloudFlowRunner | Flow run RUNNING: terminal tasks are incomplete.
# LARGE DELAY
[2020-12-30 16:20:11,594] INFO - agent | Found 1 flow run(s) to submit for execution.
[2020-12-30 16:20:11,757] INFO - agent | Deploying flow run 1db98dc5-9188-4c5f-a90f-3519532f5513
[2020-12-30 11:20:14-0500] INFO - prefect.CloudFlowRunner | Beginning Flow run for 'Mock Train with Persistence'
[2020-12-30 11:20:14-0500] INFO - prefect | Beginning dependency setup.
[2020-12-30 11:20:14-0500] INFO - prefect | Commit hash for ai-core setup: f7b8552705e9eba33c62b4e11d42e7806631771d
[2020-12-30 11:20:26-0500] INFO - prefect | Done setting up dependencies.
[2020-12-30 11:21:05-0500] INFO - prefect.CloudFlowRunner | Flow run SUCCESS: all reference tasks succeeded
Is there a good way to debug something like this, or is this expected behavior?
Jim Crist-Harif
12/30/2020, 4:33 PM
Dilip Thiagarajan
12/30/2020, 4:35 PM
Jim Crist-Harif
12/30/2020, 4:54 PM
retry_delay set?
Jim Crist-Harif
12/30/2020, 4:54 PM
Dilip Thiagarajan
12/30/2020, 4:58 PM
Rescheduled by a Lazarus process. This is attempt 1.
And this seems to happen between each level of the DAG.
Jim Crist-Harif
12/30/2020, 5:01 PM
Jim Crist-Harif
12/30/2020, 5:02 PM
Jim Crist-Harif
12/30/2020, 5:04 PM
Dilip Thiagarajan
12/30/2020, 5:06 PM
Jim Crist-Harif
12/30/2020, 5:07 PM