Ben Muller
02/03/2023, 2:11 AM
State message: Flow run encountered an exception. MissingResult: State data is missing. Typically, this occurs when result persistence is disabled and the state has been retrieved from the API.
I have run prefect config set PREFECT_RESULTS_PERSIST_BY_DEFAULT=True
The only flows that fail are ones that have a large number of .submit calls with the DaskTaskRunner.
I don't really care if a few of the tasks have missing results; is there a way to handle this?
I already do something like this, but the errors still occur:
data = []
for chunk in chunks(query, 500):
    futures = [query_api.submit(query=GetStockPrices, kwargs=kw, session=session) for kw in chunk]
    data += [f.result(raise_on_failure=False) for f in futures]
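One possible workaround, as a minimal sketch: it assumes the exception being raised is prefect.exceptions.MissingResult and simply drops the affected results (chunks, query_api, GetStockPrices, and session are taken from the snippet above):

from prefect.exceptions import MissingResult

def result_or_none(future):
    # Swallow a missing persisted result instead of failing the flow
    try:
        return future.result(raise_on_failure=False)
    except MissingResult:
        return None

data = []
for chunk in chunks(query, 500):
    futures = [query_api.submit(query=GetStockPrices, kwargs=kw, session=session) for kw in chunk]
    results = [result_or_none(f) for f in futures]
    data += [r for r in results if r is not None]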
Khyaati Jindal
02/03/2023, 3:52 AM
Ankit
02/03/2023, 6:51 AM
task_runner=DaskTaskRunner(cluster_class=distributed.LocalCluster, cluster_kwargs={"n_workers": 8})
When I run the same code in a Jupyter notebook (as normal functions), the tasks complete in under a minute, whereas it takes over 10 minutes to run in Prefect with the above task_runner. Can someone help with what might be wrong?
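For context, a minimal runnable flow using that runner, as a sketch (it assumes prefect-dask and distributed are installed; the double task is just a stand-in):

import distributed
from prefect import flow, task
from prefect_dask import DaskTaskRunner

@task
def double(x: int) -> int:
    return x * 2

@flow(task_runner=DaskTaskRunner(cluster_class=distributed.LocalCluster, cluster_kwargs={"n_workers": 8}))
def my_flow():
    # .submit() farms each task out to the local Dask cluster; cluster
    # startup and per-task orchestration add overhead vs. plain functions
    futures = [double.submit(i) for i in range(100)]
    return [f.result() for f in futures]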
Joao Moniz
02/03/2023, 8:30 AM
Andrei Tulbure
02/03/2023, 9:00 AM
Clovis
02/03/2023, 11:08 AM
Nils
02/03/2023, 12:46 PM
Sean Malone
02/03/2023, 3:07 PM
I'm using prefect-snowflake and was wondering if there is a way to “name” the task that spawns from calling snowflake_query(). Right now it generates a task in a format like snowflake_query-abcd123-00, and I'm wondering if this can be controlled to look like snowflake_query-<my_query_name>?
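One likely answer, sketched under the assumption that snowflake_query is an ordinary Prefect task (the import path, flow, connector argument, and query here are illustrative):

from prefect import flow
from prefect_snowflake.database import snowflake_query

@flow
def prices_flow(connector):
    # with_options() returns a copy of the task under a new name, so its
    # runs show up as "snowflake_query-daily-prices-..." in the UI
    named_query = snowflake_query.with_options(name="snowflake_query-daily-prices")
    return named_query(query="SELECT * FROM prices", snowflake_connector=connector)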
Ethienne Marcelin
02/03/2023, 4:02 PM
from prefect import flow
@flow
def myflow(x: int):
    return x ** x
I want to run this flow with a deployment, so I do the following:
from prefect.deployments import Deployment, run_deployment
from test import myflow
deployment = Deployment.build_from_flow(flow=myflow, name="test", version=1)
deployment_uuid = deployment.apply()
output = run_deployment(name=f"{deployment.flow_name}/{deployment.name}", parameters={"x": 3})
Everything works fine, but the program never returns 😢 it's stuck on the run_deployment call.
When I do prefect flow-run ls
I see that my flow run was created but is stuck in the "Scheduled" state.
I have the same problem if I run this code within an if __name__ == "__main__" block, if I specify a work queue name, or if I add a scheduled time in the future.
Do you have any idea?
Thanks in advance for your precious help 💓
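A likely explanation, as a sketch: run_deployment waits for the flow run to reach a terminal state, which only happens once an agent picks the run up from a work queue; with no agent running, it blocks forever. Passing timeout=0 returns the (still Scheduled) flow run immediately instead:

from prefect.deployments import run_deployment

# Submit the run and return right away rather than waiting for an agent
flow_run = run_deployment(name="myflow/test", parameters={"x": 3}, timeout=0)
print(flow_run.id, flow_run.state)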
02/03/2023, 4:38 PM"Data integrity conflict. This usually means a unique or foreign key constraint was violated. See server logs for details."
Bryan Whiting
02/03/2023, 5:42 PM
I'm using click and doing some logic to manage the “start from X” step. What’s the “prefect” way of thinking about “okay, I’ve already run the xgboost step today, just run from simulations on.”
What’s the prefect way to say in a flow, “start from step 3 and do all downstream steps”?
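One idiomatic Prefect 2 answer is task caching, sketched below with illustrative task names (task_input_hash keys the cache on the task's inputs, so an unchanged step is skipped on re-runs until the cache expires):

from datetime import timedelta
from prefect import flow, task
from prefect.tasks import task_input_hash

@task(cache_key_fn=task_input_hash, cache_expiration=timedelta(days=1))
def fit_xgboost(training_data):
    ...  # expensive step; returns its cached result on re-runs today

@task
def run_simulations(model):
    ...

@flow
def pipeline(training_data):
    model = fit_xgboost(training_data)
    return run_simulations(model)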
David Elliott
02/03/2023, 6:32 PM
Task run 'xxxx' received abort during orchestration
which I note has a TODO against it RE discovering why it happens. Details in 🧵
merlin
02/03/2023, 7:14 PM
aaron
02/03/2023, 7:15 PM
log_output=True. I'm not using print statements but Python's logger in the notebook. In 1.0 there was an additional papermill logger that needed to be enabled in the server, but I can't find anything related to that in 2.0. Any ideas?
Nikhil Jain
02/03/2023, 7:21 PM
I'm seeing an issue in Prefect 2.0 where runs get started but don’t finish properly and the UI says Submission timed out.
Error on UI:
Submission failed. RuntimeError: Timed out after 120.6031973361969s while watching task for status {until_status or 'STOPPED'}
AWS CloudWatch logs from the flow run indicate that the task runner stops immediately after starting the run and does not execute any of the tasks in the flow (sharing logs in the next message in thread).
Ben Wenger
02/03/2023, 7:27 PM
Keith
02/03/2023, 8:12 PM
I've been seeing Crashed jobs recently. I am trying to do root-cause analysis, so I start by digging through the logs in Prefect Cloud. What I see is that at some point (it is random) the logs stop and nothing further is output to the UI.
Digging through the logs in Google Logs Explorer I see the same behavior: the Prefect container logs stop at the same specific point in time. Inside Google Logs I am also able to see a lot of Kubernetes-related logs and am starting to see a pattern, but it's not clear how to fix it.
• Roughly 5-10 seconds after the last log this shows up:
◦ INFO 2023-02-03T19:18:11Z [resource.labels.nodeName: gk3-prefect-autopilot-cl-nap-ji2s72nv-db29cac6-hxzc] marked the node as toBeDeleted/unschedulable
• Quickly followed by:
◦ INFO 2023-02-03T19:18:11Z [resource.labels.clusterName: prefect-autopilot-cluster-1] Scale-down: removing node gk3-prefect-autopilot-cl-nap-ji2s72nv-db29cac6-hxzc, utilization: {0.5538631957906397 0.1841863664058054 0 cpu 0.5538631957906397}, pods to reschedule: adorable-axolotl-d8k8c-6dx5c
◦ INFO 2023-02-03T19:18:38Z [resource.labels.clusterName: prefect-autopilot-cluster-1] Scale-down: node gk3-prefect-autopilot-cl-nap-ji2s72nv-db29cac6-hxzc removed with drain
• GKE tries to reschedule the job but it fails with the following, which is when Prefect alerts for the Crashed state:
◦ INFO 2023-02-03T19:18:11Z [resource.labels.podName: adorable-axolotl-d8k8c-6dx5c] deleting pod for node scale down
◦ ERROR 2023-02-03T19:18:19.215934101Z [resource.labels.containerName: prefect-job] 19:18:19.214 | INFO | prefect.engine - Engine execution of flow run '8ca83100-dcc3-46d5-91be-f342b19b45a9' aborted by orchestrator: This run cannot transition to the RUNNING state from the RUNNING state.
This appears to be happening on jobs randomly and leads me to believe that GKE thinks the cluster is overprovisioned, so it tries to reduce the cluster size and move jobs around; but jobs can't be moved in the middle of execution, so they Crash/Fail. I am also curious whether this is due to resource sizing, but I am not seeing any insufficient-resource problems on the jobs I have been troubleshooting. They all typically report the following in the containerStatuses leaf of the JSON element:
state: {
  terminated: {
    containerID: "<containerd://aac705>"
    exitCode: 143
    finishedAt: "2023-02-03T19:18:19Z"
    reason: "Error"
    startedAt: "2023-02-03T19:16:52Z"
  }
}
Any insight would be greatly appreciated!
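One mitigation worth trying, sketched with assumptions: exit code 143 (SIGTERM) plus the Scale-down log lines point at autoscaler eviction, and the cluster-autoscaler safe-to-evict annotation can opt a pod out of scale-down. The JSON-patch path below is assumed against Prefect's default base job manifest; verify it against yours:

from prefect.infrastructure import KubernetesJob

# Annotate the flow-run pod so the GKE cluster autoscaler will not
# evict it when draining a node during scale-down
k8s_job = KubernetesJob(
    customizations=[
        {
            "op": "add",
            "path": "/spec/template/metadata",
            "value": {"annotations": {"cluster-autoscaler.kubernetes.io/safe-to-evict": "false"}},
        }
    ],
)
k8s_job.save("no-scale-down-evict", overwrite=True)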
Tomás Emilio Silva Ebensperger
02/03/2023, 8:20 PM
This run didn't generate Logs
I ran this flow on a server; the flow runs successfully, it is caught by the UI, and the flow and tasks are showing up in the cloud, but I get the weird message that the flow run didn't produce any logs. Any thoughts?
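One thing worth checking, as a sketch: by default only records emitted through Prefect's run logger are shipped to the API, so a flow whose code logs through a plain Python logger (or only prints) can finish green with no logs in the UI:

from prefect import flow, get_run_logger

@flow
def logged_flow():
    # get_run_logger() attaches records to the current flow run,
    # so they appear under the run in the Cloud UI
    logger = get_run_logger()
    logger.info("This line should show up in the flow run's logs.")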
LI LIU
02/03/2023, 9:34 PM
Carlos Paiva
02/04/2023, 9:56 AM
Q
02/04/2023, 10:20 AM
Orion started crashlooping after upgrading to 2.7.11, with sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) duplicate column name: has_data (see thread). Seems like the offending migration is in a commit merged last week.
Downgraded to 2.7.10 and got alembic.util.exc.CommandError: Can't locate revision identified by 'f92143d30c24'.
Managed to solve it by running prefect orion database downgrade -r bb38729c471a from 2.7.11 and then downgrading to 2.7.10.
Tried to reproduce by upgrading to 2.7.11 and starting the server => same exception, same solution.
Can't say if this was the original reason for the crashlooping or whether something else left the database in a weird state.
Orion is running on k8s and using SQLite.
Ankit
02/04/2023, 10:52 AM
jcozar
02/04/2023, 12:06 PM
I use prefect deployment build -a to build and apply the deployment on Prefect Cloud. But when I run a flow run, the agent crashes because of credentials: botocore.errorfactory.AccessDeniedException: An error occurred (AccessDeniedException) when calling the RegisterTaskDefinition operation. The AWS_PROFILE is configured to use the correct credentials in the agent environment, so obviously I am missing something about the workflow in Prefect v2.
As you can see, I am lost in Prefect v2 🙂 Can you share some tutorial or link with best practices for working with Prefect v2 on AWS?
Thank you very much!
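One common pattern, sketched with hypothetical block names: store AWS credentials as a prefect-aws block and attach it to the ECS infrastructure, so the agent pulls credentials from the API instead of relying on a local AWS_PROFILE:

from prefect_aws import AwsCredentials
from prefect_aws.ecs import ECSTask

# Save credentials once; agents then load them by block name
creds = AwsCredentials(
    aws_access_key_id="...",        # fill in, or source from a secret store
    aws_secret_access_key="...",
    region_name="eu-west-1",
)
creds.save("prod-aws", overwrite=True)

ecs = ECSTask(aws_credentials=creds, cpu=256, memory=512)
ecs.save("prod-ecs", overwrite=True)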
Sam Garvis
02/04/2023, 3:12 PM
James Zhang
02/04/2023, 3:27 PM
sqlalchemy.exc.IntegrityError: (sqlalchemy.dialects.postgresql.asyncpg.IntegrityError) <class 'asyncpg.exceptions.ForeignKeyViolationError'>: update or delete on table "flow_run" violates foreign key constraint "fk_artifact__flow_run_id__flow_run" on table "artifact"
DETAIL: Key (id)=(fb61f0a3-f3a6-4dde-87df-c1bff00011b7) is still referenced from table "artifact".
How can I then delete those zombie flow runs without a complete reset of the database?
Surawut Jirasaktavee
02/04/2023, 6:22 PM
Jamie Blakeman
02/04/2023, 7:11 PM